1
Conev A, Devaurs D, Rigo MM, Antunes DA, Kavraki LE. 3pHLA-score improves structure-based peptide-HLA binding affinity prediction. Sci Rep 2022; 12:10749. [PMID: 35750701 PMCID: PMC9232595 DOI: 10.1038/s41598-022-14526-x] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 02/11/2022] [Accepted: 06/08/2022] [Indexed: 12/30/2022] Open
Abstract
Binding of peptides to Human Leukocyte Antigen (HLA) receptors is a prerequisite for triggering immune response. Estimating peptide-HLA (pHLA) binding is crucial for peptide vaccine target identification and epitope discovery pipelines. Computational methods for binding affinity prediction can accelerate these pipelines. Currently, most of those computational methods rely exclusively on sequence-based data, which leads to inherent limitations. Recent studies have shown that structure-based data can address some of these limitations. In this work we propose a novel machine learning (ML) structure-based protocol to predict binding affinity of peptides to HLA receptors. For that, we engineer the input features for ML models by decoupling energy contributions at different residue positions in peptides, which leads to our novel per-peptide-position protocol. Using Rosetta's ref2015 scoring function as a baseline we use this protocol to develop 3pHLA-score. Our per-peptide-position protocol outperforms the standard training protocol and leads to an increase from 0.82 to 0.99 of the area under the precision-recall curve. 3pHLA-score outperforms widely used scoring functions (AutoDock4, Vina, Dope, Vinardo, FoldX, GradDock) in a structural virtual screening task. Overall, this work brings structure-based methods one step closer to epitope discovery pipelines and could help advance the development of cancer and viral vaccines.
Affiliation(s)
- Anja Conev
- Department of Computer Science, Rice University, Houston, 77005 USA (GRID: grid.21940.3e; ISNI: 0000 0004 1936 8278)
- Didier Devaurs
- MRC Institute of Genetics and Cancer, University of Edinburgh, Edinburgh, EH4 2XU UK (GRID: grid.4305.2; ISNI: 0000 0004 1936 7988)
- Mauricio Menegatti Rigo
- Department of Computer Science, Rice University, Houston, 77005 USA (GRID: grid.21940.3e; ISNI: 0000 0004 1936 8278)
- Lydia E. Kavraki
- Department of Computer Science, Rice University, Houston, 77005 USA (GRID: grid.21940.3e; ISNI: 0000 0004 1936 8278)
2
Nordin J, Pettersson M, Rosenberg LH, Mathioudaki A, Karlsson Å, Murén E, Tandre K, Rönnblom L, Kastbom A, Cedergren J, Eriksson P, Söderkvist P, Lindblad-Toh K, Meadows JRS. Association of Protective HLA-A With HLA-B∗27 Positive Ankylosing Spondylitis. Front Genet 2021; 12:659042. [PMID: 34335681 PMCID: PMC8320510 DOI: 10.3389/fgene.2021.659042] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/26/2021] [Accepted: 06/09/2021] [Indexed: 11/21/2022] Open
Abstract
Objectives To further elucidate the role of the MHC in ankylosing spondylitis by typing 17 genes, searching for HLA-B∗27 independent associations and assessing the impact of sex on this male biased disease. Methods High-confidence two-field resolution genotyping was performed on 310 cases and 2196 controls using an n-1 concordance method. Protein-coding variants were called from next-generation sequencing reads using up to four software programs and the consensus result recorded. Logistic regression tests were applied to the dataset as a whole, and also in stratified sets based on sex or HLA-B∗27 status. The amino acids driving association were also examined. Results Twenty-five HLA protein-coding variants were significantly associated to disease in the population. Three novel protective associations were found in a HLA-B∗27 positive population, HLA-A∗24:02 (OR = 0.4, CI = 0.2–0.7), and HLA-A amino acids Leu95 and Gln156. We identified a key set of seven loci that were common to both sexes, and robust to change in sample size. Stratifying by sex uncovered three novel risk variants restricted to the female population (HLA-DQA1∗04.01, -DQB1∗04:02, -DRB1∗08:01; OR = 2.4–3.1). We also uncovered a set of neutral variants in the female population, which in turn conferred strong effects in the male set, highlighting how population composition can lead to the masking of true associations. Conclusion Population stratification allowed for a nuanced investigation into the tightly linked MHC region, revealing novel HLA-B∗27 signals as well as replicating previous HLA-B∗27 dependent results. This dissection of signals may help to elucidate sex biased disease predisposition and clinical progression.
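The odds ratios and confidence intervals quoted above were obtained by logistic regression. As a simpler aid to reading such figures, the sketch below computes an odds ratio and a Wald 95% interval directly from a 2×2 carrier table. This is a swapped-in technique rather than the study's method, and the function name and any counts passed to it are hypothetical.

```python
import math


def odds_ratio_wald(case_carriers, case_noncarriers,
                    control_carriers, control_noncarriers, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    Uses the Wald interval on the log odds ratio; z=1.96 gives ~95%.
    All four counts must be positive for the formula to be defined.
    """
    a, b = case_carriers, case_noncarriers
    c, d = control_carriers, control_noncarriers
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) for a 2x2 table.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

An odds ratio below 1 whose interval also stays below 1, as reported for HLA-A∗24:02, is what is read as a protective association.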
Affiliation(s)
- Jessika Nordin
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden; Science for Life Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden
- Mats Pettersson
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Lina Hultin Rosenberg
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Argyri Mathioudaki
- Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Åsa Karlsson
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Eva Murén
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
- Karolina Tandre
- Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Lars Rönnblom
- Science for Life Laboratory, Department of Medical Sciences, Uppsala University, Uppsala, Sweden
- Alf Kastbom
- Department of Rheumatology, University Hospital Linköping, Linköping, Sweden; Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
- Jan Cedergren
- Department of Rheumatology, University Hospital Linköping, Linköping, Sweden; Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
- Per Eriksson
- Department of Rheumatology, University Hospital Linköping, Linköping, Sweden; Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
- Peter Söderkvist
- Department of Biomedical and Clinical Sciences, Linköping University, Linköping, Sweden
- Kerstin Lindblad-Toh
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden; Broad Institute of MIT and Harvard, Cambridge, MA, United States
- Jennifer R S Meadows
- Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden
3
Zhou P, Liu Q, Wu T, Miao Q, Shang S, Wang H, Chen Z, Wang S, Wang H. Systematic Comparison and Comprehensive Evaluation of 80 Amino Acid Descriptors in Peptide QSAR Modeling. J Chem Inf Model 2021; 61:1718-1731. [DOI: 10.1021/acs.jcim.0c01370] [Citation(s) in RCA: 44] [Impact Index Per Article: 14.7] [Indexed: 02/07/2023]
Affiliation(s)
- Peng Zhou
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Qian Liu
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Ting Wu
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Qingqing Miao
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Shuyong Shang
- College of Chemistry and Life Science, Chengdu Normal University, Chengdu 611130, China
- Heyi Wang
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Zheng Chen
- Center for Informational Biology, University of Electronic Science and Technology of China (UESTC) at Qingshuihe Campus, Chengdu 611731, China
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Shaozhou Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
- Heyan Wang
- School of Life Science and Technology, University of Electronic Science and Technology of China (UESTC) at Shahe Campus, Chengdu 610054, China
4
Mahmoodi-Reihani M, Abbasitabar F, Zare-Shahabadi V. In Silico Rational Design and Virtual Screening of Bioactive Peptides Based on QSAR Modeling. ACS Omega 2020; 5:5951-5958. [PMID: 32226875 PMCID: PMC7097998 DOI: 10.1021/acsomega.9b04302] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/16/2019] [Accepted: 02/27/2020] [Indexed: 05/15/2023]
Abstract
Predicting the bioactivity of peptides is an important challenge in drug development and peptide research. In this study, numerical descriptive vectors (NDVs) for peptide sequences were calculated based on the physicochemical properties of amino acids (AAs) and principal component analysis (PCA). The resulting NDV had the same length as the peptide sequence, so that each entry of the NDV corresponded to one AA in the sequence. They were then applied to quantitative structure-activity relationship (QSAR) analysis of angiotensin-converting enzyme (ACE) inhibitor dipeptides, bitter-tasting dipeptides, and nonameric binding peptides of the human leukocyte antigens (HLA-A*0201). Multiple linear regression was used to construct the QSAR models. For each peptide set, a proper subset of physicochemical properties was chosen by the ant colony optimization algorithm. The leave-one-out cross-validation ($q^2_{\mathrm{LOO}}$) values were 0.855, 0.936, and 0.642 and the root-mean-square errors (RMSEs) were 0.450, 0.149, and 0.461. Our results revealed that the new numerical descriptive vector can afford extensive characterization of peptide sequence so that it can be easily employed in peptide QSAR studies. Moreover, the proposed numerical descriptive vectors were able to determine hot spot residues in the peptides under study.
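The leave-one-out statistic reported above can be sketched as follows. This is a minimal illustration for a single-descriptor least-squares fit, assuming the common definition q² = 1 − PRESS / SS_tot with SS_tot taken about the mean of all observed values. The function names and any data passed in are hypothetical; the study itself used multiple linear regression on NDV entries.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def q2_loo(xs, ys):
    """Leave-one-out cross-validated q^2 = 1 - PRESS / SS_tot."""
    n = len(xs)
    mean_y = sum(ys) / n
    press = 0.0
    for i in range(n):
        # Refit without sample i, then predict the held-out sample.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot
```

Because every prediction is made by a model that never saw the held-out sample, q² is typically lower than the fitted R² and can even be negative for a poor model.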
Affiliation(s)
- Fatemeh Abbasitabar
- Department of Chemistry, Marvdasht Branch, Islamic Azad University, Marvdasht, Iran
- Vahid Zare-Shahabadi
- Department of Chemistry, Mahshahr Branch, Islamic Azad University, Mahshahr, Iran
5
Zhang R, Wen LY, Wu WS, Yuan XZ, Zhang LJ. Quantitative Structure-Property Relationship for pH-Triggered Drug Release Performance of Acid-Responsive Four/Six-Arms Star Polymeric Micelles. Pharm Res 2018; 36:20. [PMID: 30511187 DOI: 10.1007/s11095-018-2549-4] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/12/2018] [Accepted: 11/25/2018] [Indexed: 11/29/2022]
Abstract
PURPOSE The pH-responsive copolymer micelles are widely used as carriers in drug delivery systems, but there are few micro-level mechanistic explorations of pH-triggered drug release. Here we elucidate the relationship between the drug release behavior of four/six-arms star copolymer micelles and the copolymer structures. METHOD The net cumulative drug release percentage (En) was taken as the dependent variable, and block unit autocorrelation descriptors as independent variables. The quantitative structure-property relationship models of drug release from block copolymers were developed at pH 7.4 and 5.0 for two periods (stage I: 0~12 h, stage II: 12~96 h). RESULTS The models show good fitting ability, internal predictive ability, stability and statistical significance. Drug diffusion is mainly influenced by the intra-block force, and micellar erosion by the inter-block force. At pH 5.0, the lowest unoccupied molecular orbital energy of the copolymer unit is the main factor influencing En. Stage I of drug release is affected by hydrophobic properties and stage II by the regional polarity of copolymer molecules. CONCLUSION The models perform well, and the factors affecting drug release behavior at different pH conditions can offer guidance for designing copolymer structures to control the drug release behavior of micelles in a targeted and quantitative way.
Affiliation(s)
- Ran Zhang
- School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou, 510000, People's Republic of China
- Li-Yang Wen
- School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou, 510000, People's Republic of China
- Wen-Sheng Wu
- School of Chemistry and Chemical Engineering, Zhaoqing University, Zhaoqing, 526000, People's Republic of China
- Xiao-Zhe Yuan
- School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou, 510000, People's Republic of China
- Li-Juan Zhang
- School of Chemistry and Chemical Engineering, South China University of Technology, Guangzhou, 510000, People's Republic of China
6
Jandrlić DR. SVM and SVR-based MHC-binding prediction using a mathematical presentation of peptide sequences. Comput Biol Chem 2016; 65:117-127. [PMID: 27816828 DOI: 10.1016/j.compbiolchem.2016.10.011] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/23/2016] [Revised: 09/16/2016] [Accepted: 10/24/2016] [Indexed: 11/16/2022]
Abstract
At present, there are a number of methods for the prediction of T-cell epitopes and major histocompatibility complex (MHC)-binding peptides. Despite numerous methods for predicting T-cell epitopes, there still exist limitations that affect the reliability of prevailing methods. For this reason, the development of models with high accuracy is crucial. An accurate prediction of the peptides that bind to specific major histocompatibility complex class I and II (MHC-I and MHC-II) molecules is important for an understanding of the functioning of the immune system and the development of peptide-based vaccines. Peptide binding is the most selective step in identifying T-cell epitopes. In this paper, we present a new approach to predicting MHC-binding ligands that takes into account new weighting schemes for position-based amino acid frequencies, BLOSUM and VOGG substitution of amino acids, and the physicochemical and molecular properties of amino acids. We have made models for quantitatively and qualitatively predicting MHC-binding ligands. Our models are based on two machine learning methods, support vector machine (SVM) and support vector regression (SVR), and use several different encoding and weighting schemes for peptides in feature selection. The resulting models showed comparable, and in some cases better, performance than the best existing predictors. The obtained results indicate that the physicochemical and molecular properties of amino acids (AA) contribute significantly to the peptide-binding affinity.
Affiliation(s)
- Davorka R Jandrlić
- University of Belgrade, Faculty of Mechanical Engineering, Kraljice Marije 16, Belgrade, Serbia
7
Aberrant Glycosylation of Anchor-Optimized MUC1 Peptides Can Enhance Antigen Binding Affinity and Reverse Tolerance to Cytotoxic T Lymphocytes. Biomolecules 2016; 6:biom6030031. [PMID: 27367740 PMCID: PMC5039417 DOI: 10.3390/biom6030031] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 04/30/2016] [Revised: 06/07/2016] [Accepted: 06/07/2016] [Indexed: 12/22/2022] Open
Abstract
Cancer vaccines have often failed to live up to their promise, although recent results with checkpoint inhibitors are reviving hopes that they will soon fulfill their promise. Although mutation-specific vaccines are under development, there is still high interest in an off-the-shelf vaccine to a ubiquitous antigen, such as MUC1, which is aberrantly expressed on most solid and many hematological tumors, including more than 90% of breast carcinomas. Clinical trials for MUC1 have shown variable success, likely because of immunological tolerance to a self-antigen and to poor immunogenicity of tandem repeat peptides. We hypothesized that MUC1 peptides could be optimized, relying on heteroclitic optimizations of potential anchor amino acids with and without tumor-specific glycosylation of the peptides. We have identified novel MUC1 class I peptides that bind to HLA-A*0201 molecules with significantly higher affinity and function than the native MUC1 peptides. These peptides elicited CTLs from normal donors, as well as breast cancer patients, which were highly effective in killing MUC1-expressing MCF-7 breast cancer cells. Each peptide elicited lytic responses in greater than 6/8 of normal individuals and 3/3 breast cancer patients. The CTLs generated against the glycosylated-anchor modified peptides cross reacted with the native MUC1 peptide, STAPPVHNV, suggesting these analog peptides may offer substantial improvement in the design of epitope-based vaccines.
8
Li B, Zheng X, Hu C, Cao Y. Human Papillomavirus Genome-Wide Identification of T-Cell Epitopes for Peptide Vaccine Development Against Cervical Cancer: An Integration of Computational Analysis and Experimental Assay. J Comput Biol 2015; 22:962-74. [PMID: 26418056 DOI: 10.1089/cmb.2014.0287] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/18/2022] Open
Affiliation(s)
- Bo Li
- Department of Obstetrics and Gynecology, Anhui Medical University, Hefei, China
- Xianfang Zheng
- Department of Obstetrics and Gynecology, Chaohu Hospital of Anhui Medical University, Chaohu, China
- Chuancui Hu
- Department of Obstetrics and Gynecology, Chaohu Hospital of Anhui Medical University, Chaohu, China
- Yunxia Cao
- Department of Obstetrics and Gynecology, Anhui Medical University, Hefei, China
9
Genome-wide Screening of Human Papillomavirus-Specific CTL Epitopes Presented by HLA-A Alleles in Cervical Cancer. Int J Pept Res Ther 2015. [DOI: 10.1007/s10989-015-9480-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2022]
10
Abstract
T-cell epitopes form the basis of many vaccines, diagnostics, and reagents. Current methods for the in silico identification of T-cell epitopes rely, in the main, on the accurate quantitative prediction of peptide-Major Histocompatibility Complex (pMHC) affinity using data-driven computational approaches. Here, we describe a dataset of experimentally determined pMHC binding affinities for the problematic human class I allele HLA-B*2705. Using an in-house, FACS-based, MHC stabilization assay, we measured binding of 223 peptides. This dataset includes both nonbinding and binding peptides, with measured affinities (expressed as −log10 of the half-maximal binding level) ranging from 1.2 to 7.4. This dataset should provide a useful independent benchmark for new and existing methods for predicting peptide binding to HLA-B*2705.
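As a small illustration of the scale used above, the sketch below converts a half-maximal concentration into its −log10 value. It assumes the concentration is expressed in molar units, which the abstract does not state, and the function name is hypothetical.

```python
import math


def neg_log10_affinity(half_max_molar):
    """Return -log10 of a half-maximal binding concentration.

    Assumes the input is in mol/L; larger outputs mean tighter binding.
    """
    if half_max_molar <= 0:
        raise ValueError("concentration must be positive")
    return -math.log10(half_max_molar)
```

Under that molar assumption, a 1 µM half-maximal level maps to 6, so each unit on the scale corresponds to a tenfold change in concentration.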
11
Identification of epitopes in Leptospira borgpetersenii leucine-rich repeat proteins. Infection, Genetics and Evolution 2013. [DOI: 10.1016/j.meegid.2012.10.014] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 11/20/2022]
12
Peptide binding specificities of HLA-B*5701 and B*5801. Science China Life Sciences 2012; 55:818-25. [DOI: 10.1007/s11427-012-4374-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/29/2011] [Accepted: 07/11/2012] [Indexed: 01/19/2023]
13
Hosseinzadeh F, Ebrahimi M, Goliaei B, Shamabadi N. Classification of lung cancer tumors based on structural and physicochemical properties of proteins by bioinformatics models. PLoS One 2012; 7:e40017. [PMID: 22829872 PMCID: PMC3400626 DOI: 10.1371/journal.pone.0040017] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Received: 03/27/2012] [Accepted: 05/30/2012] [Indexed: 12/03/2022] Open
Abstract
Rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important in the diagnosis of this disease. Furthermore, sequence-derived structural and physicochemical descriptors are very useful for machine learning prediction of protein structural and functional classes, for classifying proteins, and for prediction performance. In this study, the classification of lung tumors based on 1497 attributes derived from structural and physicochemical properties of protein sequences (based on genes defined by microarray analysis) was investigated through a combination of attribute weighting and supervised and unsupervised clustering algorithms. Eighty percent of the weighting methods selected features such as autocorrelation, dipeptide composition and distribution of hydrophobicity as the most important protein attributes in the classification of SCLC, NSCLC and COMMON classes of lung tumors. The same results were observed with most tree induction algorithms, while descriptors of hydrophobicity distribution were high in protein sequences COMMON to both groups and the distribution of charge in these proteins was very low, showing that COMMON proteins were very hydrophobic. Furthermore, compositions of polar dipeptides in SCLC proteins were higher than in NSCLC proteins. Some clustering models (alone or in combination with attribute weighting algorithms) were able to nearly classify SCLC and NSCLC proteins. The Random Forest tree induction algorithm (evaluated by leave-one-out and 10-fold cross-validation) shows more than 86% accuracy in clustering and predicting three different lung cancer tumors. Here, for the first time, the application of data mining tools to effectively classify three classes of lung cancer tumors, regarding the importance of dipeptide composition, autocorrelation and distribution descriptors, is reported.
Affiliation(s)
- Faezeh Hosseinzadeh
- Student at Laboratory of Biophysics and Molecular Biology, Institute of Biophysics and Biochemistry, University of Tehran, Tehran, Iran
- Mansour Ebrahimi
- Department of Biology at Basic science School & Bioinformatics Research Group, Green Research Center, University of Qom, Qom, Iran
- Bahram Goliaei
- Department of Medical Physics, Iran University of Medical Science, Tehran, Iran
- Narges Shamabadi
- Bioinformatics Research Group, Green Research Center, University of Qom, Qom, Iran
14
He J, Yang G, Rao H, Li Z, Ding X, Chen Y. Prediction of human major histocompatibility complex class II binding peptides by continuous kernel discrimination method. Artif Intell Med 2011; 55:107-15. [PMID: 22134095 DOI: 10.1016/j.artmed.2011.10.005] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 08/22/2011] [Revised: 10/12/2011] [Accepted: 10/21/2011] [Indexed: 11/25/2022]
Abstract
OBJECTIVE Accurate prediction of major histocompatibility complex (MHC) class II binding peptides helps reduce the experimental cost of identifying helper T cell epitopes, which has been a challenging problem partly because of the variable length of the binding peptides. This work aims to develop an accurate model for predicting MHC-binding peptides using machine learning methods. METHODS In this work, a machine learning method, continuous kernel discrimination (CKD), was used for predicting MHC class II binders of variable lengths. The composition, transition and distribution features were used for encoding peptide sequences and the Metropolis Monte Carlo simulated annealing approach was used for feature selection. RESULTS Feature selection was found to significantly improve the performance of the model. For benchmark dataset Dataset-1, the number of features is reduced from 147 to 24 and the area under the receiver operating characteristic curve (AUC) is improved from 0.8088 to 0.9034, while for benchmark dataset Dataset-2, the number of features is reduced from 147 to 44 and the AUC is improved from 0.7349 to 0.8499. An optimal CKD model was derived from the feature selection and bandwidth optimization using 10-fold cross-validation. Its AUC values are between 0.831 and 0.980 evaluated on benchmark datasets BM-Set1 and are between 0.806 and 0.949 on benchmark datasets BM-Set2 for MHC class II alleles. These results indicate a significantly better performance for our CKD model over other earlier models based on the training and testing of the same datasets. CONCLUSIONS Our study suggested that the CKD method outperforms other machine learning methods proposed earlier in the prediction of MHC class II binding peptides. Moreover, the choice of the cut-off for the CKD classifier is crucial for its performance.
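The AUC figures above summarize how well a score ranks binders above non-binders. A minimal sketch of that quantity, computed through its rank-based (Mann-Whitney) interpretation, is given below. The function name and any labels or scores passed in are hypothetical and unrelated to the benchmark datasets.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise ranking.

    Equals the probability that a randomly chosen positive (label 1)
    scores higher than a randomly chosen negative (label 0); ties
    count as one half.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, and the measure does not depend on any single classification cut-off.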
Affiliation(s)
- Ju He
- College of Chemistry, Sichuan University, Chengdu 610064, People's Republic of China
15
Tung CW, Ziehm M, Kämper A, Kohlbacher O, Ho SY. POPISK: T-cell reactivity prediction using support vector machines and string kernels. BMC Bioinformatics 2011; 12:446. [PMID: 22085524 PMCID: PMC3228774 DOI: 10.1186/1471-2105-12-446] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Received: 07/26/2011] [Accepted: 11/15/2011] [Indexed: 02/03/2023] Open
Abstract
Background Accurate prediction of peptide immunogenicity and characterization of relation between peptide sequences and peptide immunogenicity will be greatly helpful for vaccine designs and understanding of the immune system. In contrast to the prediction of antigen processing and presentation pathway, the prediction of subsequent T-cell reactivity is a much harder topic. Previous studies of identifying T-cell receptor (TCR) recognition positions were based on small-scale analyses using only a few peptides and concluded different recognition positions such as positions 4, 6 and 8 of peptides with length 9. Large-scale analyses are necessary to better characterize the effect of peptide sequence variations on T-cell reactivity and design predictors of a peptide's T-cell reactivity (and thus immunogenicity). The identification and characterization of important positions influencing T-cell reactivity will provide insights into the underlying mechanism of immunogenicity. Results This work establishes a large dataset by collecting immunogenicity data from three major immunology databases. In order to consider the effect of MHC restriction, peptides are classified by their associated MHC alleles. Subsequently, a computational method (named POPISK) using support vector machine with a weighted degree string kernel is proposed to predict T-cell reactivity and identify important recognition positions. POPISK yields a mean 10-fold cross-validation accuracy of 68% in predicting T-cell reactivity of HLA-A2-binding peptides. POPISK is capable of predicting immunogenicity with scores that can also correctly predict the change in T-cell reactivity related to point mutations in epitopes reported in previous studies using crystal structures. Thorough analyses of the prediction results identify the important positions 4, 6, 8 and 9, and yield insights into the molecular basis for TCR recognition. 
Finally, we relate this finding to physicochemical properties and structural features of the MHC-peptide-TCR interaction. Conclusions A computational method POPISK is proposed to predict immunogenicity with scores which are useful for predicting immunogenicity changes made by single-residue modifications. The web server of POPISK is freely available at http://iclab.life.nctu.edu.tw/POPISK.
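The weighted degree string kernel mentioned in the Results can be sketched as follows. This assumes the standard formulation, in which matching substrings of length d at the same position contribute with weight β_d = 2(D − d + 1) / (D(D + 1)). The function name and the default maximum degree are illustrative and are not taken from POPISK.

```python
def weighted_degree_kernel(x, y, max_degree=3):
    """Weighted degree string kernel for two equal-length sequences.

    Counts substrings of length 1..max_degree that match at the same
    position in both sequences, weighting shorter matches more heavily.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    if max_degree < 1:
        raise ValueError("max_degree must be at least 1")
    length = len(x)
    total = 0.0
    for d in range(1, min(max_degree, length) + 1):
        beta = 2.0 * (max_degree - d + 1) / (max_degree * (max_degree + 1))
        # Position-specific matches of length d.
        matches = sum(1 for i in range(length - d + 1)
                      if x[i:i + d] == y[i:i + d])
        total += beta * matches
    return total
```

Because the kernel only compares substrings at identical positions, it suits fixed-length peptides such as the 9-mers discussed above, where each position has a distinct structural role.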
Affiliation(s)
- Chun-Wei Tung
- School of Pharmacy, Kaohsiung Medical University, Kaohsiung 807, Taiwan
16
Bowman BN, McAdam PR, Vivona S, Zhang JX, Luong T, Belew RK, Sahota H, Guiney D, Valafar F, Fierer J, Woelk CH. Improving reverse vaccinology with a machine learning approach. Vaccine 2011; 29:8156-64. [PMID: 21864619 DOI: 10.1016/j.vaccine.2011.07.142] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Received: 04/09/2011] [Revised: 07/19/2011] [Accepted: 07/28/2011] [Indexed: 11/27/2022]
Abstract
Reverse vaccinology aims to accelerate subunit vaccine design by rapidly predicting which proteins in a pathogenic bacterial proteome are putative protective antigens. Support vector machine classification is a machine learning approach that has been applied to solve numerous classification problems in biological sciences but has not previously been incorporated into a reverse vaccinology approach. A training data set of 136 bacterial protective antigens paired with 136 non-antigens was constructed and bioinformatic tools were used to annotate this data for predicted protein features, many of which are associated with antigenicity (i.e. extracellular localization, signal peptides and B-cell epitopes). Annotation was used to train support vector machine classifiers that exhibited a maximum accuracy of 92% for discriminating protective antigens from non-antigens as assessed by a leave-tenth-out cross-validation approach. These accuracies were superior to those achieved when annotating training data with auto and cross covariance transformations of z-descriptors for hydrophobicity, molecular size and polarity, or when classification was performed using regression methods. To further validate support vector machine classifiers, they were used to rank all the proteins in six bacterial proteomes for their antigenicity. Protective antigens from the training data were significantly recalled (enriched) in the top 75 ranked proteins for all six proteomes as assessed by a Fisher's exact test (p<0.05). This paper describes a superior workflow for performing reverse vaccinology studies and provides a benchmark training data set that can be used to evaluate future methodological improvements.
Affiliation(s)
- Brett N Bowman
- Bioinformatics and Medical Informatics, San Diego State University, San Diego, CA 92182, USA
|
17
|
Knapp B, Giczi V, Ribarics R, Schreiner W. PeptX: using genetic algorithms to optimize peptides for MHC binding. BMC Bioinformatics 2011; 12:241. [PMID: 21679477 PMCID: PMC3225262 DOI: 10.1186/1471-2105-12-241] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2011] [Accepted: 06/17/2011] [Indexed: 11/18/2022] Open
Abstract
Background The binding between the major histocompatibility complex and the presented peptide is an indispensable prerequisite for the adaptive immune response. There is a plethora of different in silico techniques for the prediction of the peptide binding affinity to major histocompatibility complexes. Most studies screen a set of peptides for promising candidates to predict possible T cell epitopes. In this study we ask the reverse question: which peptides have the highest binding affinities to a given major histocompatibility complex according to certain in silico scoring functions? Results Since a full screening of all possible peptides is not feasible in reasonable runtime, we introduce a heuristic approach. We developed a framework for Genetic Algorithms to optimize peptides for the binding to major histocompatibility complexes. In an extensive benchmark we tested various operator combinations. We found that (1) selection operators have a strong influence on the convergence of the population while recombination operators have minor influence and (2) that five different binding prediction methods lead to five different sets of "optimal" peptides for the same major histocompatibility complex. The consensus peptides were experimentally verified as high affinity binders. Conclusion We provide a generalized framework to calculate sets of high affinity binders based on different previously published scoring functions in reasonable runtime. Furthermore, we give insight into the different behaviours of operators and scoring functions of the Genetic Algorithm.
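To make the genetic-algorithm idea in this abstract concrete, the following is a minimal sketch, not the PeptX implementation. The scoring function `toy_score`, the operator choices (tournament selection, one-point crossover, point mutation, elitism) and all parameter values are assumptions chosen for illustration; a real run would plug in a published binding predictor instead.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(peptide):
    # Hypothetical stand-in for a binding predictor (NOT a real scoring
    # function): rewards L at position 2 and V at the last position, plus
    # a small bonus per hydrophobic residue.
    score = 0.0
    if peptide[1] == "L":
        score += 2.0
    if peptide[-1] == "V":
        score += 2.0
    score += 0.1 * sum(aa in "AILMFVW" for aa in peptide)
    return score


def tournament(population, score, rng, k=3):
    # Selection operator: best of k randomly drawn individuals.
    return max(rng.sample(population, k), key=score)


def crossover(a, b, rng):
    # Recombination operator: one-point crossover of two parents.
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(peptide, rng, rate=0.1):
    # Mutation operator: each position is resampled with probability `rate`.
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa
        for aa in peptide
    )


def optimize(score=toy_score, length=9, pop_size=40, generations=30, seed=0):
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        for _ in range(pop_size)
    ]
    history = []
    for _ in range(generations):
        best = max(population, key=score)
        history.append(score(best))
        children = [best]  # elitism: the current best survives unchanged
        while len(children) < pop_size:
            child = crossover(
                tournament(population, score, rng),
                tournament(population, score, rng),
                rng,
            )
            children.append(mutate(child, rng))
        population = children
    best = max(population, key=score)
    history.append(score(best))
    return best, history


if __name__ == "__main__":
    peptide, history = optimize()
    print(peptide, history[0], history[-1])
```

Swapping `toy_score` for a different scoring function changes which peptides the search converges to, which is the effect the abstract reports for its five prediction methods.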
Affiliation(s)
- Bernhard Knapp
- Center for Medical Statistics, Informatics and Intelligent Systems, Department for Biosimulation and Bioinformatics, Medical University of Vienna, Austria.
|
18
|
Gupta SK, Srivastava M, Akhoon BA, Smita S, Schmitz U, Wolkenhauer O, Vera J, Gupta SK. Identification of immunogenic consensus T-cell epitopes in globally distributed influenza-A H1N1 neuraminidase. Infect Genet Evol 2010; 11:308-19. [PMID: 21094280 DOI: 10.1016/j.meegid.2010.10.013] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/30/2010] [Revised: 10/15/2010] [Accepted: 10/18/2010] [Indexed: 02/01/2023]
Abstract
Antigenic drift is the ability of the swine influenza virus to undergo continuous and progressive changes in response to the host immune system. These changes dictate influenza vaccine updates annually to ensure inclusion of antigens of the most current strains. The identification of those peptides that stimulate T-cell responses, termed T-cell epitopes, is essential for the development of successful vaccines. In this study, the highly conserved and specific epitopes from neuraminidase of globally distributed H1N1 strains were predicted so that these potential vaccine candidates may remain effective despite antigenic drift. A total of nine novel CD8(+) T-cell epitopes for MHC class-I and eight novel CD4(+) T-cell epitopes for MHC class-II alleles were proposed as novel epitope-based vaccine candidates. Additionally, the epitope FSYKYGNGV was identified as a highly conserved, immunogenic and potential vaccine candidate, capable of generating both CD8(+) and CD4(+) responses.
Affiliation(s)
- Shishir K Gupta
- Society for Biological Research & Rural Development, Lucknow, UP, India.
|
19
|
Xi L, Li S, Liu H, Li J, Lei B, Yao X. Global and local prediction of protein folding rates based on sequence autocorrelation information. J Theor Biol 2010; 264:1159-68. [DOI: 10.1016/j.jtbi.2010.03.042] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/04/2009] [Revised: 03/28/2010] [Accepted: 03/29/2010] [Indexed: 11/24/2022]
|
20
|
Joosten SA, van Meijgaarden KE, van Weeren PC, Kazi F, Geluk A, Savage NDL, Drijfhout JW, Flower DR, Hanekom WA, Klein MR, Ottenhoff THM. Mycobacterium tuberculosis peptides presented by HLA-E molecules are targets for human CD8 T-cells with cytotoxic as well as regulatory activity. PLoS Pathog 2010; 6:e1000782. [PMID: 20195504 PMCID: PMC2829052 DOI: 10.1371/journal.ppat.1000782] [Citation(s) in RCA: 132] [Impact Index Per Article: 9.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2009] [Accepted: 01/20/2010] [Indexed: 12/31/2022] Open
Abstract
Tuberculosis (TB) is an escalating global health problem and improved vaccines against TB are urgently needed. HLA-E restricted responses may be of interest for vaccine development since HLA-E displays very limited polymorphism (only 2 coding variants exist), and is not down-regulated by HIV-infection. The peptides from Mycobacterium tuberculosis (Mtb) potentially presented by HLA-E molecules, however, are unknown. Here we describe human T-cell responses to Mtb-derived peptides containing predicted HLA-E binding motifs and binding-affinity for HLA-E. We observed CD8(+) T-cell proliferation to the majority of the 69 peptides tested in Mtb responsive adults as well as in BCG-vaccinated infants. CD8(+) T-cells were cytotoxic against target-cells transfected with HLA-E only in the presence of specific peptide. These T cells were also able to lyse M. bovis BCG infected, but not control monocytes, suggesting recognition of antigens during mycobacterial infection. In addition, peptide induced CD8(+) T-cells also displayed regulatory activity, since they inhibited T-cell proliferation. This regulatory activity was cell contact-dependent, and at least partly dependent on membrane-bound TGF-beta. Our results significantly increase our understanding of the human immune response to Mtb by identification of CD8(+) T-cell responses to novel HLA-E binding peptides of Mtb, which have cytotoxic as well as immunoregulatory activity.
Affiliation(s)
- Simone A. Joosten
- Department of Infectious Diseases, Leiden University Medical Center, Leiden, The Netherlands
- Pascale C. van Weeren
- Department of Infectious Diseases, Leiden University Medical Center, Leiden, The Netherlands
- Fatima Kazi
- Department of Infectious Diseases, Leiden University Medical Center, Leiden, The Netherlands
- Annemieke Geluk
- Department of Infectious Diseases, Leiden University Medical Center, Leiden, The Netherlands
- Nigel D. L. Savage
- Department of Infectious Diseases, Leiden University Medical Center, Leiden, The Netherlands
- Jan W. Drijfhout
- Department of Immunohematology and Blood Transfusion, Leiden University Medical Center, Leiden, The Netherlands
- Darren R. Flower
- The Jenner Institute, University of Oxford, Oxford, United Kingdom
- Willem A. Hanekom
- South African Tuberculosis Vaccine Initiative, School of Child and Adolescent Health, University of Cape Town, Cape Town, South Africa
- Michèl R. Klein
- Department of Infectious Diseases, Leiden University Medical Center, Leiden, The Netherlands
- Tom H. M. Ottenhoff
- Department of Infectious Diseases, Leiden University Medical Center, Leiden, The Netherlands
|
21
|
Lei B, Li S, Xi L, Li J, Liu H, Yao X. Novel approaches for retention time prediction of oligonucleotides in ion-pair reversed-phase high-performance liquid chromatography. J Chromatogr A 2009; 1216:4434-9. [PMID: 19324364 DOI: 10.1016/j.chroma.2009.03.032] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/19/2008] [Revised: 03/09/2009] [Accepted: 03/13/2009] [Indexed: 10/21/2022]
Abstract
The base sequence autocorrelation (BSA) descriptors were used to describe structures of oligonucleotides and to develop accurate quantitative structure-retention relationship (QSRR) models of oligonucleotides in ion-pair reversed-phase high-performance liquid chromatography. Through the combined use of multiple linear regression (MLR) and genetic algorithm (GA), QSRR models were developed at temperatures of 30 °C, 40 °C, 50 °C, 60 °C and 80 °C, respectively. Satisfactory results were obtained for the single-temperature models (STM). A multi-temperature model (MTM) was also developed that can be used for predicting the retention time at any temperature. The correlation coefficients of retention time prediction for the test set based on the MTM model at 30 °C, 40 °C, 50 °C, 60 °C and 80 °C were 0.978, 0.982, 0.989, 0.988 and 0.996, respectively. The corresponding absolute average relative deviations (AARD) for the test set at each temperature were all less than 1%. The new strategy of feature representation and multi-temperature modeling is a very promising tool for QSRR modeling with good predictive ability for predicting retention time of oligonucleotides at multiple temperatures under the studied conditions.
Affiliation(s)
- Beilei Lei
- Department of Chemistry, Lanzhou University, Lanzhou 730000, China
|
22
|
Gaussian process: an alternative approach for QSAM modeling of peptides. Amino Acids 2009; 38:199-212. [DOI: 10.1007/s00726-008-0228-1] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2008] [Accepted: 12/18/2008] [Indexed: 10/21/2022]
|
23
|
Tian F, Yang L, Lv F, Yang Q, Zhou P. In silico quantitative prediction of peptides binding affinity to human MHC molecule: an intuitive quantitative structure-activity relationship approach. Amino Acids 2008; 36:535-54. [PMID: 18575802 DOI: 10.1007/s00726-008-0116-8] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2008] [Accepted: 06/02/2008] [Indexed: 10/21/2022]
Abstract
In this paper, we have handpicked 23 kinds of electronic properties, 37 kinds of steric properties, 54 kinds of hydrophobic properties and 5 kinds of hydrogen bond properties from thousands of amino acid structural and property parameters. Principal component analysis (PCA) was applied to these parameters, yielding ten score vectors involving significant nonbonding properties of the 20 coded amino acids, called the divided physicochemical property scores (DPPS) of amino acids. The DPPS descriptor was then used to characterize the structures of 152 HLA-A*0201-restricted CTL epitopes, the significant variables responsible for the binding affinities were selected by a genetic algorithm, and a quantitative structure-activity relationship (QSAR) model was established by partial least squares to predict the peptide-HLA-A*0201 molecule interactions. Statistical analysis of the resulting DPPS-based QSAR models was consistent with experimental results and molecular graphics display. The diverse properties of the different residues in binding peptides may contribute markedly to the interactions between the HLA-A*0201 molecule and its peptide ligands. In particular, hydrophobicity and hydrogen bonding of the anchor residues of peptides may contribute significantly to the interactions. The results showed that DPPS can represent the structural characteristics of the antigenic peptides well and is a promising approach to predict the affinities of peptide binding to HLA-A*0201 in an efficient and intuitive way. We expect that this physical-principle based method can be applied to other protein-peptide interactions as well.
Affiliation(s)
- F Tian
- Research Institute of Surgery, Daping Hospital, Third Military Medical University, Chongqing, China
|
24
|
Ivanciuc O, Braun W. Robust quantitative modeling of peptide binding affinities for MHC molecules using physical-chemical descriptors. Protein Pept Lett 2008; 14:903-16. [PMID: 18045233 DOI: 10.2174/092986607782110257] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Abstract
Major histocompatibility complex (MHC) molecules bind short peptides resulting from intracellular processing of foreign and self proteins, and present them on the cell surface for recognition by T-cell receptors. We propose a new robust approach to quantitatively model the binding affinities of MHC molecules by quantitative structure-activity relationships (QSAR) that use the physical-chemical amino acid descriptors E1-E5. These QSAR models are robust, sequence-based, and can be used as a fast and reliable filter to predict the MHC binding affinity for large protein databases.
Affiliation(s)
- Ovidiu Ivanciuc
- Sealy Center for Structural Biology and Molecular Biophysics, Department of Biochemistry and Molecular Biology, University of Texas Medical Branch, 301 University Boulevard, Galveston, Texas 77555-0857, USA
|
25
|
A comprehensive analysis of the thermodynamic events involved in ligand–receptor binding using CoRIA and its variants. J Comput Aided Mol Des 2008; 22:91-104. [DOI: 10.1007/s10822-008-9172-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2007] [Accepted: 01/05/2008] [Indexed: 10/22/2022]
|
26
|
Ray S, Kepler TB. Amino acid biophysical properties in the statistical prediction of peptide-MHC class I binding. Immunome Res 2007; 3:9. [PMID: 17967170 PMCID: PMC2186325 DOI: 10.1186/1745-7580-3-9] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2007] [Accepted: 10/29/2007] [Indexed: 11/10/2022] Open
Abstract
Background A key step in the development of an adaptive immune response to pathogens or vaccines is the binding of short peptides to molecules of the Major Histocompatibility Complex (MHC) for presentation to T lymphocytes, which are thereby activated and differentiate into effector and memory cells. The rational design of vaccines consists in part in the identification of appropriate peptides to effect this process. There are several algorithms currently in use for making such predictions, but these are limited to a small number of MHC molecules and have good but imperfect prediction power. Results We have undertaken an exploration of the power gained by taking advantage of a natural representation of the amino acids in terms of their biophysical properties. We used several well-known statistical classifiers using either a naive encoding of amino acids by name or an encoding by biophysical properties. In all cases, the encoding by biophysical properties leads to substantially lower misclassification error. Conclusion Representing amino acids by a few important bio-physio-chemical properties provides a natural basis for representing peptides and greatly improves peptide-MHC class I binding prediction.
Affiliation(s)
- Surajit Ray
- Department of Mathematics and Statistics, Boston University, Boston, MA, USA.
|
27
|
Tian F, Zhou P, Lv F, Song R, Li Z. Three-dimensional holograph vector of atomic interaction field (3D-HoVAIF): a novel rotation-translation invariant 3D structure descriptor and its applications to peptides. J Pept Sci 2007; 13:549-66. [PMID: 17654623 DOI: 10.1002/psc.892] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Quantitative structure-activity relationship (QSAR) study, important in drug design, mainly involves two aspects, molecular structural characterization (MSC) and construction of a statistical model. MSC focuses on transforming molecular structural and property characteristics into a group of numerical codes, dedicated to minimizing information loss during this process. In this context, common atoms in organic compounds are classified according to their families in the periodic table, and hybridization states, and on the basis of these, three nonbonding interactions (i.e. electrostatic, van der Waals and hydrophobic) are calculated, ultimately resulting in a new rotation-translation invariant, 3D-MSC, as a three-dimensional holograph vector of atomic interaction field (3D-HoVAIF). By applying 3D-HoVAIF to QSAR studies on two classical peptides including 58 angiotensin-converting enzyme (ACE) inhibitors and 48 bitter-tasting dipeptides, we get two excellent genetic algorithm-partial least squares (GA-PLS) models, with statistics r(2), q(2), root mean square error (RMSEE), and root mean square error of cross-validation (RMSCV) of 0.857, 0.811, 0.376, and 0.432 for ACE inhibitors and 0.940, 0.892, 0.153 and 0.205 for bitter-tasting dipeptides, respectively. By equally dividing the two datasets into training and test sets by D-optimal, the 3D-HoVAIF approach undergoes rigorous statistical validation. Furthermore, the superior performance of 3D-HoVAIF is confirmed in comparison with two other peptide MSC approaches referring to z-scale and ISA-ECI. 
For 58 ACE inhibitors, the GA-PLS model yields two principal components, with the following statistics: r(2) = 0.893, q(2) = 0.824, RMSEE = 0.349, RMSCV = 0.425, q2(ext) = 0.739, r2(ext)= 0.784, r2(0.ext) = 0.781, rf2(0.ext) = 0.77, k = 0.962, k' = 1.019, and RMSEP = 0.460; for 48 bitter-tasting dipeptides, three principal components resulted, with the statistics as: r(2) = 0.950, q(2) = 0.893, RMSEE = 0.152, RMSCV = 0.222, q2(ext)= 0.875, r2(ext) = 0.919, r2(0.ext)= 0.919, rf2(0.ext)= 0.919, k = 1.018, k' = 0.974, and RMSEP = 0.198. In addition, the relationship of ACE-inhibiting activities with bitter-tasting thresholds has been investigated by applying the above-constructed models to predictions on 400 theoretically possible dipeptides. Through analysis, the ACE-inhibiting activities are found to be prominently related to bitter-tasting intensities. Thus, it is deemed to be difficult to find such dipeptides that simultaneously satisfy pharmacodynamic action (high ACE-inhibiting activities) and comfortable tastes, suggesting that active components of dipeptides that are served as functional food to lower blood pressure are not very ideal.
Affiliation(s)
- Feifei Tian
- College of Bioengineering, Chongqing University, Chongqing 40044, China
|
28
|
Artificial neural network models for prediction of intestinal permeability of oligopeptides. BMC Bioinformatics 2007; 8:245. [PMID: 17623108 PMCID: PMC1955455 DOI: 10.1186/1471-2105-8-245] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2007] [Accepted: 07/11/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Oral delivery is a highly desirable property for candidate drugs under development. Computational modeling could provide a quick and inexpensive way to assess the intestinal permeability of a molecule. Although there have been several studies aimed at predicting the intestinal absorption of chemical compounds, there have been no attempts to predict intestinal permeability on the basis of peptide sequence information. To develop models for predicting the intestinal permeability of peptides, we adopted an artificial neural network as a machine-learning algorithm. The positive control data consisted of intestinal barrier-permeable peptides obtained by the peroral phage display technique, and the negative control data were prepared from random sequences. RESULTS The capacity of our models to make appropriate predictions was validated by statistical indicators including sensitivity, specificity, enrichment curve, and the area under the receiver operating characteristic (ROC) curve (the ROC score). The training and test set statistics indicated that our models were of strikingly good quality and could discriminate between permeable and random sequences with a high level of confidence. CONCLUSION We developed artificial neural network models to predict the intestinal permeabilities of oligopeptides on the basis of peptide sequence information. Both binary and VHSE (principal components score Vectors of Hydrophobic, Steric and Electronic properties) descriptors produced statistically significant training models; the models with simple neural network architectures showed slightly greater predictive power than those with complex ones. We anticipate that our models will be applicable to the selection of intestinal barrier-permeable peptides for generating peptide drugs or peptidomimetics.
|
29
|
Fernández M, Caballero J. Analysis of protegrin structure–activity relationships: the structural characteristics important for antimicrobial activity using smoothed amino acid sequence descriptors. Mol Simul 2007. [DOI: 10.1080/08927020701236771] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 10/22/2022]
|
30
|
Lu Y, Bulka B, desJardins M, Freeland SJ. Amino acid quantitative structure property relationship database: a web-based platform for quantitative investigations of amino acids. Protein Eng Des Sel 2007; 20:347-51. [PMID: 17557765 DOI: 10.1093/protein/gzm027] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Here, we present the AA-QSPR Db (Amino Acid Quantitative Structure Property Relationship Database): a novel, freely available web-resource of data pertaining to amino acids, both engineered and naturally occurring. In addition to presenting fundamental molecular descriptors of size, charge and hydrophobicity, it also includes online visualization tools for users to perform instant, interactive analyses of amino acid sub-sets in which they are interested. The database has been designed with extensible markup language technology to provide a flexible structure, suitable for future development. In addition to providing easy access for queries by external computers, it also offers a user-friendly web-based interface that facilitates human interactions (submission, storage and retrieval of amino acid data) and an associated e-forum that encourages users to question and discuss current and future database contents.
Affiliation(s)
- Yi Lu
- Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD 21250, USA
|
31
|
Holm L, Frech K, Dzhambazov B, Holmdahl R, Kihlberg J, Linusson A. Quantitative Structure−Activity Relationship of Peptides Binding to the Class II Major Histocompatibility Complex Molecule Aq Associated with Autoimmune Arthritis. J Med Chem 2007; 50:2049-59. [PMID: 17425295 DOI: 10.1021/jm061209b] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]
Abstract
Presentation of (glyco)peptides by the class II major histocompatibility complex molecule Aq to T cells plays a central role in collagen-induced arthritis, an animal model for the autoimmune disease rheumatoid arthritis. A peptide library was designed using statistical molecular design in amino acid space in which five positions in the minimal mouse collagen type II binding epitope CII260-267 were varied. A substantially reduced peptide library of 24 peptides with diverse and representative molecular characteristics was selected, synthesized, and evaluated for the binding strength to Aq. A multivariate QSAR model was established by correlating calculated descriptors, compressed to their principal properties, with the binding data using partial least-squares regression. The model was successfully validated by an external test set. Interpretation of the model provided a molecular property binding motif for peptides interacting with Aq. The information may be useful in future research directed toward new treatments of rheumatoid arthritis.
Affiliation(s)
- Lotta Holm
- Department of Chemistry, Umeå University, SE-901 87 Umeå, Sweden
|
32
|
Pissurlenkar R, Malde A, Khedkar S, Coutinho E. Encoding Type and Position in Peptide QSAR: Application to Peptides Binding to Class I MHC Molecule HLA-A*0201. QSAR Comb Sci 2007. [DOI: 10.1002/qsar.200530184] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
|
33
|
VaxiJen: a server for prediction of protective antigens, tumour antigens and subunit vaccines. BMC Bioinformatics 2007; 8:4. [PMID: 17207271 PMCID: PMC1780059 DOI: 10.1186/1471-2105-8-4] [Citation(s) in RCA: 1572] [Impact Index Per Article: 92.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/24/2006] [Accepted: 01/05/2007] [Indexed: 11/25/2022] Open
Abstract
Background Vaccine development in the post-genomic era often begins with the in silico screening of genome information, with the most probable protective antigens being predicted rather than requiring causative microorganisms to be grown. Despite the obvious advantages of this approach – such as speed and cost efficiency – its success remains dependent on the accuracy of antigen prediction. Most approaches use sequence alignment to identify antigens. This is problematic for several reasons. Some proteins lack obvious sequence similarity, although they may share similar structures and biological properties. The antigenicity of a sequence may be encoded in a subtle and recondite manner not amenable to direct identification by sequence alignment. The discovery of truly novel antigens will be frustrated by their lack of similarity to antigens of known provenance. To overcome the limitations of alignment-dependent methods, we propose a new alignment-free approach for antigen prediction, which is based on auto cross covariance (ACC) transformation of protein sequences into uniform vectors of principal amino acid properties. Results Bacterial, viral and tumour protein datasets were used to derive models for prediction of whole protein antigenicity. Every set consisted of 100 known antigens and 100 non-antigens. The derived models were tested by internal leave-one-out cross-validation and external validation using test sets. An additional five training sets for each class of antigens were used to test the stability of the discrimination between antigens and non-antigens. The models performed well in both validations showing prediction accuracy of 70% to 89%. The models were implemented in a server, which we call VaxiJen. Conclusion VaxiJen is the first server for alignment-independent prediction of protective antigens. It was developed to allow antigen classification solely based on the physicochemical properties of proteins without recourse to sequence alignment.
The server can be used on its own or in combination with alignment-based prediction methods. It is freely-available online at the URL: .
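As a rough illustration of how an auto cross covariance (ACC) transformation turns sequences of different lengths into vectors of one fixed length, here is a minimal sketch. It is not the VaxiJen code: the descriptor table `TOY_DESCRIPTORS` holds made-up numbers rather than published amino acid property scores, and the normalisation by (n - lag) and the choice of `max_lag` are assumptions.

```python
from itertools import product

# Toy two-property descriptors per amino acid. The numbers are illustrative
# placeholders, NOT published z-scale values.
TOY_DESCRIPTORS = {
    "A": (0.1, -1.0),
    "G": (-0.5, 0.3),
    "K": (1.2, 0.8),
    "L": (-1.1, -0.4),
    "S": (0.4, 0.6),
}


def acc_transform(sequence, descriptors=TOY_DESCRIPTORS, max_lag=2):
    """Return a fixed-length ACC vector for a variable-length sequence.

    For each lag and each ordered pair of properties (j, k), the feature is
    the mean over positions i of property j at i times property k at i + lag.
    Pairs with j == k are auto covariances, the rest are cross covariances.
    """
    n = len(sequence)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    values = [descriptors[aa] for aa in sequence]
    n_props = len(values[0])
    features = []
    for lag in range(1, max_lag + 1):
        for j, k in product(range(n_props), repeat=2):
            total = sum(
                values[i][j] * values[i + lag][k] for i in range(n - lag)
            )
            features.append(total / (n - lag))
    return features


if __name__ == "__main__":
    short = acc_transform("AGKL")
    longer = acc_transform("AGKLSSKLAG")
    # Both vectors have max_lag * n_props**2 entries regardless of length.
    print(len(short), len(longer))
```

Because the output length depends only on the number of properties and the maximum lag, proteins of any length can be fed to the same classifier without alignment.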
|
34
|
Hattotuwagama CK, Flower DR. Empirical prediction of peptide octanol-water partition coefficients. Bioinformation 2006; 1:257-9. [PMID: 17597903 PMCID: PMC1891700 DOI: 10.6026/97320630001257] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/11/2006] [Accepted: 11/22/2006] [Indexed: 11/23/2022] Open
Abstract
Peptides are of great therapeutic potential as vaccines and drugs. Knowledge of physicochemical descriptors, including the partition coefficient P (commonly expressed in logarithm form: logP), is useful for screening out unsuitable molecules and also for the development of predictive Quantitative Structure-Activity Relationships (QSARs). In this paper we develop a new approach to the prediction of LogP values for peptides based on an empirical relationship between global molecular properties and measured physical properties. Our method was successful in terms of peptide prediction (total r(2) = 0.641). The final model consisted of 5 physicochemical descriptors (molecular weight, number of single bonds, 2D-VDW volume, 2D-VSA hydrophobic and 2D-VSA polar). The approach is peptide specific and its predictive accuracy was high. Overall, 67% of the peptides were able to be predicted within +/-0.5 log units from the experimental values. Our method thus represents a novel prediction method with proven predictive ability.
Affiliation(s)
- Darren R Flower
- Phone: +44 1635 577954; Fax: +44 1635 577908; Corresponding author
|
35
|
Thompson SJ, Hattotuwagama CK, Holliday JD, Flower DR. On the hydrophobicity of peptides: Comparing empirical predictions of peptide log P values. Bioinformation 2006; 1:237-41. [PMID: 17597897 PMCID: PMC1891704 DOI: 10.6026/97320630001237] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/14/2006] [Accepted: 11/02/2006] [Indexed: 11/23/2022] Open
Abstract
Peptides are of great therapeutic potential as vaccines and drugs. Knowledge of physicochemical descriptors, including the partition coefficient logP, is useful for the development of predictive Quantitative Structure-Activity Relationships (QSARs). We have investigated the accuracy of available programs for the prediction of logP values for peptides with known experimental values obtained from the literature. Eight prediction programs were tested, of which seven programs were fragment-based methods: XLogP, LogKow, PLogP, ACDLogP, ALogP, Interactive Analysis's LogP and MLogP; and one program used a whole molecule approach: QikProp. The predictive accuracy of the programs was assessed using r(2) values, with ALogP being the most effective (r(2) = 0.822) and MLogP the least (r(2) = 0.090). We also examined three distinct types of peptide structure: blocked, unblocked, and cyclic. For each study (all peptides, blocked, unblocked and cyclic peptides) the performance of programs rated from best to worst is as follows: all peptides - ALogP, QikProp, PLogP, XLogP, IALogP, LogKow, ACDLogP, and MLogP; blocked peptides - PLogP, XLogP, ACDLogP, IALogP, LogKow, QikProp, ALogP, and MLogP; unblocked peptides - QikProp, IALogP, ALogP, ACDLogP, MLogP, XLogP, LogKow and PLogP; cyclic peptides - LogKow, ALogP, XLogP, MLogP, QikProp, ACDLogP, IALogP. In summary, all programs gave better predictions for blocked peptides, while, in general, logP values for cyclic peptides were under-predicted and those of unblocked peptides were over-predicted.
Affiliation(s)
- Sarah J Thompson
- Edward Jenner Institute for Vaccine Research, High Street, Compton, Berkshire, RG20 7NN, UK
- Dept. of Information Studies, University of Sheffield
- Channa K Hattotuwagama
- Edward Jenner Institute for Vaccine Research, High Street, Compton, Berkshire, RG20 7NN, UK
- Darren R Flower (corresponding author; Phone: +44 1635 577954, Fax: +44 1635 577908)
- Edward Jenner Institute for Vaccine Research, High Street, Compton, Berkshire, RG20 7NN, UK
|
36
|
Salomon J, Flower DR. Predicting Class II MHC-Peptide binding: a kernel based approach using similarity scores. BMC Bioinformatics 2006; 7:501. [PMID: 17105666 PMCID: PMC1664591 DOI: 10.1186/1471-2105-7-501] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.9] [Received: 08/25/2006] [Accepted: 11/14/2006] [Indexed: 12/22/2022]
Abstract
Background: Modelling the interaction between potentially antigenic peptides and Major Histocompatibility Complex (MHC) molecules is a key step in identifying potential T-cell epitopes. For Class II MHC alleles, the binding groove is open at both ends, causing ambiguity in the positional alignment between the groove and peptide, as well as creating uncertainty as to which parts of the peptide interact with the MHC. Moreover, the antigenic peptides have variable lengths, making naive modelling methods difficult to apply. This paper introduces a kernel method that can handle variable-length peptides effectively by quantifying similarities between peptide sequences and integrating these into the kernel. Results: The kernel approach presented here shows increased prediction accuracy, with a significantly higher number of true positives and negatives on multiple MHC class II alleles, when tested on data sets from MHCPEP [1], MHCBN [2], and MHCBench [3]. Evaluation by cross-validation, when segregating binders and non-binders, produced an average of 0.824 AROC for the MHCBench data sets (up from 0.756), and an average of 0.96 AROC for multiple alleles of the MHCPEP database. Conclusion: The method improves performance over existing state-of-the-art methods of MHC class II peptide binding prediction by using a custom, knowledge-based representation of peptides. Similarity scores, in contrast to a fixed-length, pocket-specific representation of amino acids, provide a flexible and powerful way of modelling MHC binding, and can easily be applied to other dynamic sequence problems.
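One generic way to build a kernel from similarity scores over variable-length sequences is sketched below. This is not the paper's method: the abstract does not specify its scoring scheme or kernel construction. The sketch assumes a toy match/mismatch score over ungapped local alignments and an empirical kernel map, in which each peptide is represented by its similarities to a fixed reference set. All function names, parameters, and example sequences are hypothetical.

```python
def local_similarity(a, b, match=1.0, mismatch=-1.0):
    """Best score of any ungapped local alignment between a and b.

    Toy scoring (assumption): +match for identical residues, +mismatch otherwise.
    """
    best = 0.0
    # Each offset selects one diagonal of the alignment matrix.
    for offset in range(-(len(b) - 1), len(a)):
        running = 0.0
        for i in range(len(a)):
            j = i - offset
            if 0 <= j < len(b):
                step = match if a[i] == b[j] else mismatch
                running = max(0.0, running + step)  # restart when the score drops below 0
                best = max(best, running)
    return best


def similarity_features(peptide, references):
    """Fixed-length vector of similarities to the reference peptides."""
    return [local_similarity(peptide, ref) for ref in references]


def similarity_kernel(p, q, references):
    """Dot product of similarity vectors (empirical kernel map).

    A dot product of explicit feature vectors is always a valid
    positive semi-definite kernel, whatever the peptide lengths.
    """
    fp = similarity_features(p, references)
    fq = similarity_features(q, references)
    return sum(x * y for x, y in zip(fp, fq))
```

With the hypothetical reference set `["ACDEF", "KLMNP"]`, the peptides `"ACDEFGH"` and `"WACDE"` have different lengths but map to vectors of the same size, so `similarity_kernel` can be passed to any kernel-based classifier that accepts a precomputed kernel matrix.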
Affiliation(s)
- Jesper Salomon
- The Jenner Institute, University of Oxford, Compton, Newbury, Berkshire, RG20 7NN, UK
- Darren R Flower
- The Jenner Institute, University of Oxford, Compton, Newbury, Berkshire, RG20 7NN, UK
|