1. Nedyalkova M, Vasighi M, Azmoon A, Naneva L, Simeonov V. Sequence-Based Prediction of Plant Allergenic Proteins: Machine Learning Classification Approach. ACS Omega 2023; 8:3698-3704. [PMID: 36743013 PMCID: PMC9893444 DOI: 10.1021/acsomega.2c02842] [Citation(s) in RCA: 4] [Impact Index Per Article: 4.0] [Received: 05/10/2022] [Accepted: 11/21/2022] [Indexed: 06/18/2023]
Abstract
This Article proposes a novel chemometric approach to understanding and exploring the allergenic nature of food proteins. Using machine learning methods (supervised and unsupervised), this work aims to predict the allergenicity of plant proteins. The strategy is based on scoring descriptors and testing their classification performance. Partitioning was based on support vector machines (SVM), and a k-nearest neighbor (KNN) classifier was applied. A fivefold cross-validation approach was used to validate the KNN classifier in the variable selection step as well as the final classifier. To overcome the problem of food allergies, a robust and efficient method for protein classification is needed.
Affiliation(s)
- Miroslava Nedyalkova
  - Faculty of Chemistry and Pharmacy, Inorganic Chemistry, University of Sofia, 1172 Sofia, Bulgaria
  - Department of Chemistry, University of Fribourg, Chemin de Muse 9, CH-1700 Fribourg, Switzerland
- Mahdi Vasighi
  - Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan 45137, Iran
- Amirreza Azmoon
  - Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan 45137, Iran
- Vasil Simeonov
  - Department of Inorganic Chemistry, University of Sofia, 1172 Sofia, Bulgaria
2. Wang L, Niu D, Zhao X, Wang X, Hao M, Che H. A Comparative Analysis of Novel Deep Learning and Ensemble Learning Models to Predict the Allergenicity of Food Proteins. Foods 2021; 10:809. [PMID: 33918556 PMCID: PMC8069377 DOI: 10.3390/foods10040809] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/17/2021] [Revised: 04/02/2021] [Accepted: 04/06/2021] [Indexed: 11/16/2022]
Abstract
Traditional food allergen identification relies mainly on in vivo and in vitro experiments, which are often time-consuming and costly. Artificial intelligence (AI)-driven rapid food allergen identification addresses some of these drawbacks and is becoming an efficient auxiliary tool. To overcome the lower accuracy of traditional machine learning models in predicting the allergenicity of food proteins, this work introduced a deep learning model, a transformer with a self-attention mechanism, and ensemble learning models (represented by Light Gradient Boosting Machine (LightGBM) and eXtreme Gradient Boosting (XGBoost)). To highlight the performance of the proposed method, the study also selected various commonly used machine learning models as baseline classifiers. The results of 5-fold cross-validation showed that the area under the receiver operating characteristic curve (AUC) of the deep model was the highest (0.9578), better than that of the ensemble learning and baseline algorithms. However, the deep model needs to be pre-trained, and its training time is the longest. A comparison of the characteristics of the transformer model and the boosting models indicates that each model has its own advantages, which provides novel clues and inspiration for the rapid prediction of food allergens in the future.
Affiliation(s)
- Liyang Wang
  - Key Laboratory of Precision Nutrition and Food Quality, The Ministry of Education, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China
- Dantong Niu
  - College of Information and Electrical Engineering, China Agricultural University, Beijing 100083, China
- Xinjie Zhao
  - College of Humanities and Development Studies, China Agricultural University, Beijing 100083, China
- Xiaoya Wang
  - Key Laboratory of Precision Nutrition and Food Quality, The Ministry of Education, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China
- Mengzhen Hao
  - Key Laboratory of Precision Nutrition and Food Quality, The Ministry of Education, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China
- Huilian Che
  - Key Laboratory of Precision Nutrition and Food Quality, The Ministry of Education, College of Food Science and Nutritional Engineering, China Agricultural University, Beijing 100083, China
3. Saravanan V, Lakshmi PTV. Fuzzy Logic for Personalized Healthcare and Diagnostics: FuzzyApp—A Fuzzy Logic Based Allergen-Protein Predictor. OMICS: A Journal of Integrative Biology 2014; 18:570-81. [DOI: 10.1089/omi.2014.0021] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 02/06/2023]
Affiliation(s)
- Vijayakumar Saravanan
  - Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Pondicherry, India
- PTV Lakshmi
  - Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Pondicherry, India
4. Goodman RE, Hefle SL. Gaining perspective on the allergenicity assessment of genetically modified food crops. Expert Rev Clin Immunol 2014; 1:561-78. [DOI: 10.1586/1744666x.1.4.561] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 11/08/2022]
5. Brusic V, Petrovsky N. Immunoinformatics and its relevance to understanding human immune disease. Expert Rev Clin Immunol 2014; 1:145-57. [DOI: 10.1586/1744666x.1.1.145] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Indexed: 01/26/2023]
6. Bragin AO, Demenkov PS, Kolchanov NA, Ivanisenko VA. Accuracy of protein allergenicity prediction can be improved by taking into account data on allergenic protein discontinuous peptides. J Biomol Struct Dyn 2012; 31:59-64. [PMID: 22804354 DOI: 10.1080/07391102.2012.691362] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
Abstract
Allergy poses major health problems in industrialized countries, affecting over 20% of the population. Proteins from transgenic foods, cosmetics, animal hair, and other ubiquitous sources can be allergens. For this reason, development of improved methods for the prediction of potential allergenicity of proteins is timely. The currently available approaches to allergenicity prediction are numerous. Some approaches relied heavily on information on protein three-dimensional (3D) structure for allergenicity prediction. They required knowledge about 3D structure of query protein, thereby considerably restricting analysis to only those proteins whose 3D structure was known. As a consequence, many proteins with unknown structure could be overlooked. We developed a new method for allergenicity prediction, using information on protein 3D structure only for training. Three-dimensional structures of known allergenic proteins were used for representing protein surface as patches designated as discontinuous peptides. Allergenicity was predicted through search of such peptides in query protein sequences. It was demonstrated that the information on the discontinuous peptides made feasible better prediction of allergenic proteins. The allergenicity prediction method is available at http://www-bionet.sscc.ru/psd/cgi-bin/programs/Allergen/allergen.cgi .
Affiliation(s)
- Anatoly O Bragin
  - Institute of Cytology and Genetics, Lavrentiev ave. 10, Novosibirsk, 630090, Russia
7. Zheng LN, Lin H, Pawar R, Li ZX, Li MH. Mapping IgE binding epitopes of major shrimp (Penaeus monodon) allergen with immunoinformatics tools. Food Chem Toxicol 2011; 49:2954-60. [DOI: 10.1016/j.fct.2011.07.043] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.2] [Received: 01/30/2011] [Revised: 07/09/2011] [Accepted: 07/14/2011] [Indexed: 11/30/2022]
8. Scientific Opinion on the assessment of allergenicity of GM plants and microorganisms and derived food and feed. EFSA J 2010. [DOI: 10.2903/j.efsa.2010.1700] [Citation(s) in RCA: 243] [Impact Index Per Article: 17.4] [Indexed: 12/24/2022]
9. Tang ZQ, Lin HH, Zhang HL, Han LY, Chen X, Chen YZ. Prediction of functional class of proteins and peptides irrespective of sequence homology by support vector machines. Bioinform Biol Insights 2009; 1:19-47. [PMID: 20066123 PMCID: PMC2789692 DOI: 10.4137/bbi.s315] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/30/2022]
Abstract
Various computational methods have been used for the prediction of protein and peptide function based on their sequences. A particular challenge is to derive functional properties from sequences that show low or no homology to proteins of known function. Recently, a machine learning method, the support vector machine (SVM), has been explored for predicting the functional class of proteins and peptides from properties derived from the amino acid sequence, independent of sequence similarity. It has shown promising potential for a wide spectrum of protein and peptide classes, including some low- and non-homologous proteins. This method can thus be explored as a potential tool to complement alignment-based, clustering-based, and structure-based methods for predicting protein function. This article reviews the strategies, current progress, and underlying difficulties in using SVM for predicting the functional class of proteins. The relevant software and web servers are described. The reported prediction performances in the application of these methods are also presented.
Affiliation(s)
- Zhi Qun Tang
  - Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
- Hong Huang Lin
  - Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
- Hai Lei Zhang
  - Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
- Lian Yi Han
  - Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
- Xin Chen
  - Department of Biotechnology, Zhejiang University, Hang Zhou, Zhejiang Province, P. R. China, 310029
- Yu Zong Chen
  - Department of Pharmacy and Department of Computational Science, National University of Singapore, Republic of Singapore, 117543
  - Shanghai Center for Bioinformatics Technology, Shanghai, P. R. China, 201203
10. Hammerling U, Tallsjö A, Grafström R, Ilbäck NG. Comparative Hazard Characterization in Food Toxicology. Crit Rev Food Sci Nutr 2009; 49:626-69. [DOI: 10.1080/10408390802145617] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/22/2023]
11. Muh HC, Tong JC, Tammi MT. AllerHunter: a SVM-pairwise system for assessment of allergenicity and allergic cross-reactivity in proteins. PLoS One 2009; 4:e5861. [PMID: 19516900 PMCID: PMC2689655 DOI: 10.1371/journal.pone.0005861] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.4] [Received: 12/28/2008] [Accepted: 05/06/2009] [Indexed: 11/19/2022]
Abstract
Allergy is a major health problem in industrialized countries. The number of transgenic food crops is growing rapidly creating the need for allergenicity assessment before they are introduced into human food chain. While existing bioinformatic methods have achieved good accuracies for highly conserved sequences, the discrimination of allergens and non-allergens from allergen-like non-allergen sequences remains difficult. We describe AllerHunter, a web-based computational system for the assessment of potential allergenicity and allergic cross-reactivity in proteins. It combines an iterative pairwise sequence similarity encoding scheme with SVM as the discriminating engine. The pairwise vectorization framework allows the system to model essential features in allergens that are involved in cross-reactivity, but not limited to distinct sets of physicochemical properties. The system was rigorously trained and tested using 1,356 known allergen and 13,449 putative non-allergen sequences. Extensive testing was performed for validation of the prediction models. The system is effective for distinguishing allergens and non-allergens from allergen-like non-allergen sequences. Testing results showed that AllerHunter, with a sensitivity of 83.4% and specificity of 96.4% (accuracy = 95.3%, area under the receiver operating characteristic curve AROC = 0.928+/-0.004 and Matthew's correlation coefficient MCC = 0.738), performs significantly better than a number of existing methods using an independent dataset of 1443 protein sequences. AllerHunter is available at (http://tiger.dbs.nus.edu.sg/AllerHunter).
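The performance figures quoted in this abstract (sensitivity, specificity, accuracy, and the Matthews correlation coefficient) all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the AllerHunter code; the function name `confusion_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math


def confusion_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: allergens predicted correctly / missed.
    tn/fp: non-allergens predicted correctly / flagged by mistake.
    Assumes at least one positive and one negative example.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a margin is zero; 0.0 is a common convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = confusion_metrics(tp=50, fn=10, tn=90, fp=10)
print(metrics)
```

On a dataset with many more non-allergens than allergens, accuracy is dominated by specificity, which is one reason the MCC is often reported alongside it.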
Affiliation(s)
- Hon Cheng Muh
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Joo Chuan Tong
  - Data Mining Department, Institute for Infocomm Research, Singapore, Singapore
  - Department of Biochemistry, National University of Singapore, Singapore, Singapore
- Martti T. Tammi
  - Department of Biological Sciences, National University of Singapore, Singapore, Singapore
  - Department of Biochemistry, National University of Singapore, Singapore, Singapore
  - Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
12. Lim SJ, Tong JC, Chew FT, Tammi MT. The value of position-specific scoring matrices for assessment of protein allegenicity. BMC Bioinformatics 2008; 9 Suppl 12:S21. [PMID: 19091021 PMCID: PMC2638161 DOI: 10.1186/1471-2105-9-s12-s21] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
BACKGROUND Bioinformatics tools are commonly used for assessing potential protein allergenicity. While these methods have achieved good accuracies for highly conserved sequences, they are less effective when the overall similarity is low. In this study, we assessed the feasibility of using position-specific scoring matrices as a basis for predicting potential allergenicity in proteins. RESULTS Two simple methods for predicting potential allergenicity in proteins, based on general and group-specific allergen profiles, are presented. Testing results indicate that the performances of both methods are comparable to the best results of other methods. The group-specific profile approach, with a sensitivity of 84.04% and specificity of 96.52%, gives similar results as those obtained using the general profile approach (sensitivity = 82.45%, specificity = 96.92%). CONCLUSION We show that position-specific scoring matrices are highly promising for constructing computational models suitable for allergenicity assessment. These data suggest it may be possible to apply a targeted approach for allergenicity assessment based on the profiles of allergens of interest.
Affiliation(s)
- Shen Jean Lim
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore 117597
13. Kumar KK, Shelokar PS. An SVM method using evolutionary information for the identification of allergenic proteins. Bioinformation 2008; 2:253-6. [PMID: 18317576 PMCID: PMC2258428 DOI: 10.6026/97320630002253] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 12/28/2007] [Revised: 01/17/2008] [Accepted: 01/19/2008] [Indexed: 12/27/2022]
Abstract
This study presents an allergenic protein prediction system that appears to be capable of producing high sensitivity and specificity. The proposed system is based on support vector machine (SVM) using evolutionary information in the form of an amino acid position specific scoring matrix (PSSM). The performance of this system is assessed by a 10-fold cross-validation experiment using a dataset consisting of 693 allergens and 1041 non-allergens obtained from Swiss-Prot and Structural Database of Allergenic Proteins (SDAP). The PSSM method produced an accuracy of 90.1% in comparison to the methods based on SVM using amino acid, dipeptide composition, pseudo (5-tier) amino acid composition that achieved an accuracy of 86.3, 86.5 and 82.1% respectively. The results show that evolutionary information can be useful to build more effective and efficient allergen prediction systems.
Affiliation(s)
- Kandaswamy Krishna Kumar
  - Insilico Consulting, 402, Citi Centre, 39/2 Erandwane, Karve Road, Pune-411004, Maharashtra, India
14. Martinez Barrio A, Soeria-Atmadja D, Nistér A, Gustafsson MG, Hammerling U, Bongcam-Rudloff E. EVALLER: a web server for in silico assessment of potential protein allergenicity. Nucleic Acids Res 2007; 35:W694-700. [PMID: 17537818 PMCID: PMC1933222 DOI: 10.1093/nar/gkm370] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 11/13/2022]
Abstract
Bioinformatics testing approaches for protein allergenicity, involving amino acid sequence comparisons, have evolved appreciably over the last several years to increased sophistication and performance. EVALLER, the web server presented in this article is based on our recently published 'Detection based on Filtered Length-adjusted Allergen Peptides' (DFLAP) algorithm, which affords in silico determination of potential protein allergenicity of high sensitivity and excellent specificity. To strengthen bioinformatics risk assessment in allergology EVALLER provides a comprehensive outline of its judgment on a query protein's potential allergenicity. Each such textual output incorporates a scoring figure, a confidence numeral of the assignment and information on high- or low-scoring matches to identified allergen-related motifs, including their respective location in accordingly derived allergens. The interface, built on a modified Perl Open Source package, enables dynamic and color-coded graphic representation of key parts of the output. Moreover, pertinent details can be examined in great detail through zoomed views. The server can be accessed at http://bioinformatics.bmc.uu.se/evaller.html.
Affiliation(s)
- Alvaro Martinez Barrio
  - Linnaeus Centre for Bioinformatics, Uppsala Biomedical Centre (BMC), Uppsala University, P.O. Box 598, SE-751 24 Uppsala, Sweden
15. Mari A, Scala E, Palazzo P, Ridolfi S, Zennaro D, Carabella G. Bioinformatics applied to allergy: allergen databases, from collecting sequence information to data integration. The Allergome platform as a model. Cell Immunol 2007; 244:97-100. [PMID: 17434469 DOI: 10.1016/j.cellimm.2007.02.012] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Received: 02/06/2007] [Accepted: 02/11/2007] [Indexed: 11/18/2022]
Abstract
Allergens are proteins or glycoproteins that are recognized by IgE produced by the immune system of allergic individuals. Around 1,500 allergenic structures have been identified so far, and this number does not seem to have reached a plateau after 3-4 decades of research and the advent of molecular biology. Several allergen databases are available on the Internet. Different aims and philosophies lead to different products. Here we report on the main features of web sites dedicated to allergens and describe in more detail our current work on the Allergome platform. The web server Allergome (www.allergome.org) is a free, independent, open resource whose goal is to provide an exhaustive repository of data related to all IgE-binding compounds. The main purpose of Allergome is to collect a list of allergenic sources and molecules by using the widest selection criteria and sources. A further development of the Allergome platform is the Real Time Monitoring of IgE sensitization module (ReTiME), which allows uploading of raw data from both in vivo and in vitro testing, thus representing the first attempt to apply IT to allergy data mining. More recently, a new module (RefArray), a tool for literature mining, has been released.
Affiliation(s)
- Adriano Mari
  - Allergy Data Laboratories sc, Via Malipiero 28, 04100 Latina, Italy
16. Schein CH, Ivanciuc O, Braun W. Bioinformatics approaches to classifying allergens and predicting cross-reactivity. Immunol Allergy Clin North Am 2007; 27:1-27. [PMID: 17276876 PMCID: PMC1941676 DOI: 10.1016/j.iac.2006.11.005] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.2] [Indexed: 11/19/2022]
Abstract
Allergenic proteins from very different environmental sources have similar sequences and structures. This fact may account for multiple allergen syndromes, whereby a myriad of diverse plants and foods may induce a similar IgE-based reaction in certain patients. Identifying the common triggering protein in these sources, in silico, can aid designing individualized therapy for allergen sufferers. This article provides an overview of databases on allergenic proteins, and ways to identify common proteins that may be the cause of multiple allergy syndromes. The major emphasis is on the relational Structural Database of Allergenic Proteins (SDAP), which includes cross-referenced data on the sequence, structure, and IgE epitopes of over 800 allergenic proteins, coupled with specially developed bioinformatics tools to group all allergens and identify discrete areas that may account for cross-reactivity. SDAP is freely available on the Web to clinicians and patients.
Affiliation(s)
- Catherine H. Schein
  - Sealy Center for Structural Biology and Molecular Biophysics, Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, 301 University Blvd., Galveston TX 77555-0857
  - Sealy Center for Structural Biology and Molecular Biophysics, Departments of Microbiology and Immunology, University of Texas Medical Branch, 301 University Blvd., Galveston TX 77555-0857
- Ovidiu Ivanciuc
  - Sealy Center for Structural Biology and Molecular Biophysics, Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, 301 University Blvd., Galveston TX 77555-0857
- Werner Braun
  - Sealy Center for Structural Biology and Molecular Biophysics, Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, 301 University Blvd., Galveston TX 77555-0857
17. Zhang ZH, Koh JLY, Zhang GL, Choo KH, Tammi MT, Tong JC. AllerTool: a web server for predicting allergenicity and allergic cross-reactivity in proteins. Bioinformatics 2006; 23:504-6. [PMID: 17150996 DOI: 10.1093/bioinformatics/btl621] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]
Abstract
Assessment of potential allergenicity and patterns of cross-reactivity is necessary whenever novel proteins are introduced into the human food chain. Current bioinformatic methods in allergology focus mainly on the prediction of allergenic proteins, with no information on cross-reactivity patterns among known allergens. In this study, we present AllerTool, a web server with essential tools for the assessment of predicted as well as published cross-reactivity patterns of allergens. The analysis tools include graphical representation of allergen cross-reactivity information; a local sequence comparison tool that displays information on known cross-reactive allergens; a sequence similarity search tool for assessment of cross-reactivity in accordance with FAO/WHO Codex Alimentarius guidelines; and a method based on support vector machine (SVM). Ten-fold cross-validation results showed that the area under the receiver operating curve (A(ROC)) of the SVM models is 0.90, with 86.00% sensitivity (SE) at a specificity (SP) of 86.00%. Availability: AllerTool is freely available at http://research.i2r.a-star.edu.sg/AllerTool/.
Affiliation(s)
- Zong Hong Zhang
  - Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613
18. Aalberse RC, Stadler BM. In silico predictability of allergenicity: from amino acid sequence via 3-D structure to allergenicity. Mol Nutr Food Res 2006; 50:625-7. [PMID: 16764015 DOI: 10.1002/mnfr.200500270] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Indexed: 11/12/2022]
Abstract
In relation to the prediction of allergenicity three aspects have to be discussed: IgE immunogenicity, IgE cross-reactivity, and T-cell cross-reactivity. IgE immunogenicity depends largely on factors other than the protein itself: the context and dose and "history" of the protein by the time it reaches the immune system. It is, therefore, not fully predictable from structural information. In contrast, IgE cross-reactivity can be much more reliably assessed by in-silico homology searches in combination with in vitro IgE antibody assays. The in-silico homology search is unlikely to miss potential cross-reactivity with sequenced allergens. So far, no biologically relevant cross-reactivity at the antibody level has been demonstrated between proteins without easily demonstrable homology. T-cell cross-reactivity is much more difficult to predict than B-cell cross-reactivity. Moreover, its effects are more diverse. Yet, pre-existing cross-reactive T-cell activity is likely to influence the outcome not only of the immune response, but also of the effect phase of the allergic reaction. The question of whether any antigen can be allergenic is still a matter of debate.
Affiliation(s)
- Rob C Aalberse
  - Sanquin Research at CLB, Landsteiner Laboratory, Academic Medical Centre, Department of Immunopathology, Amsterdam, The Netherlands
19.
Abstract
In this study a systematic attempt has been made to integrate various approaches in order to predict allergenic proteins with high accuracy. The dataset used for testing and training consists of 578 allergens and 700 non-allergens obtained from A. K. Bjorklund, D. Soeria-Atmadja, A. Zorzet, U. Hammerling and M. G. Gustafsson (2005) Bioinformatics, 21, 39-50. First, we developed methods based on support vector machine using amino acid and dipeptide composition and achieved an accuracy of 85.02 and 84.00%, respectively. Second, a motif-based method has been developed using MEME/MAST software that achieved sensitivity of 93.94 with 33.34% specificity. Third, a database of known IgE epitopes was searched and this predicted allergenic proteins with 17.47% sensitivity at specificity of 98.14%. Fourth, we predicted allergenic proteins by performing BLAST search against allergen representative peptides. Finally hybrid approaches have been developed, which combine two or more than two approaches. The performance of all these algorithms has been evaluated on an independent dataset of 323 allergens and on 101 725 non-allergens obtained from Swiss-Prot. A web server AlgPred has been developed for the predicting allergenic proteins and for mapping IgE epitopes on allergenic proteins (http://www.imtech.res.in/raghava/algpred/). AlgPred is available at www.imtech.res.in/raghava/algpred/.
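The amino acid and dipeptide composition features mentioned in this abstract are fixed-length vectors (20 and 400 values) that can be fed to a classifier such as an SVM. The following is a minimal sketch of how such vectors are commonly computed, not the AlgPred implementation; the function names, the residue ordering, and the choice to skip non-standard residues are assumptions made for illustration.

```python
from itertools import product

# The 20 standard amino acids; the ordering here is an arbitrary assumption.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def amino_acid_composition(sequence):
    """Fraction of each standard residue in the sequence (20 values)."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in AMINO_ACIDS}
    total = sum(counts.values())  # non-standard residues are ignored
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Fraction of each adjacent residue pair in the sequence (400 values)."""
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    for first, second in zip(seq, seq[1:]):
        pair = first + second
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DIPEPTIDES]


# Toy sequence, used only to show the shape of the output.
aac = amino_acid_composition("ACAC")
dpc = dipeptide_composition("ACAC")
print(len(aac), len(dpc))
```

Both vectors are independent of sequence length, which is what lets proteins of different sizes share one feature space.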
Affiliation(s)
- G. P. S. Raghava
  - To whom correspondence should be addressed. Tel: +91 172 2690557; Fax: +91 172 2690632
20. Soeria-Atmadja D, Lundell T, Gustafsson MG, Hammerling U. Computational detection of allergenic proteins attains a new level of accuracy with in silico variable-length peptide extraction and machine learning. Nucleic Acids Res 2006; 34:3779-93. [PMID: 16977698 PMCID: PMC1540723 DOI: 10.1093/nar/gkl467] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 11/21/2022]
Abstract
The placing of novel or new-in-the-context proteins on the market, appearing in genetically modified foods, certain bio-pharmaceuticals and some household products leads to human exposure to proteins that may elicit allergic responses. Accurate methods to detect allergens are therefore necessary to ensure consumer/patient safety. We demonstrate that it is possible to reach a new level of accuracy in computational detection of allergenic proteins by presenting a novel detector, Detection based on Filtered Length-adjusted Allergen Peptides (DFLAP). The DFLAP algorithm extracts variable length allergen sequence fragments and employs modern machine learning techniques in the form of a support vector machine. In particular, this new detector shows hitherto unmatched specificity when challenged to the Swiss-Prot repository without appreciable loss of sensitivity. DFLAP is also the first reported detector that successfully discriminates between allergens and non-allergens occurring in protein families known to hold both categories. Allergenicity assessment for specific protein sequences of interest using DFLAP is possible via ulfh@slv.se.
Affiliation(s)
- M. G. Gustafsson
  - Department of Engineering Sciences, Uppsala University, PO Box 534, SE-751 21 Uppsala, Sweden
  - Department of Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 85 Uppsala, Sweden
  - Correspondence may also be addressed to M. G. Gustafsson. Tel: +46 18 4713229; Fax: +46 18 555096; Present address: M. G. Gustafsson, Department of Medical Sciences, Uppsala University, Uppsala University Hospital, SE-751 85 Uppsala, Sweden
21. Opinion of the Scientific Panel on genetically modified organisms [GMO] on an application (Reference EFSA‐GMO‐UK‐2004‐05) for the placing on the market of insect‐protected and glufosinate and glyphosate‐tolerant genetically modified maize 1507 × NK603, for food and feed uses, and import and processing under Regulation (EC) No 1829/2003 from Pioneer Hi‐Bred and Mycogen Seeds. EFSA J 2006. [DOI: 10.2903/j.efsa.2006.355] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022]
|
22
|
Soeria-Atmadja D, Wallman M, Björklund AK, Isaksson A, Hammerling U, Gustafsson MG. External cross-validation for unbiased evaluation of protein family detectors: application to allergens. Proteins 2006; 61:918-25. [PMID: 16231294 DOI: 10.1002/prot.20656] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Abstract
Key issues in protein science and computational biology are design and evaluation of algorithms aimed at detection of proteins that belong to a specific family, as defined by structural, evolutionary, or functional criteria. In this context, several validation techniques are often used to compare different parameter settings of the detector, and to subsequently select the setting that yields the smallest error rate estimate. A frequently overlooked problem associated with this approach is that this smallest error rate estimate may have a large optimistic bias. Based on computer simulations, we show that a detector's error rate estimate can be overly optimistic and propose a method to obtain unbiased performance estimates of a detector design procedure. The method is founded on an external 10-fold cross-validation (CV) loop that embeds an internal validation procedure used for parameter selection in detector design. The detectors generated in the 10 iterations are each evaluated on held-out examples exclusively available in the external CV iterations. Notably, the average of these 10 performance estimates is not associated with a final detector, but rather with the average performance of the design procedure used. We apply the external CV loop to the particular problem of detecting potentially allergenic proteins, using a previously reported design procedure. Unbiased performance estimates of the allergen detector design procedure are presented together with information about which algorithms and parameter settings are most frequently selected.
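The external cross-validation loop described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the 1-D k-nearest-neighbour detector, the synthetic data, the candidate settings and all function names are assumptions chosen only to keep the example self-contained.

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D features, odd k)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return int(2 * votes > k)

def split_folds(data, n_folds):
    """Deal the examples into n_folds interleaved folds."""
    return [data[i::n_folds] for i in range(n_folds)]

def cv_error(data, k, n_folds):
    """Ordinary cross-validated error rate for one parameter setting."""
    folds = split_folds(data, n_folds)
    errors = []
    for i, held_out in enumerate(folds):
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        wrong = sum(knn_predict(train, x, k) != y for x, y in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / n_folds

def design_detector(data, candidates, n_inner=5):
    """Design procedure: choose the setting with the smallest internal CV error."""
    return min(candidates, key=lambda k: cv_error(data, k, n_inner))

def external_cv(data, candidates, n_outer=10):
    """Repeat the whole design on each external training part and score the
    resulting detector only on the fold that the design never saw."""
    folds = split_folds(data, n_outer)
    errors, chosen = [], []
    for i, held_out in enumerate(folds):
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        k = design_detector(train, candidates)
        wrong = sum(knn_predict(train, x, k) != y for x, y in held_out)
        errors.append(wrong / len(held_out))
        chosen.append(k)
    return sum(errors) / n_outer, chosen

# Synthetic two-class data (hypothetical): class 1 is shifted by one unit.
random.seed(0)
data = [(random.gauss(1.0 if y else 0.0, 1.0), y) for y in [0, 1] * 100]
random.shuffle(data)
candidates = [1, 3, 5, 7, 9]
estimate, chosen = external_cv(data, candidates)
print(f"external CV error estimate: {estimate:.3f}; settings chosen per fold: {chosen}")
```

The printed estimate describes the design procedure as a whole rather than any single detector, and the list of chosen settings shows which parameter value the internal validation selected in each external iteration.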
|
23
|
Cui J, Han LY, Li H, Ung CY, Tang ZQ, Zheng CJ, Cao ZW, Chen YZ. Computer prediction of allergen proteins from sequence-derived protein structural and physicochemical properties. Mol Immunol 2006; 44:514-20. [PMID: 16563508 DOI: 10.1016/j.molimm.2006.02.010] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2005] [Revised: 02/06/2006] [Accepted: 02/14/2006] [Indexed: 11/21/2022]
Abstract
BACKGROUND Computational methods have been developed for predicting allergen proteins from sequence segments that show identity, homology, or motif match to a known allergen. These methods achieve good prediction accuracies, but are less effective for novel proteins with no similarity to any known allergen. METHODS This work tests the feasibility of using a statistical learning method, support vector machines, as such a method. The prediction system is trained and tested by using 1005 allergen proteins from the Allergome database and 22,469 non-allergen proteins from 7871 Pfam families. RESULTS Testing results by an independent set of 229 allergen and 6717 non-allergen proteins from 7871 Pfam families show that 93.0% and 99.9% of these are correctly predicted, which are comparable to the best results of other methods. Of the 18 novel allergen proteins non-homologous to any other proteins in the Swissprot database, 88.9% is correctly predicted. A further screening of 168,128 proteins in the Swissprot database finds that 2.9% of the proteins are predicted as allergen proteins, which is consistent with the estimated numbers from motif-based methods. CONCLUSIONS Our study suggests that SVM is a potentially useful method for predicting allergen proteins and it has certain capability for predicting novel allergen proteins. Our software can be accessed at .
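As a rough check on the figures quoted in this abstract, the reported percentages can be converted back into approximate protein counts. The counts below are reconstructed from rounded percentages and are therefore approximations, not values taken from the paper; the variable names are illustrative.

```python
# Independent test set sizes quoted in the abstract.
n_allergens, n_non_allergens = 229, 6717
sensitivity, specificity = 0.930, 0.999

# Approximate counts implied by the rounded percentages.
true_positives = round(sensitivity * n_allergens)
true_negatives = round(specificity * n_non_allergens)
false_negatives = n_allergens - true_positives
false_positives = n_non_allergens - true_negatives

# 88.9% of 18 novel allergens corresponds to 16 correct predictions.
novel_correct = round(0.889 * 18)
novel_rate = round(100 * novel_correct / 18, 1)

# 2.9% of the 168,128 screened Swissprot proteins predicted as allergens.
flagged_in_screen = round(0.029 * 168128)

print(true_positives, false_negatives, true_negatives, false_positives)
print(novel_correct, novel_rate, flagged_in_screen)
```

Under these assumptions, only a handful of non-allergens in the independent set are misclassified, while the Swissprot screen still flags several thousand proteins, which shows how even a very high specificity translates into many hits on a large database.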
Affiliation(s)
- Juan Cui
- Bioinformatics and Drug Design Group, Department of Pharmacy and Computational Science, National University of Singapore, Blk SoC 1, Level 7, 3 Science Drive 2, Singapore 117543, Singapore
|
24
|
Opinion of the Scientific Panel on genetically modified organisms [GMO] on an application (Reference EFSA-GMO-UK-2004-01) for the placing on the market of glyphosate-tolerant and insect-resistant genetically modified maize NK603 × MON810, for food and fee. EFSA J 2005. [DOI: 10.2903/j.efsa.2005.309] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
|
25
|
Opinion of the Scientific Panel on genetically modified organisms [GMO] related to the notification (Reference C/GB/02/M3/3) for the placing on the market of glyphosate-tolerant and insect-resistant genetically modified maize NK603 × MON810, for import an. EFSA J 2005. [DOI: 10.2903/j.efsa.2005.308] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
|
26
|
Opinion of the Scientific Panel on genetically modified organisms [GMO] on an application (Reference EFSA GMO UK 2004 06) for the placing on the market of insect-protected glyphosate-tolerant genetically modified maize MON863 × NK603, for food and feed us. EFSA J 2005. [DOI: 10.2903/j.efsa.2005.255] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
|
27
|
Opinion of the Scientific Panel on genetically modified organisms [GMO] related to the Notification for the placing on the market of insect-protected genetically modified maize MON 863 × MON 810, for import and processing, under Part C of Directive 2001/1. EFSA J 2005. [DOI: 10.2903/j.efsa.2005.251] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
|
28
|
Opinion of the Scientific Panel on genetically modified organisms [GMO] on an application (Reference EFSA GMO BE 2004 07) for the placing on the market of insect-protected glyphosate-tolerant genetically modified maize MON863 × MON810 × NK603, for food an. EFSA J 2005. [DOI: 10.2903/j.efsa.2005.256] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
|
29
|
Opinion of the Scientific Panel on genetically modified organisms [GMO] on an application (Reference EFSA GMO DE 2004 03) for the placing on the market of insect protected genetically modified maize MON 863 × MON 810, for food and feed use, under Regulati. EFSA J 2005. [DOI: 10.2903/j.efsa.2005.252] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
|
30
|
Spök A, Gaugitsch H, Laffer S, Pauli G, Saito H, Sampson H, Sibanda E, Thomas W, van Hage M, Valenta R. Suggestions for the Assessment of the Allergenic Potential of Genetically Modified Organisms. Int Arch Allergy Immunol 2005; 137:167-80. [PMID: 15947472 DOI: 10.1159/000086315] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/01/2005] [Accepted: 04/11/2005] [Indexed: 11/19/2022] Open
Abstract
The prevalence of allergic diseases has been increasing continuously and, accordingly, there is a great desire to evaluate the allergenic potential of components in our daily environment (e.g., food). Although there is almost no scientific evidence that genetically modified organisms (GMOs) exhibit increased allergenicity compared with the corresponding wild type, significant concerns have been raised regarding this matter. In principle, it is possible that the allergenic potential of GMOs may be increased due to the introduction of potential foreign allergens, to potentially upregulated expression of allergenic components caused by the modification of the wild type organism or to different means of exposure. According to the current practice, the proteins to be introduced into a GMO are evaluated for their physicochemical properties, sequence homology with known allergens and occasionally regarding their allergenic activity. We discuss why these current rules and procedures cannot predict or exclude the allergenicity of a given GMO with certainty. As an alternative we suggest to improve the current evaluation by an experimental comparison of the wild-type organism with the whole GMO regarding their potential to elicit reactions in allergic individuals and to induce de novo sensitizations. We also recommend that the suggested assessment procedures be equally applied to GMOs as well as to natural cultivars in order to establish effective measures for allergy prevention.
Affiliation(s)
- Armin Spök
- Inter-University Research Centre for Technology, Work, and Culture, Graz, Austria
|
31
|
Goodman RE, Hefle SL, Taylor SL, van Ree R. Assessing Genetically Modified Crops to Minimize the Risk of Increased Food Allergy: A Review. Int Arch Allergy Immunol 2005; 137:153-66. [PMID: 15947471 DOI: 10.1159/000086314] [Citation(s) in RCA: 78] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/19/2022] Open
Abstract
The first genetically modified (GM) crops approved for food use (tomato and soybean) were evaluated for safety by the United States Food and Drug Administration prior to commercial production. Among other factors, those products and all additional GM crops that have been grown commercially have been evaluated for potential increases in allergenic properties using methods that are consistent with the current understanding of food allergens and knowledge regarding the prediction of allergenic activity. Although there have been refinements, the key aspects of the evaluation have not changed. The allergenic properties of the gene donor and the host (recipient) organisms are considered in determining the appropriate testing strategy. The amino acid sequence of the encoded protein is compared to all known allergens to determine whether the protein is a known allergen or is sufficiently similar to any known allergen to indicate an increased probability of allergic cross-reactivity. Stability of the protein in the presence of acid with the stomach protease pepsin is tested as a risk factor for food allergenicity. In vitro or in vivo human IgE binding are tested when appropriate, if the gene donor is an allergen or the sequence of the protein is similar to an allergen. Serum donors and skin test subjects are selected based on their proven allergic responses to the gene donor or to material containing the allergen that was matched in sequence. While some scientists and regulators have suggested using animal models, performing broadly targeted serum IgE testing or extensive pre- or post-market clinical tests, current evidence does not support these tests as being predictive or practical. Based on the evidence to date, the current assessment process has worked well to prevent the unintended introduction of allergens in commercial GM crops.
Affiliation(s)
- Richard E Goodman
- Food Allergy Research and Resource Program, University of Nebraska, Lincoln, NE 68583-0955, USA.
|
32
|
Opinion of the Scientific Panel on genetically modified organisms [GMO] related to the notification (Reference C/ES/01/01) for the placing on the market of insect-tolerant genetically modified maize 1507 for import, feed and industrial processing and cult. EFSA J 2005. [DOI: 10.2903/j.efsa.2005.181] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
|
33
|
Opinion of the Scientific Panel on genetically modified organisms [GMO] on an application (reference EFSA-GMO-NL-2004-02) for the placing on the market of insect-tolerant genetically modified maize 1507, for food use, under Regulation (EC) No 1829/2003 fr. EFSA J 2005. [DOI: 10.2903/j.efsa.2005.182] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
|
34
|
Opinion of the Scientific Panel on genetically modified organisms [GMO] on a request from the Commission related to the notification (Reference C/NL/00/10) for the placing on the market of insect-tolerant genetically modified maize 1507, for import and pr. EFSA J 2004. [DOI: 10.2903/j.efsa.2004.124] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
|
35
|
Fiers MWEJ, Kleter GA, Nijland H, Peijnenburg AACM, Nap JP, van Ham RCHJ. Allermatch, a webtool for the prediction of potential allergenicity according to current FAO/WHO Codex alimentarius guidelines. BMC Bioinformatics 2004; 5:133. [PMID: 15373946 PMCID: PMC522748 DOI: 10.1186/1471-2105-5-133] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/18/2004] [Accepted: 09/16/2004] [Indexed: 11/10/2022] Open
Abstract
Background Novel proteins entering the food chain, for example by genetic modification of plants, have to be tested for allergenicity. Allermatch™ is a webtool for the efficient and standardized prediction of potential allergenicity of proteins and peptides according to the current recommendations of the FAO/WHO Expert Consultation, as outlined in the Codex alimentarius. Description A query amino acid sequence is compared with all known allergenic proteins retrieved from the protein databases using a sliding window approach. This identifies stretches of 80 amino acids with more than 35% similarity or small identical stretches of at least six amino acids. The outcome of the analysis is presented in a concise format. The predictive performance of the FAO/WHO criteria is evaluated by screening sets of allergens and non-allergens against the Allermatch databases. Besides correct predictions, both methods are shown to generate false positive and false negative hits and the outcomes should therefore be combined with other methods of allergenicity assessment, as advised by the FAO/WHO. Conclusions Allermatch™ provides an accessible, efficient, and useful webtool for analysis of potential allergenicity of proteins introduced in genetically modified food prior to market release that complies with current FAO/WHO guidelines.
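The two screening criteria named in this abstract (an 80-amino-acid sliding window with more than 35% similarity, and identical stretches of at least six amino acids) can be sketched as follows. This is a simplified illustration, not the Allermatch implementation: it scores ungapped percent identity instead of running a real alignment, and the function names and toy sequences are hypothetical.

```python
def kmers(seq, k):
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def shared_exact_words(query, allergen, k=6):
    """Identical stretches of length k that occur in both sequences."""
    return kmers(query, k) & kmers(allergen, k)

def best_window_identity(query, allergen, window=80):
    """Highest ungapped percent identity between any query window and any
    allergen segment of the same length. Simplification: no gaps, no
    substitution matrix; matches are always divided by the window size."""
    best = 0.0
    for i in range(max(1, len(query) - window + 1)):
        q = query[i:i + window]
        for j in range(len(allergen) - len(q) + 1):
            a = allergen[j:j + len(q)]
            matches = sum(x == y for x, y in zip(q, a))
            best = max(best, 100.0 * matches / window)
    return best

def flag_query(query, allergens, window=80, threshold=35.0, word=6):
    """Return (name, best identity, shared words) for each allergen that
    meets either criterion."""
    hits = []
    for name, seq in allergens.items():
        identity = best_window_identity(query, seq, window)
        words = sorted(shared_exact_words(query, seq, word))
        if identity > threshold or words:
            hits.append((name, identity, words))
    return hits

# Toy sequences (hypothetical, not real allergens).
allergens = {"toy_allergen": "MKTAYIAKQRQISFVKSHFSRQ"}
hits = flag_query("GGGGISFVKSGGGG", allergens)
no_hits = flag_query("GGGGGGGGGG", allergens)
print(hits, no_hits)
```

In this toy case the first query is flagged only because it shares the six-residue stretch ISFVKS with the allergen. As the abstract notes, both criteria produce false positives and false negatives, so such hits are a starting point for further assessment rather than a verdict.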
Affiliation(s)
- Mark WEJ Fiers
- Applied Bioinformatics, Plant Research International, Wageningen University and Research Center, Wageningen, PO Box 16, 6700 AA, The Netherlands
- Gijs A Kleter
- RIKILT-Institute of Food Safety, Wageningen University and Research Center, Wageningen, PO Box 230, 6700 AE, The Netherlands
- Herman Nijland
- Applied Bioinformatics, Plant Research International, Wageningen University and Research Center, Wageningen, PO Box 16, 6700 AA, The Netherlands
- Ad ACM Peijnenburg
- RIKILT-Institute of Food Safety, Wageningen University and Research Center, Wageningen, PO Box 230, 6700 AE, The Netherlands
- Jan Peter Nap
- Applied Bioinformatics, Plant Research International, Wageningen University and Research Center, Wageningen, PO Box 16, 6700 AA, The Netherlands
- Roeland CHJ van Ham
- Applied Bioinformatics, Plant Research International, Wageningen University and Research Center, Wageningen, PO Box 16, 6700 AA, The Netherlands
|
36
|
Björklund AK, Soeria-Atmadja D, Zorzet A, Hammerling U, Gustafsson MG. Supervised identification of allergen-representative peptides for in silico detection of potentially allergenic proteins. Bioinformatics 2004; 21:39-50. [PMID: 15319257 DOI: 10.1093/bioinformatics/bth477] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Identification of potentially allergenic proteins is needed for the safety assessment of genetically modified foods, certain pharmaceuticals and various other products on the consumer market. Current methods in bioinformatic allergology exploit common features among allergens for the detection of amino acid sequences of potentially allergenic proteins. Features for identification still unexplored include the motifs occurring commonly in allergens, but rarely in ordinary proteins. In this paper, we present an algorithm for the identification of such motifs with the purpose of biocomputational detection of amino acid sequences of potential allergens. RESULTS Identification of allergen-representative peptides (ARPs) with low or no occurrence in proteins lacking allergenic properties is the essential component of our new method, designated DASARP (Detection based on Automated Selection of Allergen-Representative Peptide). This approach consistently outperforms the criterion based on identical peptide match for predicting allergenicity recommended by ILSI/IFBC and FAO/WHO and shows results comparable to the alignment-based criterion as outlined by FAO/WHO. AVAILABILITY The detection software and the ARP set needed for the analysis of a query protein reported here are properties of the Swedish National Food Agency and are available upon request. The protein sequence sets used in this work are publicly available on http://www.slv.se/templatesSLV/SLV_Page____9343.asp. Allergenicity assessment for specific protein sequences of interest is also possible via ulfh@slv.se
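The core idea stated in this abstract, selecting peptides that are common in allergens but have low or no occurrence in other proteins, can be sketched in a few lines. This is only an illustration of that idea and not the DASARP algorithm itself, whose details are not given here: the fixed peptide length, the count thresholds, the function names and the toy sequences are all assumptions.

```python
from collections import Counter

def peptide_counts(sequences, length):
    """Number of sequences in which each peptide of the given length occurs."""
    counts = Counter()
    for seq in sequences:
        counts.update({seq[i:i + length] for i in range(len(seq) - length + 1)})
    return counts

def select_representative_peptides(allergens, non_allergens, length=6,
                                   min_allergen_hits=2, max_non_allergen_hits=0):
    """Keep peptides found in enough allergens and in few (default: no)
    non-allergens. Thresholds are illustrative assumptions."""
    in_allergens = peptide_counts(allergens, length)
    in_others = peptide_counts(non_allergens, length)
    return sorted(
        p for p, n in in_allergens.items()
        if n >= min_allergen_hits and in_others[p] <= max_non_allergen_hits
    )

# Toy sequences (hypothetical).
toy_allergens = ["AAKLMNPQ", "GGKLMNPQ", "KLMNPQTT"]
toy_non_allergens = ["KLMNPWWW"]
six_mers = select_representative_peptides(toy_allergens, toy_non_allergens)
five_mers = select_representative_peptides(toy_allergens, toy_non_allergens, length=5)
print(six_mers, five_mers)
```

With the five-residue setting, KLMNP is dropped because it also occurs in the toy non-allergen, which is the filtering step that separates allergen-representative peptides from motifs shared with ordinary proteins.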
Affiliation(s)
- Asa K Björklund
- Division of Toxicology, National Food Administration, P.O. Box 622, SE-751 26 Uppsala, Sweden
|
37
|
Opinion of the Scientific Panel on genetically modified organisms [GMO] on a request from the Commission related to the Notification (Reference C/NL/98/11) for the placing on the market of herbicide-tolerant oilseed rape GT73, for import and processing, u. EFSA J 2004. [DOI: 10.2903/j.efsa.2004.29] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
|