1. Sundararaj R, Mathimaran A, Prabhu D, Ramachandran B, Jeyaraman J, Muthupandian S, Asmelash T. In silico approaches for the identification of potential allergens among hypothetical proteins from Alternaria alternata and its functional annotation. Sci Rep 2024; 14:6696. [PMID: 38509156 PMCID: PMC10954717 DOI: 10.1038/s41598-024-55463-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/01/2023] [Accepted: 02/23/2024] [Indexed: 03/22/2024] [Open access]
Abstract
Direct exposure to the fungal species Alternaria alternata is a major risk factor for the development of asthma, allergic rhinitis, and inflammation. As of November 23rd 2020, the NCBI protein database listed 11,227 proteins from the A. alternata genome as hypothetical proteins (HPs). Allergens are the main causative agents of several life-threatening diseases, especially in fungal infections. Therefore, the main aim of the study is to identify the potentially allergenic proteins among the HPs of A. alternata and to assign their functions, for a more complete understanding of the complex biological systems at the molecular level. AlgPred and the Structural Database of Allergenic Proteins (SDAP) were used to predict potential allergens among the HPs of A. alternata. In the analysis of the proteome data, 29 potential allergens were predicted by AlgPred, and further screening in SDAP confirmed the allergic response of 10 proteins. Extensive bioinformatics tools, covering protein family classification, sequence-function relationship, protein motif discovery, pathway interactions, and intrinsic features of the amino acid sequence, were used to predict the probable functions of the 10 HPs. The functions of the HPs are characterized as chitin-binding, ribosomal protein P1, thaumatin, glycosyl hydrolase, and NOB1 proteins. The subcellular localization and signal peptide prediction of these 10 proteins have provided additional information on localization and function. The allergen prediction and functional annotation of the 10 proteins may facilitate a better understanding of the allergenic mechanism of A. alternata in asthma and other diseases. The functional domain level insights and predicted structural features of the allergenic proteins help to understand the pathogenesis and host immune tolerance. The outcomes of the study would aid in the development of specific drugs to combat A. alternata infections.
Affiliation(s)
- Rajamanikandan Sundararaj
  - Department of Biochemistry, Centre for Drug Discovery, Karpagam Academy of Higher Education, Coimbatore, 641021, India
- Amala Mathimaran
  - Structural Biology and Biocomputing Lab, Department of Bioinformatics, Alagappa University, Karaikudi, Tamil Nadu, 630 004, India
- Dhamodharan Prabhu
  - Department of Biotechnology, Centre for Drug Discovery, Karpagam Academy of Higher Education, Coimbatore, 641021, India
- Balajee Ramachandran
  - Department of Pharmacology, Physiology & Biophysics, Chobanian & Avedisian School of Medicine, Boston University, 700 Albany Street, Boston, MA, 02118, USA
- Jeyakanthan Jeyaraman
  - Structural Biology and Biocomputing Lab, Department of Bioinformatics, Alagappa University, Karaikudi, Tamil Nadu, 630 004, India
- Saravanan Muthupandian
  - Department of Pharmacology, Saveetha Dental College, Saveetha Institute of Medical and Technical Sciences (SIMATS), Chennai, 600077, India
- Tsehaye Asmelash
  - Department of Medical Microbiology and Immunology, College of Health Sciences, Mekelle University, Mekelle, Tigray, Ethiopia
2. Sharma P, Malhotra L, Dhamija RK. Comprehensive amino acid composition analysis of seed storage proteins of cereals and legumes: identification and understanding of intrinsically disordered and allergenic peptides. J Biomol Struct Dyn 2024:1-13. [PMID: 38178552 DOI: 10.1080/07391102.2023.2300126] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/16/2023] [Accepted: 12/18/2023] [Indexed: 01/06/2024]
Abstract
The seed storage proteins of cereals and legumes are the primary source of amino acids, which are required for sustaining the nitrogen and carbon demands during germination and growth. Humans derive most of their dietary proteins from storage proteins in the form of a wide variety of foods. The amino acid content of most of these proteins is biased, and the need for this bias is not understood. The high abundance of proline, glutamine, and cysteine in cereals makes the gluten fraction viscoelastic. The cereal proteins carry less charge and the legume proteins more. Their non-polar amino acid distribution varies widely. These characteristics are strongly responsible for the partial and complete unfolding of several domains of the storage proteins. Many of the storage proteins share a highly conserved structural feature within the cupin superfamily, which is spread across all kingdoms of life. The intrinsically disordered viscoelastic proteins help in making dough, which is vital for the quality of bread. Unfolded regions harbor more immunogenic sequences and cause food-related allergies and intolerance. We have discussed these properties in terms of a comparison of cereal and legume storage protein sequences and allergy. Our study supports the findings that large disordered regions contain allergen-representative peptides. Interestingly, a high number of allergen-representative peptides were cleavable by digestive enzymes. Furthermore, unfolded storage proteins mimic microbial immunogens to induce a memory immune response. These findings can be used to guide the understanding of the immunological characteristics of storage proteins and may assist in treatment decisions for food allergy. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Pratibha Sharma
  - Human Behaviour Department, Institute of Human Behaviour and Allied Sciences, New Delhi, India
  - Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India
- Lakshay Malhotra
  - Department of Biophysics, All India Institute of Medical Sciences, New Delhi, India
  - Department of Biochemistry, Sri Venkateswara College, University of Delhi, New Delhi, India
3. Yu XX, Liu MQ, Li XY, Zhang YH, Tao BJ. Qualitative and Quantitative Prediction of Food Allergen Epitopes Based on Machine Learning Combined with In Vitro Experimental Validation. Food Chem 2022; 405:134796. [DOI: 10.1016/j.foodchem.2022.134796] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/30/2021] [Revised: 10/25/2022] [Accepted: 10/26/2022] [Indexed: 11/24/2022]
4. Yap PG, Gan CY. In vivo challenges of anti-diabetic peptide therapeutics: Gastrointestinal stability, toxicity and allergenicity. Trends Food Sci Technol 2020. [DOI: 10.1016/j.tifs.2020.09.005] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Indexed: 01/10/2023]
5. Bappy SS, Sultana S, Adhikari J, Mahmud S, Khan MA, Kibria KMK, Rahman MM, Shibly AZ. Extensive immunoinformatics study for the prediction of novel peptide-based epitope vaccine with docking confirmation against envelope protein of Chikungunya virus: a computational biology approach. J Biomol Struct Dyn 2020; 39:1139-1154. [PMID: 32037968 DOI: 10.1080/07391102.2020.1726815] [Citation(s) in RCA: 20] [Impact Index Per Article: 5.0] [Indexed: 12/14/2022]
Abstract
Chikungunya virus (CHIKV) instigating Chikungunya fever is a global infective menace resulting in high fever, weakened joint-muscle pain, and brain inflammation. Inaccessibility and unavailability of effective drugs have led us to an uncertain arena when it comes to providing proper medical treatment to the affected people. In this study, authentic encroachment has been made concerning the peptide-based epitope vaccine designing against CHIKV. A Proteome-wide search was performed to locate a conserved portion among the accessible viral outer membrane proteins which showcase a remarkable immune response using specific immunoinformatics and docking simulation tools. Primarily, the most probable immunogenic envelope glycoproteins E1 and E2 were identified from the UniProt database depending on their antigenicity scores. Subsequently, we selected two distinctive sequences "SEDVYANTQLVLQRP" and "IMLLYPDHPTLLSYR" in both E1 and E2 glycoproteins respectively. These two sequences identified as the most potent T and B cell epitope-based peptides as they interacted with 6 and 7 HLA-I and 5 HLA-II molecules with an extremely low IC50 score that was verified by molecular docking. Moreover, the sequences possess no allergenicity and are certainly located outside the transmembrane region. In addition, the sequences exhibited 88.46% and 100.00% Conservancy, covering high population coverage of 89.49% to 94.74% and 60.51% to 88.87% respectively in endemic countries. The identified peptide SEDVYANTQLVLQRP and IMLLYPDHPTLLSYR can be utilized next for the development of peptide-based epitope vaccine contrary to CHIKV, so further documentations and experimentations like Antigen testing, Antigen production, Clinical trials are needed to prove the validity of it. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Syed Shahariar Bappy
  - Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Sorna Sultana
  - Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Juthi Adhikari
  - Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Shafi Mahmud
  - Department of Genetic Engineering and Biotechnology, University of Rajshahi, Rajshahi, Bangladesh
- Md Arif Khan
  - Department of Biotechnology and Genetic Engineering, University of Development Alternative, Dhaka, Bangladesh
  - Bio-Bio-1 Research Foundation, Sangskriti Bikash Kendra Bhaban, Dhaka, Bangladesh
- K M Kaderi Kibria
  - Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Md Masuder Rahman
  - Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Abu Zaffar Shibly
  - Department of Biotechnology and Genetic Engineering, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
7. Distinguishing allergens from non-allergenic homologues using Physical-Chemical Property (PCP) motifs. Mol Immunol 2018; 99:1-8. [PMID: 29627609 DOI: 10.1016/j.molimm.2018.03.022] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 01/08/2018] [Revised: 03/22/2018] [Accepted: 03/27/2018] [Indexed: 02/07/2023]
Abstract
Quantitative guidelines to distinguish allergenic proteins from related, but non-allergenic ones are urgently needed for regulatory agencies, biotech companies and physicians. In a previous study, we found that allergenic proteins populate a relatively small number of protein families, as characterized by the Pfam database. However, these families also contain non-allergenic proteins, meaning that allergenic determinants must lie within more discrete regions of the sequence. Thus, new methods are needed to discriminate allergenic proteins within those families. Physical-Chemical Properties (PCP)-motifs specific for allergens within a Pfam class were determined for 17 highly populated protein domains. A novel scoring method based on PCP-motifs that characterize known allergenic proteins within these families was developed, and validated for those domains. The motif scores distinguished sequences of allergens from a large selection of 80,000 randomly selected non-allergenic sequences. The motif scores for the birch pollen allergen (Bet v 1) family, which also contains related fruit and nut allergens, correlated better than global sequence similarities with clinically observed cross-reactivities among those allergens. Further, we demonstrated that the average scores of allergen specific motifs for allergenic profilins are significantly different from the scores of non-allergenic profilins. Several of the selective motifs coincide with experimentally determined IgE epitopes of allergenic profilins. The motifs also discriminated allergenic pectate lyases, including Jun a 1 from mountain cedar pollen, from similar proteins in the human microbiome, which can be assumed to be non-allergens. The latter lacked key motifs characteristic of the known allergens, some of which correlate with known IgE binding sites.
8. Rathinam M, Singh S, Pattanayak D, Sreevathsa R. Comprehensive in silico allergenicity assessment of novel protein engineered chimeric Cry proteins for safe deployment in crops. BMC Biotechnol 2017; 17:64. [PMID: 28768539 PMCID: PMC5541426 DOI: 10.1186/s12896-017-0384-z] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 04/19/2017] [Accepted: 07/23/2017] [Indexed: 11/10/2022] [Open access]
Abstract
BACKGROUND Development of chimeric Cry toxins by protein engineering of known and validated proteins is imperative for enhancing the efficacy and broadening the insecticidal spectrum of these genes. Expression of novel Cry proteins in food crops has however created apprehensions with respect to the safety aspects. To clarify this, premarket evaluation consisting of an array of analyses to evaluate the unintended effects is a prerequisite to provide safety assurance to the consumers. Additionally, series of bioinformatic tools as in silico aids are being used to evaluate the likely allergenic reaction of the proteins based on sequence and epitope similarity with known allergens. RESULTS In the present study, chimeric Cry toxins developed through protein engineering were evaluated for allergenic potential using various in silico algorithms. Major emphasis was on the validation of allergenic potential on three aspects of paramount significance viz., sequence-based homology between allergenic proteins, validation of conformational epitopes towards identification of food allergens and physico-chemical properties of amino acids. Additionally, in vitro analysis pertaining to heat stability of two of the eight chimeric proteins and pepsin digestibility further demonstrated the non-allergenic potential of these chimeric toxins. CONCLUSIONS The study revealed for the first time an all-encompassing evaluation that the recombinant Cry proteins did not show any potential similarity with any known allergens with respect to the parameters generally considered for a protein to be designated as an allergen. These novel chimeric proteins hence can be considered safe to be introgressed into plants.
Affiliation(s)
- Maniraj Rathinam
  - ICAR-National Research Centre on Plant Biotechnology, LBS Centre, Pusa Campus, New Delhi, 110012, India
- Shweta Singh
  - ICAR-National Research Centre on Plant Biotechnology, LBS Centre, Pusa Campus, New Delhi, 110012, India
- Debasis Pattanayak
  - ICAR-National Research Centre on Plant Biotechnology, LBS Centre, Pusa Campus, New Delhi, 110012, India
- Rohini Sreevathsa
  - ICAR-National Research Centre on Plant Biotechnology, LBS Centre, Pusa Campus, New Delhi, 110012, India
9. Multilevel ensemble model for prediction of IgA and IgG antibodies. Immunol Lett 2017; 184:51-60. [DOI: 10.1016/j.imlet.2017.01.017] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 11/22/2016] [Revised: 01/30/2017] [Accepted: 01/30/2017] [Indexed: 01/04/2023]
10. Chaudhuri R, Ramachandran S. Immunoinformatics as a Tool for New Antifungal Vaccines. Methods Mol Biol 2017; 1625:31-43. [PMID: 28584981 DOI: 10.1007/978-1-4939-7104-6_3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/07/2023]
Abstract
Immunoinformatics aids in screening for vaccine candidates, which can be experimentally tested for their efficacy. This chapter describes methods that use immunoinformatics to screen fungal vaccine candidates. Surface-localized molecules called adhesins could elicit an immune response and serve as efficient vaccine candidates. The screening process is patterned on two steps, namely, a First Layer screen mostly used for value addition and prioritization based on characteristics of known antigens, and a Second Layer highly focussed on core immunoinformatics analysis involving the binding and interactions of the molecules of the immune system. Together they offer a comprehensive, objective in silico evaluation of vaccine candidate selection for fungal pathogens.
Affiliation(s)
- Srinivasan Ramachandran
  - CSIR-Institute of Genomics and Integrative Biology, Room 130, Mathura Road, Near Sukhdev Vihar DTC Bus Depot, New Delhi, 110 025, India
11.
Abstract
The rapidly increasing number of characterized allergens has created huge demands for advanced information storage, retrieval, and analysis. Bioinformatics and machine learning approaches provide useful tools for the study of allergens and epitopes prediction, which greatly complement traditional laboratory techniques. The specific applications mainly include identification of B- and T-cell epitopes, and assessment of allergenicity and cross-reactivity. In order to facilitate the work of clinical and basic researchers who are not familiar with bioinformatics, we review in this chapter the most important databases, bioinformatic tools, and methods with relevance to the study of allergens.
12. Chrysostomou C, Seker H. Prediction of protein allergenicity based on signal-processing bioinformatics approach. Annual International Conference of the IEEE Engineering in Medicine and Biology Society 2016; 2014:808-11. [PMID: 25570082 DOI: 10.1109/embc.2014.6943714] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 11/08/2022]
Abstract
Current bioinformatics tools achieve high accuracy in classifying allergenic protein sequences with high homology but generally perform poorly on low-homology protein sequences. Although some homologous regions explain Immunoglobulin E (IgE) cross-reactivity in groups of allergens, no universal molecular structure could be associated with allergenicity. In addition, studies have shown that cross-reactivity is not directly linked to the homology between protein sequences. Therefore, a new homology-independent method needs to be developed to determine whether a protein is an allergen. The aim of this study is therefore to differentiate sets of allergenic and non-allergenic proteins using a signal-processing based bioinformatics approach. In this paper, a new method was proposed for the characterisation and classification of allergenic protein sequences. In this method, a hydrophobicity amino acid index was used to encode proteins as numerical sequences, and the Discrete Fourier Transform was used to extract features for each protein. Finally, a classifier was constructed based on Support Vector Machines. To demonstrate the applicability of the proposed method, 857 allergen and 1000 non-allergen proteins were collected from the UniProt online database. The proposed method yielded: MCC: 0.752 ± 0.007, Specificity: 0.912 ± 0.005, Sensitivity: 0.835 ± 0.008 and Total Accuracy: 87.65% ± 0.004.
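The encoding and feature-extraction steps named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Kyte-Doolittle scale stands in for the unnamed hydrophobicity index, and the mean-centring, the zero-padding to 64 points, and the function names `encode` and `hydrophobicity_spectrum` are all hypothetical choices. The Support Vector Machine step is not shown.

```python
import cmath

# Kyte-Doolittle hydropathy values. Choosing this scale is an assumption;
# the abstract only says "hydrophobicity amino acid index".
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Map each residue to its hydropathy value (KeyError on unknown letters)."""
    return [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]


def hydrophobicity_spectrum(sequence, n_points=64):
    """Return n_points // 2 DFT magnitudes of the mean-centred, padded signal."""
    signal = encode(sequence)
    mean = sum(signal) / len(signal)
    centred = [x - mean for x in signal]
    # Zero-pad (or truncate) so every protein yields a vector of equal length.
    padded = (centred + [0.0] * n_points)[:n_points]
    magnitudes = []
    for k in range(n_points // 2):
        # Direct evaluation of the DFT sum for frequency bin k.
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * n / n_points)
            for n, x in enumerate(padded)
        )
        magnitudes.append(abs(coeff))
    return magnitudes
```

Each protein then becomes a fixed-length vector of spectral magnitudes, which is the kind of input a classifier such as an SVM expects regardless of the original sequence length.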
13. Garino C, Coïsson JD, Arlorio M. In silico allergenicity prediction of several lipid transfer proteins. Comput Biol Chem 2015; 60:32-42. [PMID: 26643760 DOI: 10.1016/j.compbiolchem.2015.11.006] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 05/07/2015] [Revised: 11/04/2015] [Accepted: 11/10/2015] [Indexed: 10/22/2022]
Abstract
Non-specific lipid transfer proteins (nsLTPs) are common allergens and are particularly widespread within the plant kingdom. They have a highly conserved three-dimensional structure that generates strong cross-reactivity among the members of this family. In recent years, several web tools for predicting the allergenicity of new molecules based on their homology with known allergens have been released, and guidelines to assess the potential allergenicity of proteins through bioinformatics have been established. Even if such tools are still only partially reliable, they can provide important indications when other kinds of molecular characterization are lacking. The potential allergenicity of 28 amino acid sequences of LTP homologs, either retrieved from the UniProt database or deduced in silico from the corresponding EST coding sequence, was predicted using 7 publicly available web tools. Moreover, their degree of similarity to their closest known LTP allergens was calculated, in order to evaluate their potential cross-reactivity. Finally, all sequences were studied for their degree of identity with the peach allergen Pru p 3, considering the regions involved in the formation of its known conformational IgE-binding epitope. Most of the analyzed sequences displayed a high probability of being allergenic according to all the software employed. The analyzed LTPs from bell pepper, cassava, mango, mungbean and soybean showed high homology (>70%) with some known allergenic LTPs, suggesting a potential risk of cross-reactivity for sensitized individuals. Other LTPs, for example those from canola, cassava, mango, mungbean, papaya or persimmon, displayed a high degree of identity with Pru p 3 within the consensus sequence responsible for the formation, at the three-dimensional level, of its major conformational epitope. Since recent studies highlighted that, in patients mono-sensitized to peach LTP, the levels of IgE seem directly proportional to the chance of developing cross-reactivity to LTPs from non-Rosaceae foods, and that these chances increase the more similar the protein is to Pru p 3, these proteins should be taken into special account in future studies aimed at evaluating the risk of cross-allergenicity in highly sensitized individuals.
Affiliation(s)
- Cristiano Garino
  - Dipartimento di Scienze del Farmaco & Drug and Food Biotechnology (DFB) Center, Università del Piemonte Orientale "A. Avogadro", largo Donegani 2, 28100 Novara, Italy
- Jean Daniel Coïsson
  - Dipartimento di Scienze del Farmaco & Drug and Food Biotechnology (DFB) Center, Università del Piemonte Orientale "A. Avogadro", largo Donegani 2, 28100 Novara, Italy
- Marco Arlorio
  - Dipartimento di Scienze del Farmaco & Drug and Food Biotechnology (DFB) Center, Università del Piemonte Orientale "A. Avogadro", largo Donegani 2, 28100 Novara, Italy
14. In Silico Sub-unit Hexavalent Peptide Vaccine Against an Staphylococcus aureus Biofilm-Related Infection. Int J Pept Res Ther 2015. [DOI: 10.1007/s10989-015-9489-1] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 12/18/2022]
15. Hayes M, Rougé P, Barre A, Herouet-Guicheney C, Roggen EL. In silico tools for exploring potential human allergy to proteins. Drug Discov Today Dis Models 2015. [DOI: 10.1016/j.ddmod.2016.06.001] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Indexed: 11/16/2022]
16. Saravanan V, Lakshmi PTV. Fuzzy Logic for Personalized Healthcare and Diagnostics: FuzzyApp - A Fuzzy Logic Based Allergen-Protein Predictor. OMICS: A Journal of Integrative Biology 2014; 18:570-81. [DOI: 10.1089/omi.2014.0021] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 02/06/2023]
Affiliation(s)
- Vijayakumar Saravanan
  - Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Pondicherry, India
- PTV Lakshmi
  - Centre for Bioinformatics, School of Life Sciences, Pondicherry University, Pondicherry, India
17.
Abstract
A large volume of data relevant to immunology research has accumulated through the sequencing of the genomes of humans and other model organisms. At the same time, huge amounts of clinical and epidemiologic data are being deposited in the scientific literature and in clinical records. This accumulated information is a goldmine for researchers looking for mechanisms of immune function and disease pathogenesis. The need to handle this rapidly growing immunological resource has given rise to the field known as immunoinformatics. Immunoinformatics, otherwise known as computational immunology, is the interface between computer science and experimental immunology. It represents the use of computational methods and resources for the understanding of immunological information. It not only helps in dealing with huge amounts of data but also plays a great role in defining new hypotheses related to immune responses. This chapter reviews classical immunology, different databases, and prediction tools. Further, it briefly describes applications of immunoinformatics in reverse vaccinology, immune system modeling, and cancer diagnosis and therapy. It also explores the idea of integrating immunoinformatics with systems biology for the development of personalized medicine. All these efforts save time and cost to a great extent.
Affiliation(s)
- Namrata Tomar
  - Machine Intelligence Unit, Indian Statistical Institute, 203 B.T. Road, Kolkata, 700108, India
18. PREAL: prediction of allergenic protein by maximum Relevance Minimum Redundancy (mRMR) feature selection. BMC Syst Biol 2013; 7 Suppl 5:S9. [PMID: 24565053 PMCID: PMC4029432 DOI: 10.1186/1752-0509-7-s5-s9] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Indexed: 01/07/2023]
Abstract
BACKGROUND Assessment of the potential allergenicity of proteins is necessary whenever transgenic proteins are introduced into the food chain. Bioinformatics approaches to allergen prediction have grown appreciably in sophistication and performance in recent years. However, which features are critical for a protein's allergenicity has not yet been fully investigated. RESULTS We presented a more comprehensive model in a 128-feature space for the prediction of allergenic proteins by integrating various properties of proteins, such as biochemical and physicochemical properties, sequential features and subcellular locations. The overall accuracy in cross-validation reached 93.42% to 100% with our new method. The Maximum Relevance Minimum Redundancy (mRMR) method and the Incremental Feature Selection (IFS) procedure were applied to determine which features are essential for allergenicity. Performance comparisons showed the superiority of our method over widely used existing methods. More importantly, it was observed that subcellular location and amino acid composition features played major roles in determining the allergenicity of proteins, particularly the extracellular/cell surface and vacuole locations for wheat and soybean. To facilitate allergen prediction, we implemented our computational method in a web application, which is available at http://gmobl.sjtu.edu.cn/PREAL/index.php. CONCLUSIONS Our new approach could improve the accuracy of allergen prediction, and the findings may provide novel insights into the mechanism of allergies.
19. Dimitrov I, Naneva L, Doytchinova I, Bangov I. AllergenFP: allergenicity prediction by descriptor fingerprints. Bioinformatics 2013; 30:846-51. [PMID: 24167156 DOI: 10.1093/bioinformatics/btt619] [Citation(s) in RCA: 402] [Impact Index Per Article: 36.5] [Indexed: 11/12/2022] [Open access]
Abstract
MOTIVATION Allergenicity, like antigenicity and immunogenicity, is a property encoded linearly and non-linearly, and therefore alignment-based approaches are not able to identify this property unambiguously. A novel alignment-free descriptor-based fingerprint approach is presented here and applied to identify allergens and non-allergens. The approach was implemented as a four-step algorithm. Initially, the protein sequences are described by amino acid principal properties such as hydrophobicity, size, relative abundance, and helix and β-strand forming propensities. Then, the generated strings of different length are converted into vectors of equal length by auto- and cross-covariance (ACC). The vectors are transformed into binary fingerprints and compared in terms of the Tanimoto coefficient. RESULTS The approach was applied to a set of 2427 known allergens and 2427 non-allergens and correctly identified 88% of them, with a Matthews correlation coefficient of 0.759. The descriptor fingerprint approach presented here is universal. It could be applied to any classification problem in computational biology. The set of E-descriptors is able to capture the main structural and physicochemical properties of the amino acids building the proteins. The ACC transformation overcomes the main problem in alignment-based comparative studies, which arises from the different lengths of the aligned protein sequences. The conversion of protein ACC values into binary descriptor fingerprints allows similarity search and classification. AVAILABILITY AND IMPLEMENTATION The algorithm described in the present study was implemented in a specially designed Web site, named AllergenFP (FP stands for FingerPrint). AllergenFP is written in Python, with a GUI in HTML. It is freely accessible at http://ddg-pharmfac.net/AllergenFP. CONTACT idoytchinova@pharmfac.net or ivanbangov@shu-bg.net.
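The fingerprint comparison step can be illustrated with the standard Tanimoto coefficient on binary vectors, T = c / (a + b - c), where a and b count the set bits in each fingerprint and c counts the bits set in both. The sketch below is not taken from AllergenFP itself: the helper `most_similar`, the convention of returning 0.0 for two all-zero fingerprints, and the idea of labelling a query by its single closest reference are assumptions made for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Two all-zero fingerprints share no set bits; return 0.0 by convention
    # (an assumption of this sketch).
    return both / either if either else 0.0


def most_similar(query_fp, references):
    """Return (label, score) of the reference fingerprint closest to the query.

    `references` is a list of (label, fingerprint) pairs.
    """
    scored = ((label, tanimoto(query_fp, fp)) for label, fp in references)
    return max(scored, key=lambda pair: pair[1])
```

For example, `most_similar([1, 1, 0, 1], [("allergen", [1, 1, 0, 0]), ("non-allergen", [0, 0, 1, 1])])` picks the "allergen" reference, because the query shares two of three set bits with it and only one of four with the other.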
Affiliation(s)
- Ivan Dimitrov
  - Medical University of Sofia, Faculty of Pharmacy, 2 Dunav st., 1000 Sofia and Konstantin Preslavski Shumen University, Faculty of Natural Sciences, 115 Universitetska st., 9712 Shumen, Bulgaria
20. Dimitrov I, Flower DR, Doytchinova I. AllerTOP - a server for in silico prediction of allergens. BMC Bioinformatics 2013; 14 Suppl 6:S4. [PMID: 23735058 PMCID: PMC3633022 DOI: 10.1186/1471-2105-14-s6-s4] [Citation(s) in RCA: 243] [Impact Index Per Article: 22.1] [Indexed: 11/17/2022] [Open access]
Abstract
Background Allergy is a form of hypersensitivity to normally innocuous substances, such as dust, pollen, foods or drugs. Allergens are small antigens that commonly provoke an IgE antibody response. There are two types of bioinformatics-based allergen prediction. The first approach follows FAO/WHO Codex alimentarius guidelines and searches for sequence similarity. The second approach is based on identifying conserved allergenicity-related linear motifs. Both approaches assume that allergenicity is a linearly coded property. In the present study, we applied ACC pre-processing to sets of known allergens, developing alignment-independent models for allergen recognition based on the main chemical properties of amino acid sequences. Results A set of 684 food, 1,156 inhalant and 555 toxin allergens was collected from several databases. A set of non-allergens from the same species were selected to mirror the allergen set. The amino acids in the protein sequences were described by three z-descriptors (z1, z2 and z3) and by auto- and cross-covariance (ACC) transformation were converted into uniform vectors. Each protein was presented as a vector of 45 variables. Five machine learning methods for classification were applied in the study to derive models for allergen prediction. The methods were: discriminant analysis by partial least squares (DA-PLS), logistic regression (LR), decision tree (DT), naïve Bayes (NB) and k nearest neighbours (kNN). The best performing model was derived by kNN at k = 3. It was optimized, cross-validated and implemented in a server named AllerTOP, freely accessible at http://www.pharmfac.net/allertop. AllerTOP also predicts the most probable route of exposure. In comparison to other servers for allergen prediction, AllerTOP outperforms them with 94% sensitivity. Conclusions AllerTOP is the first alignment-free server for in silico prediction of allergens based on the main physicochemical properties of proteins. 
Significantly, as well as allergenicity, AllerTOP is able to predict the route of allergen exposure: food, inhalant or toxin.
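The ACC step described in this abstract can be illustrated with a minimal sketch. It assumes the common form of the auto- and cross-covariance transform, in which each feature is the mean product of descriptor j at position i and descriptor k at position i + lag; with three descriptors and lags 1 to 5 this gives 3 x 3 x 5 = 45 variables, matching the vector length quoted above. The function name `acc_transform`, the lag range, and the `TOY_Z` values are illustrative assumptions, not the published z-scales or the AllerTOP implementation.

```python
from typing import Dict, List, Sequence


def acc_transform(
    seq: str,
    descriptors: Dict[str, Sequence[float]],
    max_lag: int = 5,
) -> List[float]:
    """Turn a variable-length sequence into a fixed-length ACC vector.

    Assumed formula: ACC_jk(lag) = sum_i z_j(i) * z_k(i + lag) / (n - lag).
    """
    z = [descriptors[aa] for aa in seq]  # one descriptor tuple per residue
    n = len(z)
    if n <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    d = len(z[0])
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for j in range(d):
            for k in range(d):
                total = sum(z[i][j] * z[i + lag][k] for i in range(n - lag))
                features.append(total / (n - lag))
    return features


# Hypothetical descriptor values for four residues (NOT the real z-scales).
TOY_Z = {
    "A": (1.0, 0.0, -1.0),
    "C": (0.5, -0.5, 0.0),
    "D": (-1.0, 1.0, 0.5),
    "E": (0.0, 1.0, 1.0),
}

vector = acc_transform("ACDEACDE", TOY_Z)
print(len(vector))
```

Because every protein maps to the same number of features regardless of its length, the resulting vectors could be passed to any standard classifier, such as the kNN model the abstract reports as best performing.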
Affiliation(s)
- Ivan Dimitrov
- Faculty of Pharmacy, Medical University of Sofia, 2 Dunav St., Sofia, Bulgaria
|
21
|
Xue B, Soeria-Atmadja D, Gustafsson MG, Hammerling U, Dunker AK, Uversky VN. Abundance and functional roles of intrinsic disorder in allergenic proteins and allergen representative peptides. Proteins 2011; 79:2595-606. [DOI: 10.1002/prot.23077] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2011] [Revised: 04/14/2011] [Accepted: 05/04/2011] [Indexed: 01/23/2023]
|
22
|
Bioinformatic analysis for allergenicity assessment of Bacillus thuringiensis Cry proteins expressed in insect-resistant food crops. Food Chem Toxicol 2011; 49:356-62. [DOI: 10.1016/j.fct.2010.11.008] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/20/2010] [Revised: 11/02/2010] [Accepted: 11/05/2010] [Indexed: 01/17/2023]
|
23
|
Verma AK, Misra A, Subash S, Das M, Dwivedi PD. Computational allergenicity prediction of transgenic proteins expressed in genetically modified crops. Immunopharmacol Immunotoxicol 2010; 33:410-22. [PMID: 20964517 DOI: 10.3109/08923973.2010.523704] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
Development of genetically modified (GM) crops is on the increase to improve food quality, increase harvest yields, and reduce the dependency on chemical pesticides. Before their release into the marketplace, they should be scrutinized for their safety. Several guidelines from different regulatory agencies such as ILSI, WHO Codex, and OECD are available for allergenicity evaluation of transgenics, and sequence homology analysis is the first test to determine the allergenic potential of inserted proteins. Therefore, to test and validate, 312 allergenic, 100 non-allergenic, and 48 inserted proteins were assessed for sequence similarity using 8-mer, 80-mer, and full FASTA search. On performing sequence homology studies, ~94% of the allergenic proteins gave exact matches for 8-mer and 80-mer homology. However, 20 allergenic proteins showed non-allergenic behavior. Out of 100 non-allergenic proteins, seven qualified as allergens. None of the inserted proteins demonstrated allergenic behavior. In order to improve the predictability, proteins showing anomalous behavior were tested by Algpred and ADFS separately. Use of the Algpred and ADFS software reduced the tendency of false prediction to a great extent (74-78%). In conclusion, routine sequence homology needs to be coupled with some other bioinformatic method like ADFS/Algpred to reduce false allergenicity prediction of novel proteins.
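The 8-mer exact-match test mentioned in this abstract can be sketched as a sliding-window comparison. This is only an illustration of the idea of finding contiguous identical stretches shared by a query protein and a known allergen. The function name `shared_kmers` and the example sequences are hypothetical, and the 80-mer and full FASTA searches are not covered here.

```python
from typing import List


def shared_kmers(query: str, reference: str, k: int = 8) -> List[str]:
    """Return the sorted k-mers that occur in both sequences exactly."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Collect every contiguous window of length k from each sequence.
    ref_kmers = {reference[i:i + k] for i in range(len(reference) - k + 1)}
    query_kmers = {query[i:i + k] for i in range(len(query) - k + 1)}
    return sorted(query_kmers & ref_kmers)


# Made-up sequences that share one stretch of eight identical residues.
hits = shared_kmers("MKTAYIAKQRQISFVK", "GGTAYIAKQRGG")
print(hits)
```

A non-empty result would flag the query for closer inspection. As the abstract notes, such matches alone can produce false predictions, so they are best combined with other methods.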
Affiliation(s)
- Alok Kumar Verma
- Food Toxicology Division, Indian Institute of Toxicology Research, Council of Scientific and Industrial Research, Lucknow, Uttar Pradesh, India
|
24
|
Tomar N, De RK. Immunoinformatics: an integrated scenario. Immunology 2010; 131:153-68. [PMID: 20722763 PMCID: PMC2967261 DOI: 10.1111/j.1365-2567.2010.03330.x] [Citation(s) in RCA: 98] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2009] [Revised: 06/12/2010] [Accepted: 06/21/2010] [Indexed: 12/11/2022] Open
Abstract
Genome sequencing of humans and other organisms has led to the accumulation of huge amounts of data, which include immunologically relevant data. A large volume of clinical data has been deposited in several immunological databases and as a result immunoinformatics has emerged as an important field which acts as an intersection between experimental immunology and computational approaches. It not only helps in dealing with the huge amount of data but also plays a role in defining new hypotheses related to immune responses. This article reviews classical immunology, different databases and prediction tools. It also describes applications of immunoinformatics in designing in silico vaccination and immune system modelling. All these efforts save time and reduce cost.
Affiliation(s)
- Namrata Tomar
- Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
|
25
|
Scientific Opinion on the assessment of allergenicity of GM plants and microorganisms and derived food and feed. EFSA J 2010. [DOI: 10.2903/j.efsa.2010.1700] [Citation(s) in RCA: 243] [Impact Index Per Article: 17.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open
|
26
|
Sharma R, Singh AK, Umashankar V. Characterization of allergenic epitopes of Ory s1 protein from Oryza sativa and its homologs. Bioinformation 2009; 4:12-8. [PMID: 20011147 PMCID: PMC2770365 DOI: 10.6026/97320630004012] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/16/2009] [Revised: 05/04/2009] [Accepted: 06/17/2009] [Indexed: 11/23/2022] Open
Abstract
Vaccination is the most effective technique suggested nowadays for allergy treatment. Recombinant-based approaches are mostly focused on genetic modification of allergens to produce molecules with reduced allergenic activity and conserved antigenicity. The molecules developed for vaccination in allergy possess significantly reduced allergenicity in terms of IgE binding, and therefore will not lead to anaphylactic reactions upon injection. This approach is probably feasible with every peptide allergen with known amino acid sequence. In this study an in silico approach was used to investigate allergenic protein sequences. Motif analysis of these sequences reveals the allergenic epitopes in the amino acid sequences. Physicochemical analysis of protein sequences shows that the homolog allergens of Ory s1 are highly correlated with the aromaticity, GRAVY and cysteine content. Moreover, phylogenetic analysis of Ory s1 with other sequences reveals that Oryza sativa japonica and Zea mays are close homologs, whilst Lolium perenne and Dactylis glomerata are found to be remote homologs. The multiple sequence alignment of Ory s1 with all its homologs in this study reveals high conservation of residues in the DPBB_1 domain (amino acid residue positions 86-164), which was found distinctly in all the sequences. These findings support the proposal that allergenic epitopes encompass conserved residues. The consensus allergenic was found to be mainly composed of hydrophobic residues. The functional sites of allergenic proteins reported in this study shall be attenuated to develop hypoallergenic vaccine. The sequence comparison strategy adopted in this study would pave the way for effective evolutionary analysis of these allergens.
Affiliation(s)
- Ruchi Sharma
- Department of Botany, Udaya Pratap College, Varanasi, Uttar Pradesh, India
- Ashok Kumar Singh
- Department of Botany, Udaya Pratap College, Varanasi, Uttar Pradesh, India
- Vetrivel Umashankar
- Department of Bioinformatics, School of Biosciences, SRM University, Ramapuram, Chennai, Tamil Nadu, India
|
27
|
Bimber BN, Burwitz BJ, O'Connor S, Detmer A, Gostick E, Lank SM, Price DA, Hughes A, O'Connor D. Ultradeep pyrosequencing detects complex patterns of CD8+ T-lymphocyte escape in simian immunodeficiency virus-infected macaques. J Virol 2009; 83:8247-53. [PMID: 19515775 PMCID: PMC2715741 DOI: 10.1128/jvi.00897-09] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/05/2009] [Accepted: 06/01/2009] [Indexed: 11/20/2022] Open
Abstract
Human and simian immunodeficiency viruses (HIV/SIV) exhibit enormous sequence heterogeneity within each infected host. Here, we use ultradeep pyrosequencing to create a comprehensive picture of CD8(+) T-lymphocyte (CD8-TL) escape in SIV-infected macaques, revealing a previously undetected complex pattern of viral variants. This increased sensitivity enabled the detection of acute CD8-TL escape as early as 17 days postinfection, representing the earliest published example of CD8-TL escape in intrarectally infected macaques. These data demonstrate that pyrosequencing can be used to study the evolution of CD8-TL escape during immunodeficiency virus infection with an unprecedented degree of sensitivity.
Affiliation(s)
- Benjamin N Bimber
- Wisconsin National Primate Research Center, University of Wisconsin-Madison, 53706, USA
|
28
|
Hammerling U, Tallsjö A, Grafström R, Ilbäck NG. Comparative Hazard Characterization in Food Toxicology. Crit Rev Food Sci Nutr 2009; 49:626-69. [DOI: 10.1080/10408390802145617] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023]
|
29
|
Muh HC, Tong JC, Tammi MT. AllerHunter: a SVM-pairwise system for assessment of allergenicity and allergic cross-reactivity in proteins. PLoS One 2009; 4:e5861. [PMID: 19516900 PMCID: PMC2689655 DOI: 10.1371/journal.pone.0005861] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/28/2008] [Accepted: 05/06/2009] [Indexed: 11/19/2022] Open
Abstract
Allergy is a major health problem in industrialized countries. The number of transgenic food crops is growing rapidly, creating the need for allergenicity assessment before they are introduced into the human food chain. While existing bioinformatic methods have achieved good accuracies for highly conserved sequences, the discrimination of allergens and non-allergens from allergen-like non-allergen sequences remains difficult. We describe AllerHunter, a web-based computational system for the assessment of potential allergenicity and allergic cross-reactivity in proteins. It combines an iterative pairwise sequence similarity encoding scheme with SVM as the discriminating engine. The pairwise vectorization framework allows the system to model essential features in allergens that are involved in cross-reactivity, but not limited to distinct sets of physicochemical properties. The system was rigorously trained and tested using 1,356 known allergen and 13,449 putative non-allergen sequences. Extensive testing was performed for validation of the prediction models. The system is effective for distinguishing allergens and non-allergens from allergen-like non-allergen sequences. Testing results showed that AllerHunter, with a sensitivity of 83.4% and specificity of 96.4% (accuracy = 95.3%, area under the receiver operating characteristic curve AROC = 0.928 ± 0.004 and Matthews correlation coefficient MCC = 0.738), performs significantly better than a number of existing methods using an independent dataset of 1,443 protein sequences. AllerHunter is available at http://tiger.dbs.nus.edu.sg/AllerHunter.
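For readers comparing the figures quoted in this abstract, the sketch below shows how sensitivity, specificity, accuracy and the Matthews correlation coefficient are conventionally derived from a binary confusion matrix. The function name `binary_metrics` and the counts in the example are hypothetical and are not taken from the AllerHunter evaluation.

```python
import math
from typing import Dict


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true allergens recovered
    specificity = tn / (tn + fp)  # fraction of non-allergens rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a row or column of the matrix is empty;
    # returning 0.0 in that case is a common convention, assumed here.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Illustrative counts only.
metrics = binary_metrics(tp=8, fp=1, tn=9, fn=2)
print(metrics)
```

On strongly imbalanced test sets, such as one with roughly ten times more non-allergens than allergens, accuracy is dominated by specificity, which is one reason MCC is often reported alongside it.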
Affiliation(s)
- Hon Cheng Muh
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Joo Chuan Tong
- Data Mining Department, Institute for Infocomm Research, Singapore, Singapore
- Department of Biochemistry, National University of Singapore, Singapore, Singapore
- Martti T. Tammi
- Department of Biological Sciences, National University of Singapore, Singapore, Singapore
- Department of Biochemistry, National University of Singapore, Singapore, Singapore
- Department of Microbiology, Tumor and Cell Biology, Karolinska Institutet, Stockholm, Sweden
|
30
|
Lim SJ, Tong JC, Chew FT, Tammi MT. The value of position-specific scoring matrices for assessment of protein allegenicity. BMC Bioinformatics 2008; 9 Suppl 12:S21. [PMID: 19091021 PMCID: PMC2638161 DOI: 10.1186/1471-2105-9-s12-s21] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Bioinformatics tools are commonly used for assessing potential protein allergenicity. While these methods have achieved good accuracies for highly conserved sequences, they are less effective when the overall similarity is low. In this study, we assessed the feasibility of using position-specific scoring matrices as a basis for predicting potential allergenicity in proteins. RESULTS Two simple methods for predicting potential allergenicity in proteins, based on general and group-specific allergen profiles, are presented. Testing results indicate that the performances of both methods are comparable to the best results of other methods. The group-specific profile approach, with a sensitivity of 84.04% and specificity of 96.52%, gives similar results as those obtained using the general profile approach (sensitivity = 82.45%, specificity = 96.92%). CONCLUSION We show that position-specific scoring matrices are highly promising for constructing computational models suitable for allergenicity assessment. These data suggest it may be possible to apply a targeted approach for allergenicity assessment based on the profiles of allergens of interest.
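As a rough illustration of what a position-specific scoring matrix is, the sketch below builds log-odds scores from a small ungapped alignment and scores a candidate segment against it. It assumes a uniform background frequency and a simple additive pseudocount; the function names `build_pssm` and `score_segment` and the toy alignment are hypothetical, and this is not the profile construction used in the paper.

```python
import math
from collections import Counter
from typing import Dict, List

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned: List[str], pseudocount: float = 1.0) -> List[Dict[str, float]]:
    """Log2-odds scores per alignment column (uniform background assumed)."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(ALPHABET)
    total = len(aligned) + pseudocount * len(ALPHABET)
    pssm: List[Dict[str, float]] = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned)
        column = {
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in ALPHABET
        }
        pssm.append(column)
    return pssm


def score_segment(pssm: List[Dict[str, float]], segment: str) -> float:
    """Sum the column scores for a segment as long as the profile."""
    if len(segment) != len(pssm):
        raise ValueError("segment length must match the profile length")
    return sum(column[aa] for column, aa in zip(pssm, segment))


# Toy alignment; residues seen often at a position score above zero.
profile = build_pssm(["ACD", "ACE", "ACD"])
print(score_segment(profile, "ACD") > score_segment(profile, "WWW"))
```

A group-specific profile in the sense of the abstract would correspond to building one such matrix per allergen family rather than a single general one.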
Affiliation(s)
- Shen Jean Lim
- Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 8 Medical Drive, Singapore 117597.
|
31
|
Request from the European Commission related to the safeguard clause invoked by Austria on maize MON810 and T25 according to Article 23 of Directive 2001/18/EC. EFSA J 2008. [DOI: 10.2903/j.efsa.2008.891] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022] Open
|
32
|
Characteristic motifs for families of allergenic proteins. Mol Immunol 2008; 46:559-68. [PMID: 18951633 DOI: 10.1016/j.molimm.2008.07.034] [Citation(s) in RCA: 60] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/19/2008] [Revised: 07/22/2008] [Accepted: 07/23/2008] [Indexed: 12/16/2022]
Abstract
The identification of potential allergenic proteins is usually done by scanning a database of allergenic proteins and locating known allergens with a high sequence similarity. However, there is no universally accepted cut-off value for sequence similarity to indicate potential IgE cross-reactivity. Further, overall sequence similarity may be less important than discrete areas of similarity in proteins with homologous structure. To identify such areas, we first classified all allergens and their subdomains in the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/) to their closest protein families as defined in Pfam, and identified conserved physicochemical property motifs characteristic of each group of sequences. Allergens populate only a small subset of all known Pfam families, as all allergenic proteins in SDAP could be grouped to only 130 (of 9318 total) Pfams, and 31 families contain more than four allergens. Conserved physicochemical property motifs for the aligned sequences of the most populated Pfam families were identified with the PCPMer program suite and catalogued in the webserver MotifMate (http://born.utmb.edu/motifmate/summary.php). We also determined specific motifs for allergenic members of a family that could distinguish them from non-allergenic ones. These allergen specific motifs should be most useful in database searches for potential allergens. We found that sequence motifs unique to the allergens in three families (seed storage proteins, Bet v 1, and tropomyosin) overlap with known IgE epitopes, thus providing evidence that our motif based approach can be used to assess the potential allergenicity of novel proteins.
|
33
|
El-Manzalawy Y, Dobbs D, Honavar V. Predicting linear B-cell epitopes using string kernels. J Mol Recognit 2008; 21:243-55. [PMID: 18496882 PMCID: PMC2683948 DOI: 10.1002/jmr.893] [Citation(s) in RCA: 507] [Impact Index Per Article: 31.7] [Reference Citation Analysis] [Abstract] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
The identification and characterization of B-cell epitopes play an important role in vaccine design, immunodiagnostic tests, and antibody production. Therefore, computational tools for reliably predicting linear B-cell epitopes are highly desirable. We evaluated Support Vector Machine (SVM) classifiers trained utilizing five different kernel methods using fivefold cross-validation on a homology-reduced data set of 701 linear B-cell epitopes, extracted from Bcipep database, and 701 non-epitopes, randomly extracted from SwissProt sequences. Based on the results of our computational experiments, we propose BCPred, a novel method for predicting linear B-cell epitopes using the subsequence kernel. We show that the predictive performance of BCPred (AUC = 0.758) outperforms 11 SVM-based classifiers developed and evaluated in our experiments as well as our implementation of AAP (AUC = 0.7), a recently proposed method for predicting linear B-cell epitopes using amino acid pair antigenicity. Furthermore, we compared BCPred with AAP and ABCPred, a method that uses recurrent neural networks, using two data sets of unique B-cell epitopes that had been previously used to evaluate ABCPred. Analysis of the data sets used and the results of this comparison show that conclusions about the relative performance of different B-cell epitope prediction methods drawn on the basis of experiments using data sets of unique B-cell epitopes are likely to yield overly optimistic estimates of performance of evaluated methods. This argues for the use of carefully homology-reduced data sets in comparing B-cell epitope prediction methods to avoid misleading conclusions about how different methods compare to each other. Our homology-reduced data set and implementations of BCPred as well as the AAP method are publicly available through our web-based server, BCPREDS, at: http://ailab.cs.iastate.edu/bcpreds/. Copyright © 2008 John Wiley & Sons, Ltd.
Affiliation(s)
- Yasser El-Manzalawy
- Artificial Intelligence Laboratory, Iowa State University, Ames, IA 50010, USA.
|
34
|
Chao E, Krewski D. A risk-based classification scheme for genetically modified foods. I: Conceptual development. Regul Toxicol Pharmacol 2008; 52:208-22. [PMID: 18778747 DOI: 10.1016/j.yrtph.2008.08.006] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2008] [Revised: 07/22/2008] [Accepted: 08/13/2008] [Indexed: 11/19/2022]
Abstract
The predominant paradigm for the premarket assessment of genetically modified (GM) foods reflects heightened public concern by focusing on foods modified by recombinant deoxyribonucleic acid (rDNA) techniques, while foods modified by other methods of genetic modification are generally not assessed for safety. To determine whether a GM product requires less or more regulatory oversight and testing, we developed and evaluated a risk-based classification scheme (RBCS) for crop-derived GM foods. The results of this research are presented in three papers. This paper describes the conceptual development of the proposed RBCS that focuses on two categories of adverse health effects: (1) toxic and antinutritional effects, and (2) allergenic effects. The factors that may affect the level of potential health risks of GM foods are identified. For each factor identified, criteria for differentiating health risk potential are developed. The extent to which a GM food satisfies applicable criteria for each factor is rated separately. A concern level for each category of health effects is then determined by aggregating the ratings for the factors using predetermined aggregation rules. An overview of the proposed scheme is presented, as well as the application of the scheme to a hypothetical GM food.
Affiliation(s)
- Eunice Chao
- McLaughlin Centre for Population Health Risk Assessment, Institute of Population Health, University of Ottawa, 1 Stewart Street, Ottawa, Ont., Canada K1N 6N5.
|
35
|
Darewicz M, Dziuba J, Minkiewicz P. Celiac Disease—Background, Molecular, Bioinformatics and Analytical Aspects. FOOD REVIEWS INTERNATIONAL 2008. [DOI: 10.1080/87559120802089258] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
|
36
|
Comprehensive immunological evaluation reveals surprisingly few differences between elite controller and progressor Mamu-B*17-positive simian immunodeficiency virus-infected rhesus macaques. J Virol 2008; 82:5245-54. [PMID: 18385251 DOI: 10.1128/jvi.00292-08] [Citation(s) in RCA: 52] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023] Open
Abstract
The association between particular major histocompatibility complex class I (MHC-I) alleles and control of human immunodeficiency virus (HIV) and simian immunodeficiency virus (SIV) replication implies that certain CD8(+) T-lymphocyte (CD8-TL) responses are better able than others to control viral replication in vivo. However, possession of favorable alleles does not guarantee improved prognosis or viral control. In rhesus macaques, the MHC-I allele Mamu-B*17 is correlated with reduced viremia and is overrepresented in macaques that control SIVmac239, termed elite controllers (ECs). However, there is so far no mechanistic explanation for this phenomenon. Here we show that the chronic-phase Mamu-B*17-restricted repertoire is focused primarily against just five epitopes-VifHW8, EnvFW9, NefIW9, NefMW9, and env(ARF)cRW9-in both ECs and progressors. Interestingly, Mamu-B*17-restricted CD8-TL do not target epitopes in Gag. CD8-TL escape variation occurred in all targeted Mamu-B*17-restricted epitopes. However, recognition of escape variant peptides was commonly observed in both ECs and progressors. Wild-type sequences in the VifHW8 epitope tended to be conserved in ECs, but there was no evidence that this enhances viral control. In fact, no consistent differences were detected between ECs and progressors in any measured parameter. Our data suggest that the narrowly focused Mamu-B*17-restricted repertoire suppresses virus replication and drives viral evolution. It is, however, insufficient in the majority of individuals that express the "protective" Mamu-B*17 molecule. Most importantly, our data indicate that the important differences between Mamu-B*17-positive ECs and progressors are not readily discernible using standard assays to measure immune responses.
|
37
|
Soeria-Atmadja D, Onell A, Kober A, Matsson P, Gustafsson MG, Hammerling U. Multivariate statistical analysis of large-scale IgE antibody measurements reveals allergen extract relationships in sensitized individuals. J Allergy Clin Immunol 2007; 120:1433-40. [PMID: 17825892 DOI: 10.1016/j.jaci.2007.07.021] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2007] [Revised: 06/28/2007] [Accepted: 07/16/2007] [Indexed: 10/22/2022]
Abstract
BACKGROUND Many allergenic sources are reportedly cross-reactive because of protein structural similarities. Although several aggregations are well characterized, no holistic mapping of IgE reactivity has hitherto been reported. OBJECTIVE The aim of this study was to disclose relevant associations within a large set of allergen preparations, as revealed by specific IgE antibody levels in blood sera of multireactive human donors. METHODS A dataset of recorded IgE antibody serum concentrations of 1011 nonidentifiable multireactive individuals (devoid of clinical records) to 89 allergen extracts was compiled for in silico analysis. Various algorithms were used to identify specific multivariate dependencies between the IgE antibody levels. RESULTS Exhaustive cluster analysis demonstrates that IgE antibody responses to the 89 extracts can be aggregated into 12 stable formations. These clusters hold both well-known relationships, unexpected patterns, and unknown patterns, the latter categories being exemplified by the coclustering of wasp and certain seafood and a clear differentiation among pollen allergens. CONCLUSION Identified relationships within several well-known groups of cross-reactive allergen extracts confirm the applicability of dedicated multivariate data analysis within the allergology field. Moreover, some of the unexpected IgE reactivity associations in sensitized human subjects might help in identifying new relationships with potential importance to allergy. CLINICAL IMPLICATIONS Although clinical implications from this study should be validated in subsequent investigations with documentation on symptoms included, we believe this seminal approach is a key step toward the development of new analysis tools for interpretation of allergy data generated by using high-throughput recording systems.
|
38
|
Maness NJ, Valentine LE, May GE, Reed J, Piaskowski SM, Soma T, Furlott J, Rakasz EG, Friedrich TC, Price DA, Gostick E, Hughes AL, Sidney J, Sette A, Wilson NA, Watkins DI. AIDS virus specific CD8+ T lymphocytes against an immunodominant cryptic epitope select for viral escape. J Exp Med 2007; 204:2505-12. [PMID: 17954573 PMCID: PMC2118485 DOI: 10.1084/jem.20071261] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2007] [Accepted: 09/25/2007] [Indexed: 01/08/2023] Open
Abstract
Cryptic major histocompatibility complex class I epitopes have been detected in several pathogens, but their importance in the immune response to AIDS viruses remains unknown. Here, we show that Mamu-B*17(+) simian immunodeficiency virus (SIV)mac239-infected rhesus macaques that spontaneously controlled viral replication consistently made strong CD8(+) T lymphocyte (CD8-TL) responses against a cryptic epitope, RHLAFKCLW (cRW9). Importantly, cRW9-specific CD8-TL selected for viral variation in vivo and effectively suppressed SIV replication in vitro, suggesting that they might play a key role in the SIV-specific response. The discovery of an immunodominant CD8-TL response in elite controller macaques against a cryptic epitope suggests that the AIDS virus-specific cellular immune response is likely far more complex than is generally assumed.
Affiliation(s)
- Nicholas J Maness
- Wisconsin National Primate Research Center, University of Wisconsin-Madison, Madison, WI 53711, USA
|
39
|
Martinez Barrio A, Soeria-Atmadja D, Nistér A, Gustafsson MG, Hammerling U, Bongcam-Rudloff E. EVALLER: a web server for in silico assessment of potential protein allergenicity. Nucleic Acids Res 2007; 35:W694-700. [PMID: 17537818 PMCID: PMC1933222 DOI: 10.1093/nar/gkm370] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
Bioinformatics testing approaches for protein allergenicity, involving amino acid sequence comparisons, have evolved appreciably over the last several years to increased sophistication and performance. EVALLER, the web server presented in this article is based on our recently published 'Detection based on Filtered Length-adjusted Allergen Peptides' (DFLAP) algorithm, which affords in silico determination of potential protein allergenicity of high sensitivity and excellent specificity. To strengthen bioinformatics risk assessment in allergology EVALLER provides a comprehensive outline of its judgment on a query protein's potential allergenicity. Each such textual output incorporates a scoring figure, a confidence numeral of the assignment and information on high- or low-scoring matches to identified allergen-related motifs, including their respective location in accordingly derived allergens. The interface, built on a modified Perl Open Source package, enables dynamic and color-coded graphic representation of key parts of the output. Moreover, pertinent details can be examined in great detail through zoomed views. The server can be accessed at http://bioinformatics.bmc.uu.se/evaller.html.
Affiliation(s)
- Alvaro Martinez Barrio
- Linnaeus Centre for Bioinformatics, Uppsala Biomedical Centre (BMC), Uppsala University, P.O. Box 598, SE-751 24 Uppsala, Sweden
|
40
|
Emanuelsson C, Spangfort MD. Allergens as eukaryotic proteins lacking bacterial homologues. Mol Immunol 2007; 44:3256-60. [DOI: 10.1016/j.molimm.2007.01.019] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/05/2006] [Revised: 01/16/2007] [Accepted: 01/17/2007] [Indexed: 11/28/2022]
|
41
|
Mari A, Scala E, Palazzo P, Ridolfi S, Zennaro D, Carabella G. Bioinformatics applied to allergy: allergen databases, from collecting sequence information to data integration. The Allergome platform as a model. Cell Immunol 2007; 244:97-100. [PMID: 17434469 DOI: 10.1016/j.cellimm.2007.02.012] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/06/2007] [Accepted: 02/11/2007] [Indexed: 11/18/2022]
Abstract
Allergens are proteins or glycoproteins that are recognized by IgE produced by the immune system of allergic individuals. Until now around 1,500 allergenic structures have been identified, and this number seems not to have reached a plateau after 3-4 decades of research and the advent of molecular biology. Several allergen databases are available on the Internet. Different aims and philosophies lead to different products. Here we report on the main features of web sites dedicated to allergens and describe in more detail our current work on the Allergome platform. The web server Allergome (www.allergome.org) represents a free, independent, open resource whose goal is to provide an exhaustive repository of data related to all the IgE-binding compounds. The main purpose of Allergome is to collect a list of allergenic sources and molecules by using the widest selection criteria and sources. A further development of the Allergome platform is the Real Time Monitoring of IgE sensitization module (ReTiME), which allows uploading of raw data from both in vivo and in vitro testing, thus representing the first attempt to have IT applied to allergy data mining. More recently, a new module (RefArray) representing a tool for literature mining has been released.
Collapse
Affiliation(s)
- Adriano Mari
- Allergy Data Laboratories sc, Via Malipiero 28, 04100 Latina, Italy.
|
42
|
Schein CH, Ivanciuc O, Braun W. Bioinformatics approaches to classifying allergens and predicting cross-reactivity. Immunol Allergy Clin North Am 2007; 27:1-27. [PMID: 17276876 PMCID: PMC1941676 DOI: 10.1016/j.iac.2006.11.005] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.2] [Indexed: 11/19/2022]
Abstract
Allergenic proteins from very different environmental sources have similar sequences and structures. This fact may account for multiple allergen syndromes, whereby a myriad of diverse plants and foods may induce a similar IgE-based reaction in certain patients. Identifying the common triggering protein in these sources, in silico, can aid in designing individualized therapy for allergy sufferers. This article provides an overview of databases on allergenic proteins, and ways to identify common proteins that may be the cause of multiple allergy syndromes. The major emphasis is on the relational Structural Database of Allergenic Proteins (SDAP []), which includes cross-referenced data on the sequence, structure, and IgE epitopes of over 800 allergenic proteins, coupled with specially developed bioinformatics tools to group all allergens and identify discrete areas that may account for cross-reactivity. SDAP is freely available on the Web to clinicians and patients.
Affiliation(s)
- Catherine H. Schein
- Sealy Center for Structural Biology and Molecular Biophysics, Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, 301 University Blvd., Galveston TX 77555-0857
- Sealy Center for Structural Biology and Molecular Biophysics, Departments of Microbiology and Immunology, University of Texas Medical Branch, 301 University Blvd., Galveston TX 77555-0857
- Ovidiu Ivanciuc
- Sealy Center for Structural Biology and Molecular Biophysics, Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, 301 University Blvd., Galveston TX 77555-0857
- Werner Braun
- Sealy Center for Structural Biology and Molecular Biophysics, Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, 301 University Blvd., Galveston TX 77555-0857
|
43
|
Zhang ZH, Koh JLY, Zhang GL, Choo KH, Tammi MT, Tong JC. AllerTool: a web server for predicting allergenicity and allergic cross-reactivity in proteins. Bioinformatics 2006; 23:504-6. [PMID: 17150996 DOI: 10.1093/bioinformatics/btl621] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022] Open
Abstract
Assessment of potential allergenicity and patterns of cross-reactivity is necessary whenever novel proteins are introduced into the human food chain. Current bioinformatic methods in allergology focus mainly on the prediction of allergenic proteins, with no information on cross-reactivity patterns among known allergens. In this study, we present AllerTool, a web server with essential tools for the assessment of predicted as well as published cross-reactivity patterns of allergens. The analysis tools include graphical representation of allergen cross-reactivity information; a local sequence comparison tool that displays information on known cross-reactive allergens; a sequence similarity search tool for assessment of cross-reactivity in accordance with FAO/WHO Codex alimentarius guidelines; and a method based on support vector machine (SVM). Ten-fold cross-validation results showed that the area under the receiver operating curve (A(ROC)) of the SVM models is 0.90, with 86.00% sensitivity (SE) at a specificity (SP) of 86.00%. Availability: AllerTool is freely available at http://research.i2r.a-star.edu.sg/AllerTool/.
Affiliation(s)
- Zong Hong Zhang
- Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613.
|
44
|
Zhang ZH, Tan SCC, Koh JLY, Falus A, Brusic V. ALLERDB database and integrated bioinformatic tools for assessment of allergenicity and allergic cross-reactivity. Cell Immunol 2006; 244:90-6. [PMID: 17467675 DOI: 10.1016/j.cellimm.2007.01.007] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 01/12/2007] [Accepted: 01/31/2007] [Indexed: 11/16/2022]
Abstract
Databases and computational tools are increasingly important in the study of allergies, particularly in the assessment of allergenicity and allergic cross-reactivity. The ALLERDB database contains sequences of allergens and information on reported cross-reactivity between allergens. It focuses on analysis of allergenicity and allergic cross-reactivity of clinically relevant protein allergens. The official IUIS allergen data were extracted from the IUIS Allergen Nomenclature Sub-Committee website, and their sequence information from public databases and reference publications. The analysis tools assist allergen data analysis and retrieval, and include keyword searching, BLAST, prediction of allergenicity, a modification of BLAST that displays cross-reactive allergens, and graphical representation of cross-reactivity data. ALLERDB is a new brand of allergen database with a rich set of tools for sequence comparison, pattern identification, and visualization of results. It is accessible at http://research.i2r.a-star.edu.sg/Templar/DB/Allergen.
Affiliation(s)
- Zong Hong Zhang
- Institute for Infocomm Research, Singapore 119613, Singapore
|
45
|
Abstract
In this study a systematic attempt has been made to integrate various approaches in order to predict allergenic proteins with high accuracy. The dataset used for testing and training consists of 578 allergens and 700 non-allergens obtained from A. K. Bjorklund, D. Soeria-Atmadja, A. Zorzet, U. Hammerling and M. G. Gustafsson (2005) Bioinformatics, 21, 39-50. First, we developed methods based on support vector machine using amino acid and dipeptide composition and achieved an accuracy of 85.02% and 84.00%, respectively. Second, a motif-based method has been developed using MEME/MAST software that achieved a sensitivity of 93.94% with 33.34% specificity. Third, a database of known IgE epitopes was searched and this predicted allergenic proteins with 17.47% sensitivity at a specificity of 98.14%. Fourth, we predicted allergenic proteins by performing a BLAST search against allergen representative peptides. Finally, hybrid approaches have been developed, which combine two or more approaches. The performance of all these algorithms has been evaluated on an independent dataset of 323 allergens and 101 725 non-allergens obtained from Swiss-Prot. A web server, AlgPred, has been developed for predicting allergenic proteins and for mapping IgE epitopes on allergenic proteins (http://www.imtech.res.in/raghava/algpred/).
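The amino acid and dipeptide composition features mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the AlgPred implementation: the function names, the percentage scaling, and the choice to drop non-standard residues before counting are all assumptions made here.

```python
from itertools import product

# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _clean(sequence):
    """Upper-case the sequence and drop non-standard residues (an assumption)."""
    return "".join(r for r in sequence.upper() if r in AMINO_ACIDS)


def amino_acid_composition(sequence):
    """Return 20 values: the percentage of each standard amino acid."""
    seq = _clean(sequence)
    if not seq:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * seq.count(a) / len(seq) for a in AMINO_ACIDS]


def dipeptide_composition(sequence):
    """Return 400 values: the percentage of each overlapping residue pair."""
    seq = _clean(sequence)
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = len(seq) - 1  # number of overlapping dipeptides
    if total <= 0:
        return [0.0] * len(counts)
    for i in range(total):
        counts[seq[i:i + 2]] += 1
    # Dict insertion order (AA, AC, AD, ...) fixes the feature order.
    return [100.0 * c / total for c in counts.values()]
```

Either vector (20 or 400 values per protein) could then be passed to a classifier such as a support vector machine; the classifier itself and its parameters are outside this sketch.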
Affiliation(s)
- G. P. S. Raghava
- To whom correspondence should be addressed. Tel: +91 172 2690557; Fax: +91 172 2690632;
|
46
|
Soeria-Atmadja D, Lundell T, Gustafsson MG, Hammerling U. Computational detection of allergenic proteins attains a new level of accuracy with in silico variable-length peptide extraction and machine learning. Nucleic Acids Res 2006; 34:3779-93. [PMID: 16977698 PMCID: PMC1540723 DOI: 10.1093/nar/gkl467] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Indexed: 11/21/2022] Open
Abstract
The placing of novel or new-in-the-context proteins on the market, appearing in genetically modified foods, certain bio-pharmaceuticals and some household products leads to human exposure to proteins that may elicit allergic responses. Accurate methods to detect allergens are therefore necessary to ensure consumer/patient safety. We demonstrate that it is possible to reach a new level of accuracy in computational detection of allergenic proteins by presenting a novel detector, Detection based on Filtered Length-adjusted Allergen Peptides (DFLAP). The DFLAP algorithm extracts variable length allergen sequence fragments and employs modern machine learning techniques in the form of a support vector machine. In particular, this new detector shows hitherto unmatched specificity when challenged to the Swiss-Prot repository without appreciable loss of sensitivity. DFLAP is also the first reported detector that successfully discriminates between allergens and non-allergens occurring in protein families known to hold both categories. Allergenicity assessment for specific protein sequences of interest using DFLAP is possible via ulfh@slv.se.
Affiliation(s)
- M. G. Gustafsson
- Department of Engineering Sciences, Uppsala University, PO Box 534, SE-751 21 Uppsala, Sweden
- Department of Genetics and Pathology, Uppsala University, Rudbeck Laboratory, SE-751 85 Uppsala, Sweden
- Correspondence may also be addressed to M. G. Gustafsson. Tel: +46 18 4713229; Fax: +46 18 555096; Present address: M. G. Gustafsson, Department of Medical Sciences, Uppsala University, Uppsala University Hospital, SE-751 85 Uppsala, Sweden
|
47
|
Zbilut JP, Chua GH, Krishnan A, Bossa C, Colafranceschi M, Giuliani A. Entropic criteria for protein folding derived from recurrences: six residues patch as the basic protein word. FEBS Lett 2006; 580:4861-4. [PMID: 16914149 DOI: 10.1016/j.febslet.2006.07.076] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 06/26/2006] [Accepted: 07/27/2006] [Indexed: 10/24/2022]
Abstract
Some research has suggested that patches of six residues constitute an important amino acid window length in proteins for conveying information. We present database evidence that supports this conjecture, as well as additional recurrence-based data suggesting that the characterization and quantification of these words affect the folding/aggregation features of proteins. Other indirect evidence is presented and discussed.
Affiliation(s)
- Joseph P Zbilut
- Department of Molecular Biophysics and Physiology, Rush University Medical Center, 1653 W. Congress Parkway, Chicago, IL 60612, USA.
|
48
|
Saha S, Raghava GPS. Prediction of continuous B-cell epitopes in an antigen using recurrent neural network. Proteins 2006; 65:40-8. [PMID: 16894596 DOI: 10.1002/prot.21078] [Citation(s) in RCA: 959] [Impact Index Per Article: 53.3] [Indexed: 11/09/2022]
Abstract
B-cell epitopes play a vital role in the development of peptide vaccines, in the diagnosis of diseases, and in allergy research. Experimental methods used for characterizing epitopes are time consuming and demand large resources. The availability of epitope prediction method(s) can rapidly aid experimenters in simplifying this problem. The standard feed-forward (FNN) and recurrent neural network (RNN) have been used in this study for predicting B-cell epitopes in an antigenic sequence. The networks have been trained and tested on a clean data set, which consists of 700 non-redundant B-cell epitopes obtained from the Bcipep database and an equal number of non-epitopes obtained randomly from the Swiss-Prot database. The networks have been trained and tested with different input window lengths and numbers of hidden units. Maximum accuracy has been obtained using a recurrent neural network (Jordan network) with a single hidden layer of 35 hidden units for a window length of 16. The final network yields an overall prediction accuracy of 65.93% when tested by fivefold cross-validation. The corresponding sensitivity, specificity, and positive prediction values are 67.14, 64.71, and 65.61%, respectively. It has been observed that RNN (JE) was more successful than FNN in the prediction of B-cell epitopes. The length of the peptide is also important in the prediction of B-cell epitopes from antigenic sequences. The webserver ABCpred is freely available at www.imtech.res.in/raghava/abcpred/.
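The fixed-length input windows and the four performance measures named in the abstract above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the neural network itself is not reproduced, and the formulas are the standard confusion-matrix definitions rather than anything taken from ABCpred.

```python
def sliding_windows(sequence, length=16):
    """Return every overlapping window of `length` residues.

    A window length of 16 is the value the abstract reports as best;
    a sequence shorter than `length` yields an empty list.
    """
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


def confusion_metrics(tp, fn, tn, fp):
    """Standard measures from counts of true/false positives and negatives."""
    return {
        "sensitivity": tp / (tp + fn),  # fraction of epitopes recovered
        "specificity": tn / (tn + fp),  # fraction of non-epitopes rejected
        "ppv": tp / (tp + fp),          # positive prediction value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }
```

For example, `confusion_metrics(tp=8, fn=2, tn=6, fp=4)` uses made-up counts purely to show the call; it is not data from the study.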
Affiliation(s)
- Sudipto Saha
- Institute of Microbial Technology, Chandigarh, India
|
49
|
Opinion of the Scientific Panel on genetically modified organisms [GMO] on an application (Reference EFSA‐GMO‐UK‐2004‐05) for the placing on the market of insect‐protected and glufosinate and glyphosate‐tolerant genetically modified maize 1507 × NK603, for food and feed uses, and import and processing under Regulation (EC) No 1829/2003 from Pioneer Hi‐Bred and Mycogen Seeds. EFSA J 2006. [DOI: 10.2903/j.efsa.2006.355] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/11/2022] Open
|
50
|
Soeria-Atmadja D, Wallman M, Björklund AK, Isaksson A, Hammerling U, Gustafsson MG. External cross-validation for unbiased evaluation of protein family detectors: application to allergens. Proteins 2006; 61:918-25. [PMID: 16231294 DOI: 10.1002/prot.20656] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
Key issues in protein science and computational biology are design and evaluation of algorithms aimed at detection of proteins that belong to a specific family, as defined by structural, evolutionary, or functional criteria. In this context, several validation techniques are often used to compare different parameter settings of the detector, and to subsequently select the setting that yields the smallest error rate estimate. A frequently overlooked problem associated with this approach is that this smallest error rate estimate may have a large optimistic bias. Based on computer simulations, we show that a detector's error rate estimate can be overly optimistic and propose a method to obtain unbiased performance estimates of a detector design procedure. The method is founded on an external 10-fold cross-validation (CV) loop that embeds an internal validation procedure used for parameter selection in detector design. The designed detectors generated in each of the 10 iterations are evaluated on held-out examples exclusively available in the external CV iterations. Notably, the average of these 10 performance estimates is not associated with a final detector, but rather with the average performance of the design procedure used. We apply the external CV loop to the particular problem of detecting potentially allergenic proteins, using a previously reported design procedure. Unbiased performance estimates of the allergen detector design procedure are presented together with information about which algorithms and parameter settings are most frequently selected.
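The external cross-validation loop described in the abstract above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' procedure: the detector is a one-dimensional k-nearest-neighbour classifier chosen only for brevity, the data are synthetic, and every function name is hypothetical.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return 1 if 2 * sum(y for _, y in nearest) > len(nearest) else 0


def error_rate(train, test, k):
    """Fraction of test points misclassified by k-NN fitted on train."""
    return sum(knn_predict(train, x, k) != y for x, y in test) / len(test)


def split(data, fold):
    """Return (training part, held-out part) for one fold of indices."""
    held = set(fold)
    train = [p for i, p in enumerate(data) if i not in held]
    return train, [data[i] for i in fold]


def design_detector(train, candidate_ks, inner_folds=5):
    """Internal CV: pick the k with the smallest mean inner error."""
    folds = k_fold_indices(len(train), inner_folds, seed=1)

    def inner_error(k):
        return sum(error_rate(*split(train, f), k) for f in folds) / len(folds)

    return min(candidate_ks, key=inner_error)


def external_cv(data, candidate_ks, outer_folds=10):
    """External CV: the held-out fold never influences parameter selection."""
    errors = []
    for fold in k_fold_indices(len(data), outer_folds, seed=0):
        train, test = split(data, fold)
        best_k = design_detector(train, candidate_ks)  # uses train only
        errors.append(error_rate(train, test, best_k))
    return sum(errors) / len(errors)


def make_toy_data(n_per_class=50, seed=42):
    """Synthetic two-class data: N(0, 1) labelled 0 and N(2, 1) labelled 1."""
    rng = random.Random(seed)
    return ([(rng.gauss(0, 1), 0) for _ in range(n_per_class)]
            + [(rng.gauss(2, 1), 1) for _ in range(n_per_class)])
```

Calling `external_cv(make_toy_data(), candidate_ks=[1, 3, 5, 7])` returns the mean held-out error across the outer folds. As the abstract notes, that figure describes the design procedure as a whole rather than any single final detector.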
|