1
Goto K, Tamehiro N, Yoshida T, Hanada H, Sakuma T, Adachi R, Kondo K, Takeuchi I. Novel Machine Learning Method AllerStat Identifies Statistically Significant Allergen-Specific Patterns in Protein Sequences. J Biol Chem 2023; 299:104733. [PMID: 37086787 DOI: 10.1016/j.jbc.2023.104733] [Received: 08/09/2022] [Revised: 04/12/2023] [Accepted: 04/18/2023] [Indexed: 04/24/2023]
Abstract
Cutting-edge technologies such as genome editing and synthetic biology allow us to produce novel foods and functional proteins. However, their toxicity and allergenicity must be accurately evaluated. It is known that specific amino-acid sequences make some proteins allergenic, but many of these sequences remain uncharacterized. In this study, we introduce a data-driven approach and a machine-learning (ML) method to find undiscovered allergen-specific patterns (ASPs) among amino-acid sequences. The proposed method enables an exhaustive search for amino-acid subsequences whose frequencies are statistically significantly higher in allergenic proteins. As a proof of concept, we created a database containing 21,154 proteins for which the presence or absence of allergic reactions is already known, and applied the proposed method to the database. The ASPs detected in this proof-of-concept study were consistent with known biological findings, and the allergenicity prediction performance using the detected ASPs was higher than that of extant approaches, indicating this method may be useful in evaluating the utility of synthetic foods and proteins.
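The idea of searching for subsequences that are significantly more frequent in allergenic proteins can be illustrated with a naive sketch. This is not the AllerStat algorithm itself: the function names, the length limits, the one-sided Fisher exact test, and the Bonferroni correction are all assumptions chosen for illustration, and the brute-force enumeration would not scale to a database of 21,154 proteins.

```python
from math import comb


def substrings(seq, min_len, max_len):
    """Set of contiguous subsequences of seq with length in [min_len, max_len]."""
    return {seq[i:i + k]
            for k in range(min_len, max_len + 1)
            for i in range(len(seq) - k + 1)}


def fisher_one_sided(a, b, c, d):
    """P(X >= a) under the hypergeometric null for the 2x2 table [[a, b], [c, d]].

    a: allergens containing the pattern, b: allergens lacking it,
    c: non-allergens containing it,      d: non-allergens lacking it.
    """
    n_allergen = a + b
    n_with = a + c
    total = a + b + c + d
    upper = min(n_allergen, n_with)
    tail = sum(comb(n_allergen, x) * comb(total - n_allergen, n_with - x)
               for x in range(a, upper + 1))
    return tail / comb(total, n_with)


def count_carriers(sequences, min_len, max_len):
    """Map each pattern to the number of sequences that contain it at least once."""
    counts = {}
    for seq in sequences:
        for pattern in substrings(seq, min_len, max_len):
            counts[pattern] = counts.get(pattern, 0) + 1
    return counts


def enriched_patterns(allergens, non_allergens, min_len=2, max_len=4, alpha=0.05):
    """Patterns enriched in allergens after a Bonferroni correction (assumed here)."""
    pos = count_carriers(allergens, min_len, max_len)
    neg = count_carriers(non_allergens, min_len, max_len)
    n_tests = len(set(pos) | set(neg))
    if n_tests == 0:
        return []
    threshold = alpha / n_tests  # Bonferroni: one test per candidate pattern
    hits = []
    for pattern, a in pos.items():
        c = neg.get(pattern, 0)
        p = fisher_one_sided(a, len(allergens) - a, c, len(non_allergens) - c)
        if p <= threshold:
            hits.append((pattern, p))
    return sorted(hits, key=lambda item: item[1])
```

Calling `enriched_patterns(allergen_seqs, other_seqs)` on two lists of sequence strings returns `(pattern, p_value)` pairs sorted by p-value. Testing every candidate pattern is what makes a multiple-testing correction necessary; Bonferroni is simply the easiest conservative choice to show.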
Affiliation(s)
- Kento Goto
  - Department of Computer Science, Nagoya Institute of Technology. Gokiso-cho, Showa-ku, Nagoya, Aichi, 466-8555, Japan
- Norimasa Tamehiro
  - Division of Biochemistry, National Institute of Health Sciences. 3-25-26 Tonomachi, Kawasaki-ku, Kawasaki, Kanagawa, 210-9501, Japan
- Takumi Yoshida
  - Department of Computer Science, Nagoya Institute of Technology. Gokiso-cho, Showa-ku, Nagoya, Aichi, 466-8555, Japan
- Hiroyuki Hanada
  - Center for Advanced Intelligence Project, RIKEN. 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
- Takuto Sakuma
  - Department of Computer Science, Nagoya Institute of Technology. Gokiso-cho, Showa-ku, Nagoya, Aichi, 466-8555, Japan
- Reiko Adachi
  - Division of Biochemistry, National Institute of Health Sciences. 3-25-26 Tonomachi, Kawasaki-ku, Kawasaki, Kanagawa, 210-9501, Japan
- Kazunari Kondo
  - Division of Biochemistry, National Institute of Health Sciences. 3-25-26 Tonomachi, Kawasaki-ku, Kawasaki, Kanagawa, 210-9501, Japan
- Ichiro Takeuchi
  - Graduate School of Engineering, Nagoya University. Furo-cho, Chikusa-ku, Nagoya, 464-8603, Japan; Center for Advanced Intelligence Project, RIKEN. 1-4-1 Nihonbashi, Chuo-ku, Tokyo, 103-0027, Japan
2
Dang HX, Lawrence CB. Allerdictor: fast allergen prediction using text classification techniques. Bioinformatics 2014; 30:1120-1128. [PMID: 24403538 DOI: 10.1093/bioinformatics/btu004] [Received: 10/14/2013] [Accepted: 12/30/2013] [Indexed: 11/14/2022]
Abstract
Motivation: Accurately identifying and eliminating allergens from biotechnology-derived products are important for human health. From a biomedical research perspective, it is also important to identify allergens in sequenced genomes. Many allergen prediction tools have been developed during the past years. Although these tools have achieved certain levels of specificity, when applied to large-scale allergen discovery (e.g. at a whole-genome scale), they still yield many false positives and thus low precision (even at low recall) due to the extreme skewness of the data (allergens are rare). Moreover, the most accurate tools are relatively slow because they use protein sequence alignment to build feature vectors for allergen classifiers. Additionally, only web server implementations of the current allergen prediction tools are publicly available and are without the capability of large batch submission. These weaknesses make large-scale allergen discovery ineffective and inefficient in the public domain.
Results: We developed Allerdictor, a fast and accurate sequence-based allergen prediction tool that models protein sequences as text documents and uses support vector machine in text classification for allergen prediction. Test results on multiple highly skewed datasets demonstrated that Allerdictor predicted allergens with high precision over high recall at fast speed. For example, Allerdictor only took ∼6 min on a single core PC to scan a whole Swiss-Prot database of ∼540 000 sequences and identified <1% of them as allergens.
Availability and implementation: Allerdictor is implemented in Python and available as standalone and web server versions at http://allerdictor.vbi.vt.edu
Contact: lawrence@vbi.vt.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
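The "protein sequence as text document" view described in the Allerdictor abstract can be illustrated with a minimal sketch. It shows only the tokenization step: a sequence is cut into overlapping k-mer "words" and counted as a bag of words. The function names and the default `k=3` are assumptions for illustration, not details taken from the abstract, and the support vector machine that would consume these counts is not shown.

```python
from collections import Counter


def kmer_words(sequence, k=3):
    """Split a protein sequence into overlapping k-mer 'words' (k is an assumed value)."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bag_of_kmers(sequence, k=3):
    """Bag-of-words representation: how often each k-mer occurs in the sequence."""
    return Counter(kmer_words(sequence, k))
```

Because this representation needs no sequence alignment, it can be computed in one linear pass per sequence, which fits the speed argument made in the abstract. The resulting counts could be passed to any linear text classifier.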
Affiliation(s)
- Ha X Dang
  - Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
- Christopher B Lawrence
  - Virginia Bioinformatics Institute and Department of Biological Sciences, Virginia Tech, Blacksburg, VA 24061, USA
3
Hammond B, Kough J, Herouet-Guicheney C, Jez JM. Toxicological evaluation of proteins introduced into food crops. Crit Rev Toxicol 2013; 43 Suppl 2:25-42. [PMID: 24164515 PMCID: PMC3835160 DOI: 10.3109/10408444.2013.842956] [Received: 05/15/2013] [Revised: 09/04/2013] [Accepted: 09/06/2013] [Indexed: 11/13/2022]
Abstract
This manuscript focuses on the toxicological evaluation of proteins introduced into GM crops to impart desired traits. In many cases, introduced proteins can be shown to have a history of safe use. Where modifications have been made to proteins, experience has shown that it is highly unlikely that modification of amino acid sequences can make a non-toxic protein toxic. Moreover, if the modified protein still retains its biological function, and this function is found in related proteins that have a history of safe use (HOSU) in food, and the exposure level is similar to functionally related proteins, then the modified protein could also be considered to be "as-safe-as" those that have a HOSU. Within nature, there can be considerable evolutionary changes in the amino acid sequence of proteins within the same family, yet these proteins share the same biological function. In general, food crops such as maize, soy, rice, canola etc. are subjected to a variety of processing conditions to generate different food products. Processing conditions such as cooking, modification of pH conditions, and mechanical shearing can often denature proteins in these crops resulting in a loss of functional activity. These same processing conditions can also markedly lower human dietary exposure to (functionally active) proteins. Safety testing of an introduced protein could be indicated if its biological function was not adequately characterized and/or it was shown to be structurally/functionally related to proteins that are known to be toxic to mammals.
Affiliation(s)
- John Kough
  - Office of Pesticide Programs, Microbial Pesticides Branch, US Environmental Protection Agency, Washington, DC, USA
- Joseph M. Jez
  - Department of Biology, Washington University in St. Louis, St. Louis, MO, USA
4
Genter MB. Editor’s Note. Int J Toxicol 2013; 32:99. [DOI: 10.1177/1091581813483101] [Indexed: 11/15/2022]