1
Hu Y, Badar IH, Liu Y, Zhu Y, Yang L, Kong B, Xu B. Advancements in production, assessment, and food applications of salty and saltiness-enhancing peptides: A review. Food Chem 2024; 453:139664. [PMID: 38761739 DOI: 10.1016/j.foodchem.2024.139664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/19/2024] [Revised: 05/01/2024] [Accepted: 05/12/2024] [Indexed: 05/20/2024]
Abstract
Salt is important for food flavor, but excessive sodium intake leads to adverse health consequences. Salty and saltiness-enhancing peptides have therefore been developed for sodium-reduced products. This review elucidates the saltiness perception process and analyzes the correlation between peptide structure and saltiness-enhancing ability. These peptides interact with taste receptors, including ENaC, TRPV1, and TMC4, to produce saltiness perception. This review also outlines the preparation, isolation, purification, characterization, screening, and assessment techniques for these peptides and discusses their potential applications. The peptides come from various sources and are produced through enzymatic hydrolysis, microbial fermentation, or the Maillard reaction, and are then separated, purified, identified, and screened. Sensory evaluation, the electronic tongue, the bioelectronic tongue, and cell and animal models are the primary saltiness assessment approaches. These peptides can be used in sodium-reduced food products to produce "clean label" items, and peptides with biological activity can also serve as functional ingredients, making them very promising for the food industry.
Affiliation(s)
- Yingying Hu
- School of Food and Biological Engineering, Hefei University of Technology, Hefei, Anhui 230009, China; State Key Laboratory of Meat Quality Control and Cultured Meat Development, Jiangsu Yurun Meat Industry Group Co., Ltd, Nanjing, Jiangsu 210041, China
- Iftikhar Hussain Badar
- College of Food Science, Northeast Agricultural University, Harbin, Heilongjiang 150030, China; Department of Meat Science and Technology, University of Veterinary and Animal Sciences, Lahore 54000, Pakistan
- Yue Liu
- School of Food and Biological Engineering, Hefei University of Technology, Hefei, Anhui 230009, China
- Yuan Zhu
- State Key Laboratory of Meat Quality Control and Cultured Meat Development, Jiangsu Yurun Meat Industry Group Co., Ltd, Nanjing, Jiangsu 210041, China
- Linwei Yang
- State Key Laboratory of Meat Quality Control and Cultured Meat Development, Jiangsu Yurun Meat Industry Group Co., Ltd, Nanjing, Jiangsu 210041, China
- Baohua Kong
- College of Food Science, Northeast Agricultural University, Harbin, Heilongjiang 150030, China.
- Baocai Xu
- School of Food and Biological Engineering, Hefei University of Technology, Hefei, Anhui 230009, China.
2
Richter P, Sebald K, Fischer K, Schnieke A, Jlilati M, Mittermeier-Klessinger V, Somoza V. Gastric digestion of the sweet-tasting plant protein thaumatin releases bitter peptides that reduce H. pylori induced pro-inflammatory IL-17A release via the TAS2R16 bitter taste receptor. Food Chem 2024; 448:139157. [PMID: 38569411 DOI: 10.1016/j.foodchem.2024.139157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2023] [Revised: 03/08/2024] [Accepted: 03/25/2024] [Indexed: 04/05/2024]
Abstract
About half of the world's population is infected with the bacterium Helicobacter pylori. For colonization, the bacterium neutralizes the low gastric pH and recruits immune cells to the stomach. The immune cells secrete cytokines, i.e., the pro-inflammatory IL-17A, which directly or indirectly damage surface epithelial cells. Since (I) dietary proteins are known to be digested into bitter tasting peptides in the gastric lumen, and (II) bitter tasting compounds have been demonstrated to reduce the release of pro-inflammatory cytokines through functional involvement of bitter taste receptors (TAS2Rs), we hypothesized that the sweet-tasting plant protein thaumatin would be cleaved into anti-inflammatory bitter peptides during gastric digestion. Using immortalized human parietal cells (HGT-1 cells), we demonstrated a bitter taste receptor TAS2R16-dependent reduction of a H. pylori-evoked IL-17A release by up to 89.7 ± 21.9% (p ≤ 0.01). Functional involvement of TAS2R16 was demonstrated by the study of specific antagonists and siRNA knock-down experiments.
Affiliation(s)
- Phil Richter
- TUM School of Life Sciences Weihenstephan, Technical University of Munich, Alte Akademie 8, 85354 Freising, Germany; Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Str. 34, 85354 Freising, Germany.
- Karin Sebald
- Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Str. 34, 85354 Freising, Germany.
- Konrad Fischer
- Livestock Biotechnology, TUM School of Life Sciences, Technical University of Munich, Liesel-Beckmann-Str. 1, 85354 Freising, Germany.
- Angelika Schnieke
- Livestock Biotechnology, TUM School of Life Sciences, Technical University of Munich, Liesel-Beckmann-Str. 1, 85354 Freising, Germany.
- Malek Jlilati
- Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Str. 34, 85354 Freising, Germany
- Verena Mittermeier-Klessinger
- Food Chemistry and Molecular Sensory Science, Technical University of Munich, Lise-Meitner-Str. 34, 85354 Freising, Germany.
- Veronika Somoza
- Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Str. 34, 85354 Freising, Germany; Nutritional Systems Biology, TUM School of Life Sciences, Technical University of Munich, Lise-Meitner-Str. 34, 85354 Freising, Germany; Department of Physiological Chemistry, Faculty of Chemistry, University of Vienna, Josef-Holaubek-Platz 2 (UZA II), 1090 Wien, Austria.
3
Zhai S, Tan Y, Zhu C, Zhang C, Gao Y, Mao Q, Zhang Y, Duan H, Yin Y. PepExplainer: An explainable deep learning model for selection-based macrocyclic peptide bioactivity prediction and optimization. Eur J Med Chem 2024; 275:116628. [PMID: 38944933 DOI: 10.1016/j.ejmech.2024.116628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Revised: 06/21/2024] [Accepted: 06/24/2024] [Indexed: 07/02/2024]
Abstract
Macrocyclic peptides possess unique features, making them highly promising as a drug modality. However, evaluating their bioactivity through wet lab experiments is generally resource-intensive and time-consuming. Despite advancements in artificial intelligence (AI) for bioactivity prediction, challenges remain due to limited data availability and the interpretability issues in deep learning models, often leading to less-than-ideal predictions. To address these challenges, we developed PepExplainer, an explainable graph neural network based on substructure mask explanation (SME). This model excels at deciphering amino acid substructures, translating macrocyclic peptides into detailed molecular graphs at the atomic level, and efficiently handling non-canonical amino acids and complex macrocyclic peptide structures. PepExplainer's effectiveness is enhanced by utilizing the correlation between peptide enrichment data from selection-based focused library and bioactivity data, and employing transfer learning to improve bioactivity predictions of macrocyclic peptides against IL-17C/IL-17 RE interaction. Additionally, PepExplainer underwent further validation for bioactivity prediction using an additional set of thirteen newly synthesized macrocyclic peptides. Moreover, it enabled the optimization of the IC50 of a macrocyclic peptide, reducing it from 15 nM to 5.6 nM based on the contribution score provided by PepExplainer. This achievement underscores PepExplainer's skill in deciphering complex molecular patterns, highlighting its potential to accelerate the discovery and optimization of macrocyclic peptides.
Affiliation(s)
- Silong Zhai
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Yahong Tan
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
- Cheng Zhu
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Chengyun Zhang
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Yan Gao
- Qilu Institute of Technology, Jinan, 250200, China
- Qingyi Mao
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou, 310014, China
- Youming Zhang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China
- Hongliang Duan
- Faculty of Applied Sciences, Macao Polytechnic University, Macao, 999078, China.
- Yizhen Yin
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao, 266237, China; Shandong Research Institute of Industrial Technology, Jinan, 250101, China.
4
Zhang J, Zhao L, Wang W, Zhang Q, Wang XT, Xing DF, Ren NQ, Lee DJ, Chen C. Large language model for horizontal transfer of resistance gene: From resistance gene prevalence detection to plasmid conjugation rate evaluation. Sci Total Environ 2024; 931:172466. [PMID: 38626826 DOI: 10.1016/j.scitotenv.2024.172466] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/17/2024] [Revised: 04/10/2024] [Accepted: 04/11/2024] [Indexed: 05/07/2024]
Abstract
The growing problem of plasmid-mediated dissemination of antibiotic resistance genes (ARGs) poses a significant threat to environmental integrity. However, the prediction of ARG prevalence is overlooked, especially for emerging ARGs that are potentially evolving gene exchange hotspots. Here, we used DNABERT to classify plasmid or chromosome sequences and to detect resistance gene prevalence. First, DNABERT fine-tuned on plasmid and chromosome sequences, followed by a multilayer perceptron (MLP) classifier, achieved an AUC (area under the curve) of 0.764 on external datasets across 23 genera, 0.02 AUC higher than a traditional statistics-based model. Furthermore, single-genus models for Escherichia and Pseudomonas were trained to explore their predictive performance in ARG prevalence detection. By integrating k-mer frequency attributes, our model improved the prediction of ARG prevalence on an external dataset by 0.0281-0.0615 AUC for Escherichia and 0.0196-0.0928 AUC for Pseudomonas. Finally, we established a random forest model for forecasting the relative conjugation transfer rate of plasmids, which achieved an AUC of 0.7956 using data from the existing literature. It identifies the plasmid's repression status, cellular density, and temperature as the most important factors influencing transfer frequency. Combined, these two models provide a useful reference for quick, low-cost integrated evaluation of resistance gene transfer, accelerating computer-assisted quantitative risk assessment of ARG transfer in the environmental field.
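The k-mer frequency attributes mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the value of k or how the features were combined with the DNABERT output, so the function name `kmer_frequencies` and the default `k=3` are assumptions made only for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence: str, k: int = 3) -> dict[str, float]:
    """Return the relative frequency of every possible DNA k-mer in `sequence`.

    Hypothetical helper: the choice of k and the normalization are assumptions,
    not details taken from the cited study.
    """
    sequence = sequence.upper()
    # Slide a window of length k over the sequence.
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    # Count only windows made entirely of unambiguous bases.
    counts = Counter(w for w in windows if set(w) <= set("ACGT"))
    total = sum(counts.values())
    # Use a fixed ordering over all 4**k k-mers so that feature vectors
    # from different sequences line up column by column.
    return {
        "".join(kmer): (counts["".join(kmer)] / total if total else 0.0)
        for kmer in product("ACGT", repeat=k)
    }
```

A vector built this way has 4**k entries per sequence and could be concatenated with other sequence features before classification.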
Affiliation(s)
- Jiabin Zhang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China
- Lei Zhao
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China
- Wei Wang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China.
- Quan Zhang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China
- Xue-Ting Wang
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China
- De-Feng Xing
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China
- Nan-Qi Ren
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China; Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen 518055, China
- Duu-Jong Lee
- Department of Mechanical Engineering, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong
- Chuan Chen
- State Key Laboratory of Urban Water Resource and Environment, School of Environment, Harbin Institute of Technology, Harbin, Heilongjiang Province 150090, China.
5
Lavecchia A. Advancing drug discovery with deep attention neural networks. Drug Discov Today 2024; 29:104067. [PMID: 38925473 DOI: 10.1016/j.drudis.2024.104067] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2024] [Revised: 06/10/2024] [Accepted: 06/19/2024] [Indexed: 06/28/2024]
Abstract
In the dynamic field of drug discovery, deep attention neural networks are revolutionizing our approach to complex data. This review explores the attention mechanism and its extended architectures, including graph attention networks (GATs), transformers, bidirectional encoder representations from transformers (BERT), generative pre-trained transformers (GPTs) and bidirectional and auto-regressive transformers (BART). Delving into their core principles and multifaceted applications, we uncover their pivotal roles in catalyzing de novo drug design, predicting intricate molecular properties and deciphering elusive drug-target interactions. Despite challenges, these attention-based architectures hold unparalleled promise to drive transformative breakthroughs and accelerate progress in pharmaceutical research.
Affiliation(s)
- Antonio Lavecchia
- Drug Discovery Laboratory, Department of Pharmacy, University of Napoli Federico II, I-80131 Naples, Italy.
6
Nguyen VN, Ho TT, Doan TD, Le NQK. Using a hybrid neural network architecture for DNA sequence representation: A study on N4-methylcytosine sites. Comput Biol Med 2024; 178:108664. [PMID: 38875905 DOI: 10.1016/j.compbiomed.2024.108664] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/05/2024] [Revised: 05/11/2024] [Accepted: 05/26/2024] [Indexed: 06/16/2024]
Abstract
N4-methylcytosine (4mC) is a modified form of cytosine found in DNA, contributing to epigenetic regulation. It exists in various genomes, including the Rosaceae family encompassing significant fruit crops like apples, cherries, and roses. Previous investigations have examined the distribution and functional implications of 4mC sites within the Rosaceae genome, focusing on their potential roles in gene expression regulation, environmental adaptation, and evolution. This research aims to improve the accuracy of predicting 4mC sites within the genome of Fragaria vesca, a Rosaceae plant species. Building upon the original 4mc-w2vec method, which combines word embedding processing and a convolutional neural network (CNN), we have incorporated additional feature encoding techniques and leveraged pre-trained natural language processing (NLP) models with different deep learning architectures including different forms of CNN, recurrent neural networks (RNN) and long short-term memory (LSTM). Our assessments have shown that the best model is derived from a CNN model using fastText encoding. This model demonstrates enhanced performance, achieving a sensitivity of 0.909, specificity of 0.77, and accuracy of 0.879 on an independent dataset. Furthermore, our model surpasses previously published works on the same dataset, thus showcasing its superior predictive capabilities.
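The sensitivity, specificity, and accuracy figures quoted in this abstract are standard confusion-matrix ratios. The sketch below shows how they are computed; the function name `classification_metrics` is hypothetical, and the counts in the usage note are made up for illustration, not taken from the study.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict[str, float]:
    """Compute sensitivity, specificity, and accuracy from confusion-matrix counts.

    Assumes at least one positive and one negative sample, so that neither
    denominator is zero.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative sample")
    return {
        # Fraction of true 4mC sites that were predicted as 4mC sites.
        "sensitivity": tp / (tp + fn),
        # Fraction of non-4mC sites that were predicted as non-4mC sites.
        "specificity": tn / (tn + fp),
        # Fraction of all samples that were classified correctly.
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }
```

For example, `classification_metrics(tp=90, fn=10, tn=40, fp=10)` uses illustrative counts in which positives outnumber negatives, which is one way accuracy can sit closer to sensitivity than to specificity.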
Affiliation(s)
- Van-Nui Nguyen
- University of Information and Communication Technology, Thai Nguyen University, Thai Nguyen, Viet Nam
- Trang-Thi Ho
- Department of Computer Science and Information Engineering, TamKang University, New Taipei, 251301, Taiwan
- Thu-Dung Doan
- International Degree Program in Animal Vaccine Technology, International College, National Pingtung University of Science and Technology, Pingtung, Taiwan
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, 110, Taiwan; Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, 110, Taiwan; AIBioMed Research Group, Taipei Medical University, Taipei, 110, Taiwan; Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, 110, Taiwan.
7
Park JH, Prasad V, Newsom S, Najar F, Rajan R. IdMotif: An Interactive Motif Identification in Protein Sequences. IEEE Comput Graph Appl 2024; 44:114-125. [PMID: 38127603 DOI: 10.1109/mcg.2023.3345742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/23/2023]
Abstract
This article presents a visual analytics framework, idMotif, to support domain experts in identifying motifs in protein sequences. A motif is a short sequence of amino acids usually associated with distinct functions of a protein, and identifying similar motifs in protein sequences helps us to predict certain types of disease or infection. idMotif can be used to explore, analyze, and visualize such motifs in protein sequences. We introduce a deep-learning-based method for grouping protein sequences and allow users to discover motif candidates of protein groups based on local explanations of the decision of a deep-learning model. idMotif provides several interactive linked views for analysis between and within protein clusters/groups and for sequence analysis. Through a case study and experts' feedback, we demonstrate how the framework helps domain experts analyze protein sequences and identify motifs.
8
Fang C, He J, Yamana H. MoRF_ESM: Prediction of MoRFs in disordered proteins based on a deep transformer protein language model. J Bioinform Comput Biol 2024; 22:2450006. [PMID: 38812466 DOI: 10.1142/s0219720024500069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2024]
Abstract
Molecular recognition features (MoRFs) are particular functional segments of disordered proteins, which play crucial roles in regulating the phase transition of membrane-less organelles and frequently serve as central sites in cellular interaction networks. As the association between disordered proteins and severe diseases continues to be discovered, identifying MoRFs has gained growing significance. Because the number of experimentally validated MoRFs is limited, the performance of existing MoRF prediction algorithms still needs to be improved. In this research, we present a model named MoRF_ESM, which utilizes deep-learning protein representations to predict MoRFs in disordered proteins. This approach employs a pretrained ESM-2 protein language model to generate embedding representations of residues in the form of attention map matrices. These representations are combined with a self-learned TextCNN model for feature extraction and prediction. In addition, an averaging step was incorporated at the end of the MoRF_ESM model to refine the output and generate final prediction results. In comparison to other methods on benchmark datasets, the MoRF_ESM approach demonstrates state-of-the-art performance, achieving [Formula: see text] higher AUC than other methods when tested on TEST1 and [Formula: see text] higher AUC than other methods when tested on TEST2. These results imply that the combination of ESM-2 and TextCNN can effectively extract deep evolutionary features related to protein structure and function, along with capturing shallow pattern features located in protein sequences, and is well qualified for the prediction task of MoRFs. Given that ESM-2 is a highly versatile protein language model, the methodology proposed in this study can be readily applied to other tasks involving the classification of protein sequences.
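The abstract mentions an averaging step that refines the per-residue output but does not describe it. One plausible reading, shown here purely as an assumption, is a centered sliding-window mean over the residue scores; the function name `smooth_scores` and the window size are hypothetical.

```python
def smooth_scores(scores: list[float], window: int = 5) -> list[float]:
    """Smooth per-residue scores with a centered moving average.

    Assumed post-processing, not the published procedure: the window is
    truncated at the sequence ends so every residue still receives a value.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        # Clip the window to the valid index range of the sequence.
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed
```

Averaging of this kind spreads an isolated high score across its neighbours, which suits features such as MoRFs that span contiguous residues.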
Affiliation(s)
- Chun Fang
- Department of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing 102617, P. R. China
- Department of Computer Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku, Tokyo 169-8555, Japan
- Jiasheng He
- Department of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing 102617, P. R. China
- Hayato Yamana
- Department of Computer Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku, Tokyo 169-8555, Japan
9
Hesamzadeh P, Seif A, Mahmoudzadeh K, Ganjali Koli M, Mostafazadeh A, Nayeri K, Mirjafary Z, Saeidian H. De novo antioxidant peptide design via machine learning and DFT studies. Sci Rep 2024; 14:6473. [PMID: 38499731 PMCID: PMC10948870 DOI: 10.1038/s41598-024-57247-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/29/2023] [Accepted: 03/15/2024] [Indexed: 03/20/2024] Open
Abstract
Antioxidant peptides (AOPs) are highly valued in food and pharmaceutical industries due to their significant role in human function. This study introduces a novel approach to identifying robust AOPs using a deep generative model based on sequence representation. Through filtration with a deep-learning classification model and subsequent clustering via the Butina cluster algorithm, twelve peptides (GP1-GP12) with potential antioxidant capacity were predicted. Density functional theory (DFT) calculations guided the selection of six peptides for synthesis and biological experiments. Molecular orbital representations revealed that the HOMO for these peptides is primarily localized on the indole segment, underscoring its pivotal role in antioxidant activity. All six synthesized peptides exhibited antioxidant activity in the DPPH assay, while the hydroxyl radical test showed suboptimal results. A hemolysis assay confirmed the non-hemolytic nature of the generated peptides. Additionally, an in silico investigation explored the potential inhibitory interaction between the peptides and the Keap1 protein. Analysis revealed that ligands GP3, GP4, and GP12 induced significant structural changes in proteins, affecting their stability and flexibility. These findings highlight the capability of machine learning approaches in generating novel antioxidant peptides.
Affiliation(s)
- Parsa Hesamzadeh
- Department of Chemistry, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Abdolvahab Seif
- Dipartimento di Fisica, Universita' di Padova, Via Marzolo 8, 35131, Padua, Italy
- Department of Chemistry, University of Turin, Via Pietro Giuria 7, 10125, Turin, Italy
- Kazem Mahmoudzadeh
- Department of Organic Chemistry and Oil, Faculty of Chemistry, Shahid Beheshti University, Tehran, Iran
- Amrollah Mostafazadeh
- Cellular and Molecular Biology Research Center, Health Research Institute, Babol University of Medical Sciences, Babol, Iran
- Kosar Nayeri
- Student Research Committee, Babol University of Medical Sciences, Babol, Iran
- Zohreh Mirjafary
- Department of Chemistry, Science and Research Branch, Islamic Azad University, Tehran, Iran
- Hamid Saeidian
- Department of Science, Payame Noor University (PNU), PO Box: 19395-4697, Tehran, Iran.
10
He Y, Liu K, Liu Y, Han W. Prediction of bitterness based on modular designed graph neural network. Bioinform Adv 2024; 4:vbae041. [PMID: 38566918 PMCID: PMC10987211 DOI: 10.1093/bioadv/vbae041] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Revised: 01/31/2024] [Accepted: 03/11/2024] [Indexed: 04/04/2024]
Abstract
Motivation: Bitterness plays a pivotal role in our ability to identify and evade harmful substances in food. As one of the five tastes, it constitutes a critical component of our sensory experiences. However, the reliance on human tasting for discerning flavors presents cost challenges, rendering in silico prediction of bitterness a more practical alternative. Results: In this study, we introduce the use of Graph Neural Networks (GNNs) in bitterness prediction, superseding traditional machine learning techniques. We developed an advanced model, a Hybrid Graph Neural Network (HGNN), surpassing conventional GNNs according to tests on public datasets. Using HGNN and three other GNNs, we designed BitterGNNs, a bitterness predictor that achieved an AUC value of 0.87 in both external bitter/non-bitter and bitter/sweet evaluations, outperforming the acclaimed RDKFP-MLP predictor with AUC values of 0.86 and 0.85. We further created a bitterness prediction website and database, TastePD (https://www.tastepd.com/). The BitterGNNs predictor, built on GNNs, offers accurate bitterness predictions, enhancing the efficacy of bitterness prediction, aiding advanced food testing methodology development, and deepening our understanding of bitterness origins. Availability and implementation: TastePD is available at https://www.tastepd.com; all code is at https://github.com/heyigacu/BitterGNN.
Affiliation(s)
- Yi He
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
- Kaifeng Liu
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
- Yuyang Liu
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
- Weiwei Han
- Key Laboratory for Molecular Enzymology and Engineering of Ministry of Education, School of Life Science, Jilin University, Changchun 130012, China
11
Banić M, Butorac K, Čuljak N, Butorac A, Novak J, Pavunc AL, Rušanac A, Stanečić Ž, Lovrić M, Šušković J, Kos B. An Integrated Comprehensive Peptidomics and In Silico Analysis of Bioactive Peptide-Rich Milk Fermented by Three Autochthonous Cocci Strains. Int J Mol Sci 2024; 25:2431. [PMID: 38397111 PMCID: PMC10888711 DOI: 10.3390/ijms25042431] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/11/2024] [Revised: 02/12/2024] [Accepted: 02/17/2024] [Indexed: 02/25/2024] Open
Abstract
Bioactive peptides (BPs) are molecules of paramount importance with great potential for the development of functional foods, nutraceuticals or therapeutics for the prevention or treatment of various diseases. A functional BP-rich dairy product was produced by lyophilisation of bovine milk fermented by the autochthonous strains Lactococcus lactis subsp. lactis ZGBP5-51, Enterococcus faecium ZGBP5-52 and Enterococcus faecalis ZGBP5-53 isolated from the same artisanal fresh cheese. The efficiency of the proteolytic system of the implemented strains in the production of BPs was confirmed by a combined high-throughput mass spectrometry (MS)-based peptidome profiling and an in silico approach. First, peptides released by microbial fermentation were identified via a non-targeted peptide analysis (NTA) comprising reversed-phase nano-liquid chromatography (RP nano-LC) coupled with matrix-assisted laser desorption/ionisation-time-of-flight/time-of-flight (MALDI-TOF/TOF) MS, and then quantified by targeted peptide analysis (TA) involving RP ultrahigh-performance LC (RP-UHPLC) coupled with triple-quadrupole MS (QQQ-MS). A combined database and literature search revealed that 10 of the 25 peptides identified in this work have bioactive properties described in the literature. Finally, by combining the output of MS-based peptidome profiling with in silico bioactivity prediction tools, three peptides (75QFLPYPYYAKPA86, 40VAPFPEVFGK49, 117ARHPHPHLSF126), whose bioactive properties have not been previously reported in the literature, were identified as potential BP candidates.
Affiliation(s)
- Martina Banić
- Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
| | - Katarina Butorac
- Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
| | - Nina Čuljak
- Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
| | - Ana Butorac
- BICRO Biocentre Ltd., Borongajska cesta 83H, 10000 Zagreb, Croatia; (A.B.); (Ž.S.); (M.L.)
| | - Jasna Novak
- Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
| | - Andreja Leboš Pavunc
- Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
| | - Anamarija Rušanac
- Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
| | - Željka Stanečić
- BICRO Biocentre Ltd., Borongajska cesta 83H, 10000 Zagreb, Croatia; (A.B.); (Ž.S.); (M.L.)
| | - Marija Lovrić
- BICRO Biocentre Ltd., Borongajska cesta 83H, 10000 Zagreb, Croatia; (A.B.); (Ž.S.); (M.L.)
| | - Jagoda Šušković
- Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
| | - Blaženka Kos
- Laboratory for Antibiotic, Enzyme, Probiotic and Starter Culture Technologies, Faculty of Food Technology and Biotechnology, University of Zagreb, Pierottijeva 6, 10000 Zagreb, Croatia; (M.B.); (K.B.); (N.Č.); (J.N.); (A.L.P.); (A.R.); (J.Š.)
12
Zou H. iDPPIV-SI: identifying dipeptidyl peptidase IV inhibitory peptides by using multiple sequence information. J Biomol Struct Dyn 2024; 42:2144-2152. [PMID: 37125813 DOI: 10.1080/07391102.2023.2203257] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/25/2022] [Accepted: 04/10/2023] [Indexed: 05/02/2023]
Abstract
Diabetes has become a major threat to human health worldwide. Recent studies show that dipeptidyl peptidase IV (DPP-IV) inhibitory peptides may be potential pharmaceutical agents for treating diabetes. Thus, there is a need to discriminate DPP-IV inhibitory peptides from non-DPP-IV inhibitory peptides. To address this issue, a novel computational model called iDPPIV-SI was developed in this study. First, 50 different types of physicochemical (PC) properties were employed to represent the peptide sequences. Three feature descriptors, namely the 1-order and 2-order correlation methods and the discrete wavelet transform, were applied to collect useful information from the PC matrix. Furthermore, the least absolute shrinkage and selection operator (LASSO) algorithm was employed to select the most discriminative features. All of the chosen features were fed into a support vector machine (SVM) to identify DPP-IV inhibitory peptides. iDPPIV-SI achieved classification accuracies of 91.26% and 98.12% on the training and independent datasets, respectively. The proposed method significantly improves classification performance compared with state-of-the-art predictors. The datasets and MATLAB codes (based on MATLAB2015b) used in the current study are available at https://figshare.com/articles/online_resource/iDPPIV-SI/20085878. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Hongliang Zou
- School of Communications and Electronics, Jiangxi Science and Technology Normal University, Nanchang, China
13
Yu Y, Liu S, Zhang X, Yu W, Pei X, Liu L, Jin Y. Identification and prediction of milk-derived bitter taste peptides based on peptidomics technology and machine learning method. Food Chem 2024; 433:137288. [PMID: 37683467 DOI: 10.1016/j.foodchem.2023.137288] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/26/2023] [Revised: 07/19/2023] [Accepted: 08/24/2023] [Indexed: 09/10/2023]
Abstract
Bitter taste peptides (BPs) are vital for drug and nutrition research, but large-scale screening of them is still time-consuming and costly. This study developed a complete workflow for screening BPs based on peptidomics technology and machine learning. Using an expanded dataset and a new combination of BPs' characteristic factors, a novel classification prediction model (CPM-BP) based on the Light Gradient Boosting Machine algorithm was constructed, with an accuracy of 90.3% for predicting BPs. Among 724 peptides that differed significantly between spoiled and fresh UHT milk, 180 potential BPs were predicted using CPM-BP, eleven of which had been previously reported. One known BP (FALPQYLK) and three predicted potential BPs (FALPQYL, FFVAPFPEVFGKE, EMPFPKYP) were verified by measuring calcium mobilization in HEK293T cells expressing the human bitter taste receptor T2R4 (hT2R4). The three potential BPs activated hT2R4 and were thereby demonstrated to be BPs, which confirmed the effectiveness of CPM-BP.
Affiliation(s)
- Yang Yu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning 116023, China
- Shengchi Liu
- School of Information Science and Engineering, Dalian Polytechnic University, Dalian, Liaoning 116034, China
- Xinchen Zhang
- School of Information Science and Engineering, Dalian Polytechnic University, Dalian, Liaoning 116034, China
- Wenhao Yu
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning 116023, China
- Xiaoyan Pei
- Inner Mongolia Yili Industrial Group Co., Ltd., Hohhot, Inner Mongolia 010110, China
- Li Liu
- School of Information Science and Engineering, Dalian Polytechnic University, Dalian, Liaoning 116034, China
- Yan Jin
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, Dalian, Liaoning 116023, China
14
Zulfiqar H, Guo Z, Ahmad RM, Ahmed Z, Cai P, Chen X, Zhang Y, Lin H, Shi Z. Deep-STP: a deep learning-based approach to predict snake toxin proteins by using word embeddings. Front Med (Lausanne) 2024; 10:1291352. [PMID: 38298505 PMCID: PMC10829051 DOI: 10.3389/fmed.2023.1291352] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/09/2023] [Accepted: 12/26/2023] [Indexed: 02/02/2024] Open
Abstract
Snake venom contains many toxic proteins that can destroy the circulatory system or nervous system of prey. Studies have found that these snake venom proteins have the potential to treat cardiovascular and nervous system diseases. Therefore, the study of snake venom proteins is conducive to the development of related drugs. Techniques based on traditional biochemistry can accurately identify these proteins, but they are costly and time-consuming. Artificial intelligence provides a new computational strategy for large-scale screening of snake venom proteins. In this paper, we developed a sequence-based computational method to recognize snake toxin proteins. Specifically, we utilized three different feature descriptors, namely g-gap, natural vector, and word2vec, to encode snake toxin protein sequences. Analysis of variance (ANOVA) and the gradient-boosting decision tree (GBDT) algorithm, combined with incremental feature selection (IFS), were used to optimize the features, and the optimized features were then input into the deep learning model for training. The results show that our model achieves an accuracy of 82.00% in 10-fold cross-validation. The model was further verified on independent data, where the accuracy reaches 81.14%, which demonstrates that our model has excellent prediction performance and robustness.
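As an illustration of the g-gap descriptor named in this abstract, the following is a minimal sketch under a common textbook definition: the frequency of each residue pair separated by `gap` intervening positions, normalised by the number of such windows. The function name, the 20-letter alphabet constant, and the normalisation are assumptions made for illustration; the paper's exact formulation may differ.

```python
from itertools import product

# Assumption: the 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_composition(sequence, gap):
    """Return a 400-dimensional vector of g-gap dipeptide frequencies.

    Each entry is the fraction of windows in which residue X at position i
    is followed by residue Y at position i + gap + 1.
    """
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    n_windows = len(sequence) - gap - 1
    if n_windows <= 0:
        raise ValueError("sequence is too short for this gap")
    for i in range(n_windows):
        pair = sequence[i] + sequence[i + gap + 1]
        if pair in counts:  # silently skip non-standard residues
            counts[pair] += 1
    return [counts[p] / n_windows for p in pairs]


# Hypothetical toy peptide: with gap=1 the windows are A_A and C_C.
vector = g_gap_composition("ACAC", gap=1)
print(len(vector), vector[0], vector[21])
```

With `gap=0` this reduces to the ordinary dipeptide composition; larger gaps capture correlations between residues that are further apart in the sequence.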
Affiliation(s)
- Hasan Zulfiqar
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, China
- Zhiling Guo
- Beidahuang Industry Group General Hospital, Harbin, China
- Ramala Masood Ahmad
- Department of Plant Breeding and Genetics, University of Agriculture Faisalabad, Faisalabad, Pakistan
- Zahoor Ahmed
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, China
- Peiling Cai
- School of Basic Medical Sciences, Chengdu University, Chengdu, China
- Xiang Chen
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, China
- Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
- Hao Lin
- Yangtze Delta Region Institute (Huzhou), University of Electronic Science and Technology of China, Huzhou, Zhejiang, China
- Zheng Shi
- Clinical Genetics Laboratory, Clinical Medical College & Affiliated Hospital, Chengdu University, Chengdu, China
15
Li W, Li G, Sun Y, Zhang L, Cui X, Jia Y, Zhao T. Prediction of SARS-CoV-2 Infection Phosphorylation Sites and Associations of these Modifications with Lung Cancer Development. Curr Gene Ther 2024; 24:239-248. [PMID: 37957848 DOI: 10.2174/0115665232268074231026111634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/20/2023] [Revised: 10/05/2023] [Accepted: 10/07/2023] [Indexed: 11/15/2023]
Abstract
INTRODUCTION: Since the emergence of SARS-CoV-2, multiple mutant strains have been identified. Infection with SARS-CoV-2 leads to alterations in host cell phosphorylation signaling, which systematically modulate the immune response. METHODS: Identification and analysis of SARS-CoV-2 infection phosphorylation sites provide insight into the mechanisms of viral infection and its effects on host cells, supplying important fundamental data for the study and development of potent drugs for the treatment of immune inflammatory diseases. In this paper, we analyzed the SARS-CoV-2-infected phosphorylation region and developed a transformer-based deep learning-assisted method for the specific identification of phosphorylation sites in SARS-CoV-2-infected host cells. RESULTS: Furthermore, through association analysis with lung cancer, we found that SARS-CoV-2 infection may affect the regulatory role of the immune system, leading to an abnormal increase or decrease in the immune inflammatory response, which may be associated with the development and progression of cancer. CONCLUSION: We anticipate that this study will provide an important reference for SARS-CoV-2 evolution and immune-related studies, as well as a reliable complementary screening tool for anti-SARS-CoV-2 drug and vaccine design.
Affiliation(s)
- Wei Li
- Institute of Bioinformatics, Harbin Institute of Technology, Harbin, China
- Gen Li
- Department of Radiation Oncology, Harbin Medical University Cancer Hospital, Harbin, China
- Yuzhi Sun
- Institute for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Liyuan Zhang
- Institute for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xinran Cui
- Institute for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yuran Jia
- Institute for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Tianyi Zhao
- School of Medicine and Health, Harbin Institute of Technology, Harbin, China
16
Dhibar S, Jana B. Accurate Prediction of Antifreeze Protein from Sequences through Natural Language Text Processing and Interpretable Machine Learning Approaches. J Phys Chem Lett 2023; 14:10727-10735. [PMID: 38009833 DOI: 10.1021/acs.jpclett.3c02817] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2023]
Abstract
Antifreeze proteins (AFPs) bind to growing ice planes owing to their structural complementarity, thereby inhibiting ice-crystal growth through thermal hysteresis. Classification of AFPs from sequence is a difficult task due to their low sequence similarity, and therefore the usual sequence similarity algorithms, like BLAST and PSI-BLAST, are not efficient. Here, a method combining n-gram feature vectors and machine learning models to accelerate the identification of potential AFPs from sequences is proposed. All of these n-gram features are extracted by the k-mer counting method. The comparative analysis reveals that, among different machine learning models, XGBoost outperforms the others in predicting AFPs from sequence when penta-mers are used as the feature vector. When tested on an independent dataset, our method performed better than existing ones, with sensitivity of 97.50%, recall of 98.30%, and F1 score of 99.10%. Further, we used the SHAP method, which provides important insight into the functional activity of AFPs.
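The k-mer counting step mentioned in this abstract can be sketched as follows. This is a minimal illustration using overlapping windows and a fixed vocabulary; the function names and the toy sequence are hypothetical, and the authors' actual feature pipeline is not reproduced here.

```python
from collections import Counter


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_feature_vector(sequence, k, vocabulary):
    """Map a sequence onto a fixed, ordered k-mer vocabulary.

    The vocabulary would normally be collected from the training set so that
    every sequence yields a vector of the same length.
    """
    counts = kmer_counts(sequence, k)
    return [counts.get(kmer, 0) for kmer in vocabulary]


# Hypothetical toy sequence; k=3 here for readability (the abstract reports
# penta-mers, i.e. k=5, as the best-performing choice).
toy = "MKTAMKTA"
print(kmer_counts(toy, 3))
print(kmer_feature_vector(toy, 3, ["MKT", "KTA", "AAA"]))
```

The resulting count vectors can then be passed to any standard classifier; sequences shorter than `k` simply yield an all-zero vector.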
Affiliation(s)
- Saikat Dhibar
- School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
- Biman Jana
- School of Chemical Sciences, Indian Association for the Cultivation of Science, Jadavpur, Kolkata 700032, India
17
Dutta P, Jain D, Gupta R, Rai B. Classification of tastants: A deep learning based approach. Mol Inform 2023; 42:e202300146. [PMID: 37885360 DOI: 10.1002/minf.202300146] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/14/2023] [Revised: 09/26/2023] [Accepted: 10/26/2023] [Indexed: 10/28/2023]
Abstract
Predicting the taste of molecules is of critical importance in the food and beverage, flavor, and pharmaceutical industries for the design and screening of new tastants. In this work, we have built deep learning models to classify sweet, bitter, and umami molecules, the three basic tastes whose sensation is mediated by G protein-coupled receptors. An extensive dataset containing 1466 bitter, 1764 sweet, and 238 umami tastants was curated from existing literature. We analyzed the chemical characteristics of the molecules, with special focus on the presence of different functional groups. A deep neural network model based on molecular descriptors and a graph neural network model were trained for taste prediction. The class imbalance due to fewer umami molecules was tackled using special sampling techniques. Both models show comparable performance during evaluation, but the graph-based model can learn task-specific representations from the molecular structure without requiring handcrafted features. We further explain the deep neural network predictions using Shapley additive explanations. Finally, we demonstrated the applicability of the models by screening bitter, sweet, and umami molecules from a large food database. This study develops an in-silico approach to classify molecules based on their taste by leveraging recent progress in deep learning, which can serve as a powerful tool for tastant design.
Affiliation(s)
- Prantar Dutta
- Physical Sciences Research Area, Tata Research Development and Design Centre, TCS Research, 54-B, Hadapsar Industrial Estate, Pune, 411013, India
- Deepak Jain
- Physical Sciences Research Area, Tata Research Development and Design Centre, TCS Research, 54-B, Hadapsar Industrial Estate, Pune, 411013, India
- Rakesh Gupta
- Physical Sciences Research Area, Tata Research Development and Design Centre, TCS Research, 54-B, Hadapsar Industrial Estate, Pune, 411013, India
- Beena Rai
- Physical Sciences Research Area, Tata Research Development and Design Centre, TCS Research, 54-B, Hadapsar Industrial Estate, Pune, 411013, India
18
Le NQK. Leveraging transformers-based language models in proteome bioinformatics. Proteomics 2023; 23:e2300011. [PMID: 37381841 DOI: 10.1002/pmic.202300011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/14/2023] [Revised: 06/13/2023] [Accepted: 06/13/2023] [Indexed: 06/30/2023]
Abstract
In recent years, the rapid growth of biological data has increased interest in using bioinformatics to analyze and interpret this data. Proteomics, which studies the structure, function, and interactions of proteins, is a crucial area of bioinformatics. Using natural language processing (NLP) techniques in proteomics is an emerging field that combines machine learning and text mining to analyze biological data. Recently, transformer-based NLP models have gained significant attention for their ability to process variable-length input sequences in parallel, using self-attention mechanisms to capture long-range dependencies. In this review paper, we discuss the recent advancements in transformer-based NLP models in proteome bioinformatics and examine their advantages, limitations, and potential applications to improve the accuracy and efficiency of various tasks. Additionally, we highlight the challenges and future directions of using these models in proteome bioinformatics research. Overall, this review provides valuable insights into the potential of transformer-based NLP models to revolutionize proteome bioinformatics.
Affiliation(s)
- Nguyen Quoc Khanh Le
- Professional Master Program in Artificial Intelligence in Medicine, College of Medicine, Taipei Medical University, Taipei, Taiwan
- AIBioMed Research Group, Taipei Medical University, Taipei, Taiwan
- Research Center for Artificial Intelligence in Medicine, Taipei Medical University, Taipei, Taiwan
- Translational Imaging Research Center, Taipei Medical University Hospital, Taipei, Taiwan
19
Zhang J, Yan W, Zhang Q, Li Z, Liang L, Zuo M, Zhang Y. Umami-BERT: An interpretable BERT-based model for umami peptides prediction. Food Res Int 2023; 172:113142. [PMID: 37689906 DOI: 10.1016/j.foodres.2023.113142] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 06/12/2023] [Accepted: 06/13/2023] [Indexed: 09/11/2023]
Abstract
Umami peptides have received extensive attention due to their ability to enhance flavors and provide nutritional benefits. The increasing demand for novel umami peptides and the vast number of peptides present in food call for more efficient screening methods. Therefore, the purpose of this study is to develop a deep learning (DL) model for rapid screening of umami peptides. The Umami-BERT model was devised using a novel two-stage training strategy with Bidirectional Encoder Representations from Transformers (BERT) and the inception network. In the pre-training stage, attention mechanisms were applied to a large number of bioactive peptide sequences to acquire high-dimensional generalized features. In the re-training stage, umami peptide prediction was carried out on the UMP789 dataset, which was compiled from the latest research. The model achieved an accuracy (ACC) of 93.23% and MCC of 0.78 on the balanced dataset, as well as an ACC of 95.00% and MCC of 0.85 on the unbalanced dataset. The results demonstrated that Umami-BERT could predict umami peptides directly from their amino acid sequences and exceeded the performance of other models. Furthermore, the attention patterns learned by Umami-BERT could be analyzed. The amino acids Alanine (A), Cysteine (C), Aspartate (D), and Glutamic acid (E) were found to be the most significant contributors to umami peptides. Additionally, the patterns of summarized umami peptides involving A, C, D, and E were analyzed based on the learned attention weights. Consequently, Umami-BERT exhibits great potential for the large-scale screening of candidate peptides and offers novel insight for the further exploration of umami peptides.
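The two metrics reported in this abstract, accuracy (ACC) and the Matthews correlation coefficient (MCC), can both be computed from a binary confusion matrix. The sketch below uses the standard definitions; the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal total is zero, a common convention for
    the otherwise undefined case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical imbalanced test set: 90 positives and 10 negatives.
tp, tn, fp, fn = 90, 5, 5, 0
print(f"ACC = {accuracy(tp, tn, fp, fn):.2f}")
print(f"MCC = {mcc(tp, tn, fp, fn):.2f}")
```

On imbalanced data the two metrics can diverge: accuracy is dominated by the majority class, whereas MCC uses all four cells of the confusion matrix, which is one reason both values are commonly reported together.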
Affiliation(s)
- Jingcheng Zhang
- Food Laboratory of Zhongyuan, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China; Key Laboratory of Flavor Science of China General Chamber of Commerce, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China
- Wenjing Yan
- National Engineering Research Centre for Agri-product Quality Traceability, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China
- Qingchuan Zhang
- National Engineering Research Centre for Agri-product Quality Traceability, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China
- Zihan Li
- National Engineering Research Centre for Agri-product Quality Traceability, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China
- Li Liang
- Food Laboratory of Zhongyuan, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China; Key Laboratory of Flavor Science of China General Chamber of Commerce, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China
- Min Zuo
- National Engineering Research Centre for Agri-product Quality Traceability, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China
- Yuyu Zhang
- Food Laboratory of Zhongyuan, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China; Key Laboratory of Flavor Science of China General Chamber of Commerce, Beijing Technology and Business University, No. 11/33, Fucheng Road, Haidian District, Beijing 100048, China
20
He J, Zhang S, Fang C. AAindex-PPII: Predicting polyproline type II helix structure based on amino acid indexes with an improved BiGRU-TextCNN model. J Bioinform Comput Biol 2023; 21:2350022. [PMID: 37899354 DOI: 10.1142/s0219720023500221] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2023]
Abstract
The polyproline-II (PPII) structure domain is crucial in organisms' signal transduction, transcription, cell metabolism, and immune response. It is also a critical structural domain for specific vital disease-associated proteins. Recognizing PPII is essential for understanding protein structure and function. To accurately predict PPII in proteins, we propose a novel method, AAindex-PPII, which uses only amino acid indexes to characterize protein sequences and a composite deep learning model, combining a Bidirectional Gated Recurrent Unit (BiGRU) with an improved TextCNN, to predict PPII in proteins. Experimental results show that, when tested on the same datasets, our method outperforms the state-of-the-art BERT-PPII method, achieving an AUC value of 0.845 on the strict data and an AUC value of 0.813 on the non-strict data, which are 0.024 and 0.03 higher, respectively, than those of the BERT-PPII method. This study demonstrates that our proposed method is simple and efficient for PPII prediction without using pre-trained large models or complex features such as position-specific scoring matrices.
Affiliation(s)
- Jiasheng He
- College of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing 102617, P. R. China
- Shun Zhang
- College of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing 102617, P. R. China
- Chun Fang
- College of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing 102617, P. R. China
21
Zhang X, Guo H, Zhang F, Wang X, Wu K, Qiu S, Liu B, Wang Y, Hu Y, Li J. HNetGO: protein function prediction via heterogeneous network transformer. Brief Bioinform 2023; 24:bbab556. [PMID: 37861172 PMCID: PMC10588005 DOI: 10.1093/bib/bbab556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/05/2021] [Revised: 11/18/2021] [Accepted: 12/04/2021] [Indexed: 10/21/2023] Open
Abstract
Protein function annotation is one of the most important research topics for revealing the essence of life at the molecular level in the post-genome era. Current research shows that integrating multisource data can effectively improve the performance of protein function prediction models. However, the heavy reliance on complex feature engineering and model integration methods limits the development of existing methods. Besides, models based on deep learning use only the labeled data in a given dataset to extract sequence features, thus ignoring a large amount of existing unlabeled sequence data. Here, we propose an end-to-end protein function annotation model named HNetGO, which uses a heterogeneous network to integrate protein sequence similarity and protein-protein interaction network information and combines it with a pretrained model to extract the semantic features of the protein sequence. In addition, we design an attention-based graph neural network model, which can effectively extract node-level features from heterogeneous networks and predict protein function by measuring the similarity between protein nodes and gene ontology term nodes. Comparative experiments on the human dataset show that HNetGO achieves state-of-the-art performance on the cellular component and molecular function branches.
Affiliation(s)
- Xiaoshuai Zhang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Huannan Guo
- General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin 150086, China
- Fan Zhang
- Center NHC Key Laboratory of Cell Transplantation, The First Affiliated Hospital of Harbin Medical University, Harbin 150086, China
- Xuan Wang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Kaitao Wu
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- Shizheng Qiu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Bo Liu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Yang Hu
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
- Junyi Li
- School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, Guangdong 518055, China
22
Rojas C, Ballabio D, Consonni V, Suárez-Estrella D, Todeschini R. Classification-based machine learning approaches to predict the taste of molecules: A review. Food Res Int 2023; 171:113036. [PMID: 37330849 DOI: 10.1016/j.foodres.2023.113036] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/14/2023] [Revised: 05/02/2023] [Accepted: 05/22/2023] [Indexed: 06/19/2023]
Abstract
The capacity to discriminate safe from dangerous compounds has played an important role in the evolution of species, including human beings. Highly evolved senses such as taste allow humans to navigate and survive in the environment through information that reaches the brain as electrical pulses. Specifically, taste receptors provide multiple bits of information about substances that are introduced orally. These substances may be pleasant or unpleasant according to the taste responses that they trigger. Tastes have been classified into basic (sweet, bitter, umami, sour and salty) or non-basic (astringent, chilling, cooling, heating, pungent), while some compounds are considered multitaste compounds, taste modifiers or tasteless. Classification-based machine learning approaches are useful tools for developing mathematical relationships that predict the taste class of new molecules from their chemical structure. This work reviews the history of multicriteria quantitative structure-taste relationship modelling, starting from the first ligand-based (LB) classifier proposed in 1980 by Lemont B. Kier and concluding with the most recent studies published in 2022.
Affiliation(s)
- Cristian Rojas
- Grupo de Investigación en Quimiometría y QSAR, Facultad de Ciencia y Tecnología, Universidad del Azuay, Av. 24 de Mayo 7-77 y Hernán Malo, Cuenca 010107, Ecuador
- Davide Ballabio
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1-20126, Milano, Italy
- Viviana Consonni
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1-20126, Milano, Italy
- Diego Suárez-Estrella
- Grupo de Investigación en Quimiometría y QSAR, Facultad de Ciencia y Tecnología, Universidad del Azuay, Av. 24 de Mayo 7-77 y Hernán Malo, Cuenca 010107, Ecuador
- Roberto Todeschini
- Milano Chemometrics and QSAR Research Group, Department of Earth and Environmental Sciences, University of Milano-Bicocca, P.za della Scienza 1-20126, Milano, Italy
23
Zhai S, Tan Y, Zhang C, Hipolito CJ, Song L, Zhu C, Zhang Y, Duan H, Yin Y. PepScaf: Harnessing Machine Learning with In Vitro Selection toward De Novo Macrocyclic Peptides against IL-17C/IL-17RE Interaction. J Med Chem 2023; 66:11187-11200. [PMID: 37480587 DOI: 10.1021/acs.jmedchem.3c00627] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Indexed: 07/24/2023]
Abstract
The combination of library-based screening and artificial intelligence (AI) has been accelerating the discovery and optimization of hit ligands. However, the potential of AI to assist in de novo macrocyclic peptide ligand discovery has yet to be fully explored. In this study, an integrated AI framework called PepScaf was developed to extract the critical scaffold relative to bioactivity based on a vast dataset from an initial in vitro selection campaign against a model protein target, interleukin-17C (IL-17C). Taking the generated scaffold, a focused macrocyclic peptide library was rationally constructed to target IL-17C, yielding over 20 potent peptides that effectively inhibited IL-17C/IL-17RE interaction. Notably, the top two peptides displayed exceptional potency with IC50 values of 1.4 nM. This approach presents a viable methodology for more efficient macrocyclic peptide discovery, offering potential time and cost savings. Additionally, this is also the first report regarding the discovery of macrocyclic peptides against IL-17C/IL-17RE interaction.
Affiliation(s)
- Silong Zhai
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Yahong Tan
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Chengyun Zhang
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Christopher John Hipolito
- Screening & Compound Profiling, Quantitative Biosciences, Merck & Co., Inc., Kenilworth, New Jersey 07033, United States
- Lulu Song
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Cheng Zhu
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Youming Zhang
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Hongliang Duan
- School of Pharmaceutical Sciences, Zhejiang University of Technology, Hangzhou 310014, China
- Yizhen Yin
- State Key Laboratory of Microbial Technology, Institute of Microbial Technology, Shandong University, Qingdao 266237, China
- Shandong Research Institute of Industrial Technology, Jinan 250101, China
| |
|
24
|
Charoenkwan P, Schaduangrat N, Shoombuatong W. StackTTCA: a stacking ensemble learning-based framework for accurate and high-throughput identification of tumor T cell antigens. BMC Bioinformatics 2023; 24:301. [PMID: 37507654 PMCID: PMC10386778 DOI: 10.1186/s12859-023-05421-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/11/2023] [Accepted: 07/19/2023] [Indexed: 07/30/2023] Open
Abstract
BACKGROUND: The identification of tumor T cell antigens (TTCAs) is crucial for providing insights into their functional mechanisms and utilizing their potential in anticancer vaccine development. In this context, TTCAs are highly promising. Meanwhile, experimental technologies for discovering and characterizing new TTCAs are expensive and time-consuming. Although many machine learning (ML)-based models have been proposed for identifying new TTCAs, there is still a need to develop a robust model that can achieve higher accuracy and precision. RESULTS: In this study, we propose a new stacking ensemble learning-based framework, termed StackTTCA, for accurate and large-scale identification of TTCAs. Firstly, we constructed 156 different baseline models by using 12 different feature encoding schemes and 13 popular ML algorithms. Secondly, these baseline models were trained and employed to create a new probabilistic feature vector. Finally, the optimal probabilistic feature vector was determined based on the feature selection strategy and then used for the construction of our stacked model. Comparative benchmarking experiments indicated that StackTTCA clearly outperformed several ML classifiers and the existing methods on the independent test, with an accuracy of 0.932 and a Matthews correlation coefficient of 0.866. CONCLUSIONS: In summary, the proposed stacking ensemble learning-based framework of StackTTCA could help to precisely and rapidly identify true TTCAs for follow-up experimental verification. In addition, we developed an online web server ( http://2pmlab.camt.cmu.ac.th/StackTTCA ) to maximize user convenience for high-throughput screening of novel TTCAs.
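The two metrics quoted in this abstract, accuracy and the Matthews correlation coefficient (MCC), are both functions of the confusion-matrix counts. The following is a minimal Python sketch of the standard definitions; the counts used in the example are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Conventional fallback when a row or column of the matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only (not reported in the abstract).
tp, tn, fp, fn = 45, 48, 2, 5
print(round(accuracy(tp, tn, fp, fn), 3), round(mcc(tp, tn, fp, fn), 3))
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy for classifiers evaluated on imbalanced test sets.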
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand
| | - Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| |
|
25
|
Li F, Liu S, Li K, Zhang Y, Duan M, Yao Z, Zhu G, Guo Y, Wang Y, Huang L, Zhou F. EpiTEAmDNA: Sequence feature representation via transfer learning and ensemble learning for identifying multiple DNA epigenetic modification types across species. Comput Biol Med 2023; 160:107030. [PMID: 37196456 DOI: 10.1016/j.compbiomed.2023.107030] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 01/02/2023] [Revised: 04/21/2023] [Accepted: 05/10/2023] [Indexed: 05/19/2023]
Abstract
Methylation is a major DNA epigenetic modification that regulates biological processes without altering the DNA sequence, and multiple types of DNA methylation have been discovered, including 6mA, 5hmC, and 4mC. Multiple computational approaches have been developed to automatically identify DNA methylation residues using machine learning or deep learning algorithms. Machine learning (ML)-based methods are difficult to transfer to other DNA methylation site prediction tasks using additional knowledge. Deep learning (DL) methods may facilitate the transfer of knowledge from similar tasks, but they are often ineffective on small datasets. This study proposes an integrated feature representation framework, EpiTEAmDNA, based on transfer learning and ensemble learning, which is evaluated on multiple DNA methylation types across 15 species. EpiTEAmDNA integrates a convolutional neural network (CNN) with conventional machine learning methods and performs better than the existing DL-based methods on small datasets when no additional knowledge is available. The experimental data suggest that the EpiTEAmDNA models may be further improved via transfer learning based on additional knowledge. The evaluation experiments on the independent test datasets also suggest that the proposed EpiTEAmDNA framework outperforms the existing models in most prediction tasks of the 3 DNA methylation types across 15 species. The source code, pre-trained global model, and the EpiTEAmDNA feature representation framework are freely available at http://www.healthinformaticslab.org/supp/.
Affiliation(s)
- Fei Li
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Shuai Liu
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Kewei Li
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Yaqi Zhang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Meiyu Duan
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China.
| | - Zhaomin Yao
- College of Medicine and Biological Information Engineering, Northeastern University, Shenyang, Liaoning, 110167, China
| | - Gancheng Zhu
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Yutong Guo
- College of Life Sciences, Jilin University, Changchun, Jilin, 130012, China
| | - Ying Wang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Lan Huang
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China
| | - Fengfeng Zhou
- Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, Jilin, 130012, China; College of Computer Science and Technology, Jilin University, Changchun, Jilin, 130012, China.
| |
|
26
|
Schaduangrat N, Anuwongcharoen N, Charoenkwan P, Shoombuatong W. DeepAR: a novel deep learning-based hybrid framework for the interpretable prediction of androgen receptor antagonists. J Cheminform 2023; 15:50. [PMID: 37149650 PMCID: PMC10163717 DOI: 10.1186/s13321-023-00721-z] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 12/26/2022] [Accepted: 04/08/2023] [Indexed: 05/08/2023] Open
Abstract
Drug resistance represents a major obstacle to therapeutic innovations and is a prevalent feature in prostate cancer (PCa). Androgen receptors (ARs) are the hallmark therapeutic target for prostate cancer modulation, and AR antagonists have achieved great success. However, the rapid emergence of resistance contributing to PCa progression is the ultimate burden of their long-term usage. Hence, the discovery and development of AR antagonists capable of combating this resistance remains an avenue for further exploration. This study therefore proposes a novel deep learning (DL)-based hybrid framework, named DeepAR, to accurately and rapidly identify AR antagonists by using only the SMILES notation. Specifically, DeepAR is capable of extracting and learning the key information embedded in AR antagonists. Firstly, we established a benchmark dataset by collecting active and inactive compounds against AR from the ChEMBL database. Based on this dataset, we developed and optimized a collection of baseline models by using a comprehensive set of well-known molecular descriptors and machine learning algorithms. Then, these baseline models were utilized for creating probabilistic features. Finally, these probabilistic features were combined and used for the construction of a meta-model based on a one-dimensional convolutional neural network. Experimental results indicated that DeepAR is a more accurate and stable approach for identifying AR antagonists on the independent test dataset, achieving an accuracy of 0.911 and an MCC of 0.823. In addition, our proposed framework is able to provide feature importance information by leveraging a popular computational approach named SHapley Additive exPlanations (SHAP). Meanwhile, the characterization and analysis of potential AR antagonist candidates were achieved through the SHAP waterfall plot and molecular docking. The analysis inferred that N-heterocyclic moieties, halogenated substituents, and a cyano functional group were significant determinants of potential AR antagonists. Lastly, we implemented an online web server by using DeepAR (at http://pmlabstack.pythonanywhere.com/DeepAR ). We anticipate that DeepAR could be a useful computational tool for facilitating the community-wide identification of AR antagonist candidates from a large number of uncharacterized compounds.
Affiliation(s)
- Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Nuttapat Anuwongcharoen
- Department of Community Medical Technology, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand
| | - Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai, 50200, Thailand.
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok, 10700, Thailand.
| |
|
27
|
Thafar MA, Albaradei S, Uludag M, Alshahrani M, Gojobori T, Essack M, Gao X. OncoRTT: Predicting novel oncology-related therapeutic targets using BERT embeddings and omics features. Front Genet 2023; 14:1139626. [PMID: 37091791 PMCID: PMC10117673 DOI: 10.3389/fgene.2023.1139626] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2023] [Accepted: 03/24/2023] [Indexed: 04/08/2023] Open
Abstract
Late-stage drug development failures are usually a consequence of ineffective targets. Thus, proper target identification is needed, which may be possible using computational approaches. This is because effective targets have disease-relevant biological functions, and omics data reveal the proteins involved in these functions. Also, properties that favor binding between drug and target are deducible from the protein's amino acid sequence. In this work, we developed OncoRTT, a deep learning (DL)-based method for predicting novel therapeutic targets. OncoRTT is designed to reduce suboptimal target selection by identifying novel targets based on features of known effective targets using DL approaches. First, we created the "OncologyTT" datasets, which include genes/proteins associated with ten prevalent cancer types. Then, we generated three sets of features for all genes (omics features, BERT embeddings of the proteins' amino acid sequences, and the integrated features) to train and test the DL classifiers separately. The models achieved high prediction performance in terms of area under the curve (AUC), i.e., AUC greater than 0.88 for all cancer types, with a maximum of 0.95 for leukemia. Also, OncoRTT outperformed the state-of-the-art method using their data in five out of seven cancer types commonly assessed by both methods. Furthermore, OncoRTT predicts novel therapeutic targets using new test data related to the seven cancer types. We further corroborated these results with other validation evidence using the Open Targets Platform and a case study focused on the top-10 predicted therapeutic targets for lung cancer.
Affiliation(s)
- Maha A. Thafar
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- College of Computers and Information Technology, Computer Science Department, Taif University, Taif, Saudi Arabia
| | - Somayah Albaradei
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
| | - Mahmut Uludag
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Mona Alshahrani
- National Center for Artificial Intelligence (NCAI), Saudi Data and Artificial Intelligence Authority (SDAIA), Riyadh, Saudi Arabia
| | - Takashi Gojobori
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
| | - Magbubah Essack
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- *Correspondence: Xin Gao; Magbubah Essack
| | - Xin Gao
- Computer, Electrical and Mathematical Sciences and Engineering Division (CEMSE), Computational Bioscience Research Center (CBRC), King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- *Correspondence: Xin Gao; Magbubah Essack
| |
|
28
|
Jiang J, Li J, Li J, Pei H, Li M, Zou Q, Lv Z. A Machine Learning Method to Identify Umami Peptide Sequences by Using Multiplicative LSTM Embedded Features. Foods 2023; 12:1498. [PMID: 37048319 PMCID: PMC10094688 DOI: 10.3390/foods12071498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/26/2023] [Revised: 03/24/2023] [Accepted: 03/30/2023] [Indexed: 04/05/2023] Open
Abstract
Umami peptides enhance the umami taste of food and have good food processing properties, nutritional value, and numerous potential applications. Wet testing for the identification of umami peptides is a time-consuming and expensive process. Here, we report iUmami-DRLF, which applies a logistic regression (LR) method to features extracted from peptide sequences solely by a pre-trained deep learning feature extraction method, unified representation (UniRep, based on a multiplicative LSTM). The findings demonstrate that deep representation learning significantly enhanced the models' capability and predictive precision in identifying umami peptides from peptide sequence information alone. Newly validated taste sequences were also used to test iUmami-DRLF and other predictors, and the results indicate that iUmami-DRLF has better robustness and accuracy and remains valid at higher probability thresholds. The iUmami-DRLF method can aid further studies on enhancing the umami flavor of food to satisfy the need for an umami-flavored diet.
Affiliation(s)
- Jici Jiang
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Jiayu Li
- College of Life Science, Sichuan University, Chengdu 610065, China
| | - Junxian Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Hongdi Pei
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Wu Yuzhang Honors College, Sichuan University, Chengdu 610065, China
| | - Mingxin Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324000, China
| | - Zhibin Lv
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
| |
|
29
|
Fan Y, Sun G, Pan X. ELMo4m6A: A Contextual Language Embedding-Based Predictor for Detecting RNA N6-Methyladenosine Sites. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:944-954. [PMID: 35536814 DOI: 10.1109/tcbb.2022.3173323] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
N6-methyladenosine (m6A) is a universal post-transcriptional modification of RNAs, and it is widely involved in various biological processes. Identifying m6A modification sites accurately is indispensable for further investigating m6A-mediated biological functions. How to better represent RNA sequences is crucial for building effective computational methods for detecting m6A modification sites. However, traditional encoding methods require complex biological prior knowledge and are time-consuming. Furthermore, most existing m6A site prediction methods are limited to a single species, and few methods are able to predict m6A sites across different species and tissues. Thus, it is necessary to design a more efficient computational method to predict m6A sites across multiple species and tissues. In this paper, we propose ELMo4m6A, a contextual language embedding-based method for predicting m6A sites from RNA sequences without any prior knowledge. ELMo4m6A first learns embeddings of RNA sequences using the language model ELMo, then uses a hybrid of a convolutional neural network (CNN) and long short-term memory (LSTM) to identify m6A sites. The results of 5-fold cross-validation and independent testing demonstrate that ELMo4m6A is superior to state-of-the-art methods. Moreover, we applied integrated gradients to find potential sequence patterns contributing to m6A sites.
|
30
|
Charoenkwan P, Schaduangrat N, Pham NT, Manavalan B, Shoombuatong W. Pretoria: An effective computational approach for accurate and high-throughput identification of CD8+ T-cell epitopes of eukaryotic pathogens. Int J Biol Macromol 2023; 238:124228. [PMID: 36996953 DOI: 10.1016/j.ijbiomac.2023.124228] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2022] [Revised: 03/11/2023] [Accepted: 03/25/2023] [Indexed: 03/31/2023]
Abstract
T-cells recognize antigenic epitopes present on major histocompatibility complex (MHC) molecules, triggering an adaptive immune response in the host. T-cell epitope (TCE) identification is challenging because of the extensive number of undetermined proteins found in eukaryotic pathogens, as well as MHC polymorphisms. In addition, conventional experimental approaches for TCE identification are time-consuming and expensive. Thus, computational approaches that can accurately and rapidly identify CD8+ T-cell epitopes (TCEs) of eukaryotic pathogens based solely on sequence information may facilitate the discovery of novel CD8+ TCEs in a cost-effective manner. Here, Pretoria (Predictor of CD8+ TCEs of eukaryotic pathogens) is proposed as the first stack-based approach for accurate and large-scale identification of CD8+ TCEs of eukaryotic pathogens. In particular, Pretoria enabled the extraction and exploration of crucial information embedded in CD8+ TCEs by employing a comprehensive set of 12 well-known feature descriptors extracted from multiple groups, including physicochemical properties, composition-transition-distribution, pseudo-amino acid composition, and amino acid composition. These feature descriptors were then utilized to construct a pool of 144 different machine learning (ML)-based classifiers based on 12 popular ML algorithms. Finally, the feature selection method was used to effectively determine the important ML classifiers for the construction of our stacked model. The experimental results indicated that Pretoria is an accurate and effective computational approach for CD8+ TCE prediction; it was superior to several conventional ML classifiers and the existing method on the independent test, with an accuracy of 0.866, MCC of 0.732, and AUC of 0.921. Additionally, to maximize user convenience for high-throughput identification of CD8+ TCEs of eukaryotic pathogens, a user-friendly web server of Pretoria (http://pmlabstack.pythonanywhere.com/Pretoria) was developed and made freely available.
Affiliation(s)
- Phasit Charoenkwan
- Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
| | - Nalini Schaduangrat
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
| | - Nhat Truong Pham
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea
| | - Balachandran Manavalan
- Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
| | - Watshara Shoombuatong
- Center for Research Innovation and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand.
| |
|
31
|
Deep learning drives efficient discovery of novel antihypertensive peptides from soybean protein isolate. Food Chem 2023; 404:134690. [DOI: 10.1016/j.foodchem.2022.134690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/25/2022] [Revised: 09/29/2022] [Accepted: 10/17/2022] [Indexed: 11/06/2022]
|
32
|
Zhang YF, Wang YH, Gu ZF, Pan XR, Li J, Ding H, Zhang Y, Deng KJ. Bitter-RF: A random forest machine model for recognizing bitter peptides. Front Med (Lausanne) 2023; 10:1052923. [PMID: 36778738 PMCID: PMC9909039 DOI: 10.3389/fmed.2023.1052923] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/24/2022] [Accepted: 01/05/2023] [Indexed: 01/27/2023] Open
Abstract
Introduction: Bitter peptides are short peptides with potential medical applications. The huge potential behind their bitter taste remains to be tapped. To better explore the value of bitter peptides in practice, we need a more effective classification method for identifying bitter peptides. Methods: In this study, we developed a random forest (RF)-based model, called Bitter-RF, using sequence information of bitter peptides. Bitter-RF covers more comprehensive and extensive information by integrating 10 features extracted from the bitter peptides and achieves better results than the latest-generation model on the independent validation set. Results: The proposed model improves the classification accuracy of bitter peptides (AUROC = 0.98 on the independent test set) and extends the practical application of the RF method, which had not previously been used to build a prediction model for bitter peptides, in protein classification tasks. Discussion: We hope that Bitter-RF will be a convenient tool for scholars engaged in bitter peptide research.
Affiliation(s)
- Yu-Fei Zhang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Yu-Hao Wang
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Zhi-Feng Gu
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
| | - Xian-Run Pan
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China
| | - Jian Li
- School of Basic Medical Sciences, Chengdu University, Chengdu, China
| | - Hui Ding
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China. *Correspondence: Hui Ding
| | - Yang Zhang
- Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China. *Correspondence: Yang Zhang
| | - Ke-Jun Deng
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China. *Correspondence: Ke-Jun Deng
| |
|
33
|
Yan J, Tong H. An overview of bitter compounds in foodstuffs: Classifications, evaluation methods for sensory contribution, separation and identification techniques, and mechanism of bitter taste transduction. Compr Rev Food Sci Food Saf 2023; 22:187-232. [PMID: 36382875 DOI: 10.1111/1541-4337.13067] [Citation(s) in RCA: 12] [Impact Index Per Article: 12.0] [Received: 03/15/2022] [Revised: 08/24/2022] [Accepted: 10/11/2022] [Indexed: 11/17/2022]
Abstract
The bitter taste is generally considered an undesirable sensory attribute. However, bitter-tasting compounds can significantly affect the overall flavor of many foods and beverages and endow them with various beneficial effects on human health. To better understand the relationship between chemical structure and bitterness, this paper has summarized the bitter compounds in foodstuffs and classified them based on the basic skeletons. Only those bitter compounds that are confirmed by human sensory evaluation have been included in this paper. To develop food products that satisfy consumer preferences, correctly ranking the key bitter compounds in foodstuffs according to their contributions to the overall bitterness intensity is the precondition. Generally, three methods were applied to screen out the key bitter compounds in foods and beverages and evaluate their sensory contributions, including dose-over-threshold factors, taste dilution analysis, and spectrum descriptive analysis method. This paper has discussed in detail the mechanisms and applications of these three methods. Typical procedures for separating and identifying the main bitter compounds in foodstuffs have also been summarized. Additionally, the activation of human bitter taste receptors (TAS2Rs) and the mechanisms of bitter taste transduction are outlined. Ultimately, a conclusion has been drawn to highlight the current problems and propose potential directions for further research.
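Of the three ranking methods named in this abstract, the dose-over-threshold (DoT) factor is the simplest to illustrate: it is the ratio of a compound's concentration in the food to its taste threshold, and compounds with a DoT factor of at least 1 are usually read as likely contributors to the perceived taste. The following Python sketch assumes that reading; the compound names, concentrations, and thresholds are hypothetical and do not come from the review.

```python
def dot_factor(concentration, threshold):
    """Dose-over-threshold factor: concentration divided by taste threshold.

    Both values must be expressed in the same unit (e.g. umol/L).
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return concentration / threshold


# Hypothetical (concentration, threshold) pairs in umol/L, for illustration only.
compounds = {
    "compound_A": (120.0, 40.0),
    "compound_B": (15.0, 30.0),
    "compound_C": (500.0, 50.0),
}

# Rank compounds by DoT factor, highest first.
ranked = sorted(
    ((name, dot_factor(conc, thr)) for name, (conc, thr) in compounds.items()),
    key=lambda item: item[1],
    reverse=True,
)

# Keep only compounds present at or above their threshold (DoT >= 1).
key_compounds = [name for name, dot in ranked if dot >= 1.0]
print(ranked)
print(key_compounds)
```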
Affiliation(s)
- Jingna Yan
- College of Food Science, Southwest University, Chongqing, China
| | - Huarong Tong
- College of Food Science, Southwest University, Chongqing, China
| |
|
34
|
Gao W, Xu D, Li H, Du J, Wang G, Li D. Identification of adaptor proteins by incorporating deep learning and PSSM profiles. Methods 2023; 209:10-17. [PMID: 36427763 DOI: 10.1016/j.ymeth.2022.11.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 09/13/2022] [Revised: 10/25/2022] [Accepted: 11/02/2022] [Indexed: 11/23/2022] Open
Abstract
Adaptor proteins, also known as signal transduction adaptor proteins, are important proteins in signal transduction pathways and play a role in connecting signal proteins for signal transduction between cells. Studies have shown that adaptor proteins are closely related to some diseases, such as tumors and diabetes. Therefore, it is very meaningful to construct a model that accurately identifies adaptor proteins. In recent years, many studies have used a position-specific scoring matrix (PSSM) and neural network methods to identify adaptor proteins. However, ordinary neural network models cannot correlate the contextual information in PSSM profiles well, so these studies usually process the 20×N (N > 20) PSSM into 20×20 dimensions, which results in the loss of a large amount of protein information. This research proposes an efficient method that combines one-dimensional convolution (1-D CNN) and a bidirectional long short-term memory network (biLSTM) to identify adaptor proteins. The complete PSSM profiles are the input of the model, and the complete information of the protein is retained during the training process. We perform cross-validation during model training and test the performance of the model on an independent test set. On a data set with 1224 adaptor proteins and 11,078 non-adaptor proteins, five indicators, namely specificity, sensitivity, accuracy, area under the receiver operating characteristic curve (AUC), and Matthews correlation coefficient (MCC), were employed to evaluate model performance. On the independent test set, the specificity, sensitivity, accuracy, and MCC were 0.817, 0.865, 0.823, and 0.465, respectively. These results show that our method is better than the state-of-the-art methods. This study aims to improve the accuracy of adaptor protein identification and lays a foundation for further research on diseases related to adaptor proteins. This research provides a new idea for the application of deep learning models in bioinformatics and computational biology.
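The information loss that this abstract attributes to the 20×20 reduction can be made concrete. The Python sketch below assumes one common form of that reduction, in which the PSSM rows belonging to each residue type are summed and divided by the sequence length; the abstract does not specify which reduction earlier studies used, and the function name and toy matrices are hypothetical.

```python
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # the 20 standard residues


def reduce_pssm(sequence, pssm):
    """Collapse an N x 20 PSSM into a 20 x 20 matrix.

    Rows belonging to the same residue type are summed and divided by
    the sequence length. Non-standard residues raise a KeyError here.
    """
    if not sequence or len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must be non-empty and equally long")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    reduced = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence, pssm):
        target = reduced[index[residue]]
        for j, score in enumerate(row):
            target[j] += score
    n = len(sequence)
    return [[value / n for value in row] for row in reduced]


# Toy profile rows, for illustration only.
row_a = [1.0] * 20
row_c = [float(j) for j in range(20)]

# Two different residue orders collapse to the same 20 x 20 matrix,
# so positional (contextual) information is lost by the reduction.
same = reduce_pssm("AC", [row_a, row_c]) == reduce_pssm("CA", [row_c, row_a])
print(same)
```

A model that consumes the full N×20 profile, as the 1-D CNN and biLSTM combination described above does, can still distinguish the two orderings.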
Affiliation(s)
- Wentao Gao
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150000, China
- Dali Xu
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150000, China
- Hongfei Li
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150000, China
- Junping Du
  - Beijing Key Laboratory of Intelligent Telecommunication Software and Multimedia, School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China
- Guohua Wang
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150000, China
- Dan Li
  - College of Information and Computer Engineering, Northeast Forestry University, Harbin 150000, China
35
PSRTTCA: A new approach for improving the prediction and characterization of tumor T cell antigens using propensity score representation learning. Comput Biol Med 2023; 152:106368. [PMID: 36481763 DOI: 10.1016/j.compbiomed.2022.106368] [Citation(s) in RCA: 5] [Impact Index Per Article: 5.0] [Received: 01/27/2022] [Revised: 10/19/2022] [Accepted: 11/25/2022] [Indexed: 11/27/2022]
Abstract
Despite the arsenal of existing cancer therapies, the ongoing recurrence and new cases of cancer pose a serious health concern that necessitates the development of new and effective treatments. Cancer immunotherapy, which uses the body's immune system to combat cancer, is a promising treatment option. As a result, in silico methods for identifying and characterizing tumor T cell antigens (TTCAs) would be useful for better understanding their functional mechanisms. Although few computational methods for TTCA identification have been developed, their lack of model interpretability is a major drawback. Thus, developing computational methods for the effective identification and characterization of TTCAs is a critical endeavor. PSRTTCA, a new machine learning (ML)-based approach for improving the identification and characterization of TTCAs based on their primary sequences, is proposed in this study. Specifically, we introduce a new propensity score representation learning algorithm that allows one to generate various sets of propensity scores of amino acids, dipeptides, and g-gap dipeptides to be TTCAs. To enhance the predictive performance, optimal sets of variant propensity scores were determined and fed into the final meta-predictor (PSRTTCA). Benchmarking results revealed that PSRTTCA was a more precise and promising tool for the identification and characterization of TTCAs than conventional ML classifiers and existing methods. Furthermore, PSR-derived propensities of amino acids in becoming TTCAs are used to reveal the relationship between TTCAs and their informative physicochemical properties in order to provide insights into TTCA characteristics. Finally, a user-friendly online computational platform of PSRTTCA is publicly available at http://pmlabstack.pythonanywhere.com/PSRTTCA. The PSRTTCA predictor is anticipated to facilitate community-wide efforts in accelerating the discovery of novel TTCAs for cancer immunotherapy and other clinical applications.
36
Naravane T, Tagkopoulos I. Machine learning models to predict micronutrient profile in food after processing. Curr Res Food Sci 2023; 6:100500. [PMID: 37151381 PMCID: PMC10160345 DOI: 10.1016/j.crfs.2023.100500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/26/2022] [Revised: 02/28/2023] [Accepted: 04/02/2023] [Indexed: 05/09/2023] Open
Abstract
The information on nutritional profile of cooked foods is important to both food manufacturers and consumers, and a major challenge to obtaining precise information is the inherent variation in composition across biological samples of any given raw ingredient. The ideal solution would address precision and generability, but the current solutions are limited in their capabilities; analytical methods are too costly to scale, retention-factor based methods are scalable but approximate, and kinetic models are bespoke to a food and nutrient. We provide an alternate solution that predicts the micronutrient profile in cooked food from the raw food composition, and for multiple foods. The prediction model is trained on an existing food composition dataset and has a 31% lower error on average (across all foods, processes and nutrients) than predictions obtained using the baseline method of retention-factors. Our results argue that data scaling and transformation prior to training the models is important to mitigate any yield bias. This study demonstrates the potential of machine learning methods over current solutions, and additionally provides guidance for the future generation of food composition data, specifically for sampling approach, data quality checks, and data representation standards.
Affiliation(s)
- Tarini Naravane
  - Biological Systems Engineering, University of California at Davis, United States
  - Genome Center, University of California at Davis, United States
- Ilias Tagkopoulos
  - Department of Computer Science, University of California at Davis, United States
  - Genome Center, University of California at Davis, United States
  - Corresponding author. Department of Computer Science, University of California at Davis, United States.
37
Pallante L, Korfiati A, Androutsos L, Stojceski F, Bompotas A, Giannikos I, Raftopoulos C, Malavolta M, Grasso G, Mavroudi S, Kalogeras A, Martos V, Amoroso D, Piga D, Theofilatos K, Deriu MA. Toward a general and interpretable umami taste predictor using a multi-objective machine learning approach. Sci Rep 2022; 12:21735. [PMID: 36526644 PMCID: PMC9758219 DOI: 10.1038/s41598-022-25935-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Accepted: 12/07/2022] [Indexed: 12/23/2022] Open
Abstract
The umami taste is one of the five basic taste modalities normally linked to the protein content in food. The implementation of fast and cost-effective tools for the prediction of the umami taste of a molecule remains extremely interesting to understand the molecular basis of this taste and to effectively rationalise the production and consumption of specific foods and ingredients. However, the only examples of umami predictors available in the literature rely on the amino acid sequence of the analysed peptides, limiting the applicability of the models. In the present study, we developed a novel ML-based algorithm, named VirtuousUmami, able to predict the umami taste of a query compound starting from its SMILES representation, thus opening up the possibility of potentially using such a model on any database through a standard and more general molecular description. Herein, we have tested our model on five databases related to foods or natural compounds. The proposed tool will pave the way toward the rationalisation of the molecular features underlying the umami taste and toward the design of specific peptide-inspired compounds with specific taste properties.
Affiliation(s)
- Lorenzo Pallante
  - Department of Mechanical and Aerospace Engineering, Politecnico di Torino, PolitoBIOMedLab, 10129 Torino, Italy
- Filip Stojceski
  - Department of Innovative Technologies, Dalle Molle Institute for Artificial Intelligence, 6962 Lugano-Viganello, Switzerland
- Agorakis Bompotas
  - Industrial Systems Institute, Athena Research Center, 265 04 Patras, Greece
- Ioannis Giannikos
  - Industrial Systems Institute, Athena Research Center, 265 04 Patras, Greece
- Christos Raftopoulos
  - Industrial Systems Institute, Athena Research Center, 265 04 Patras, Greece
- Marta Malavolta
  - Faculty of Computer and Information Science, University of Ljubljana, 1000 Ljubljana, Slovenia
- Gianvito Grasso
  - Department of Innovative Technologies, Dalle Molle Institute for Artificial Intelligence, 6962 Lugano-Viganello, Switzerland
- Seferina Mavroudi
  - InSyBio PC, 265 04 Patras, Greece
  - Department of Nursing, University of Patras, 265 04 Patras, Greece
- Athanasios Kalogeras
  - Industrial Systems Institute, Athena Research Center, 265 04 Patras, Greece
- Vanessa Martos
  - Department of Plant Physiology, Institute of Biotechnology, University of Granada, 18011 Granada, Spain
- Dario Piga
  - Department of Innovative Technologies, Dalle Molle Institute for Artificial Intelligence, 6962 Lugano-Viganello, Switzerland
- Marco A. Deriu
  - Department of Mechanical and Aerospace Engineering, Politecnico di Torino, PolitoBIOMedLab, 10129 Torino, Italy
38
iUP-BERT: Identification of Umami Peptides Based on BERT Features. Foods 2022; 11:foods11223742. [PMID: 36429332 PMCID: PMC9689418 DOI: 10.3390/foods11223742] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2022] [Revised: 11/14/2022] [Accepted: 11/16/2022] [Indexed: 11/23/2022] Open
Abstract
Umami is an important, widely used taste component of food seasoning. Umami peptides are specific structural peptides endowing foods with a favorable umami taste. Laboratory approaches used to identify umami peptides are time-consuming and labor-intensive, and are therefore not feasible for rapid screening. Here, we developed a novel peptide sequence-based umami peptide predictor, namely iUP-BERT, which was based on the deep learning pretrained neural network feature extraction method. After optimization, a single deep representation learning feature encoding method (BERT: bidirectional encoder representations from transformers) in conjunction with the synthetic minority over-sampling technique (SMOTE) and support vector machine (SVM) methods was adopted for model creation to generate predicted probabilistic scores of potential umami peptides. Further extensive empirical experiments on cross-validation and an independent test showed that iUP-BERT outperformed the existing methods, highlighting its effectiveness and robustness. Finally, an open-access iUP-BERT web server was built. To our knowledge, this is the first efficient sequence-based umami predictor created based on a single deep-learning pretrained neural network feature extraction method. By predicting umami peptides, iUP-BERT can help in further research to improve the palatability of dietary supplements in the future.
39
Charoenkwan P, Schaduangrat N, Lio P, Moni MA, Chumnanpuen P, Shoombuatong W. iAMAP-SCM: A Novel Computational Tool for Large-Scale Identification of Antimalarial Peptides Using Estimated Propensity Scores of Dipeptides. ACS Omega 2022; 7:41082-41095. [PMID: 36406571 PMCID: PMC9670693 DOI: 10.1021/acsomega.2c04465] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Accepted: 10/20/2022] [Indexed: 06/16/2023]
Abstract
Antimalarial peptides (AMAPs) varying in length, amino acid composition, charge, conformational structure, hydrophobicity, and amphipathicity reflect their diversity in antimalarial mechanisms. Due to the worldwide major health problem concerning antimicrobial resistance, these peptides possess great therapeutic value owing to their low incidences of drug resistance as compared to conventional antibiotics. Although well-known experimental methods are able to precisely determine the antimalarial activity of peptides, these methods are still time-consuming and costly. Thus, machine learning (ML)-based methods that are capable of identifying AMAPs rapidly by using only sequence information would be beneficial for the high-throughput identification of AMAPs. In this study, we propose the first computational model (termed iAMAP-SCM) for the large-scale identification and characterization of peptides with antimalarial activity by using only sequence information. Specifically, we employed an interpretable scoring card method (SCM) to develop iAMAP-SCM and estimate propensities of 20 amino acids and 400 dipeptides to be AMAPs in a supervised manner. Experimental results showed that iAMAP-SCM could achieve a maximum accuracy and Matthews correlation coefficient of 0.957 and 0.834, respectively, on the independent test dataset. In addition, SCM-derived propensities of 20 amino acids and selected physicochemical properties were used to provide an understanding of the functional mechanisms of AMAPs. Finally, a user-friendly online computational platform of iAMAP-SCM is publicly available at http://pmlabstack.pythonanywhere.com/iAMAP-SCM. The iAMAP-SCM predictor is anticipated to assist experimental scientists in the high-throughput identification of potential AMAP candidates for the treatment of malaria and other clinical applications.
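The idea of scoring a peptide from dipeptide propensities can be illustrated with a short sketch. It assumes the common description of scoring-card-style methods, in which a peptide's score is the composition-weighted sum of its dipeptide propensities and is compared with a threshold. The function names, propensity values, and threshold below are hypothetical and are not taken from iAMAP-SCM.

```python
from collections import Counter


def dipeptide_composition(seq):
    """Return the fraction of each overlapping dipeptide in the sequence."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    counts = Counter(pairs)
    total = len(pairs)
    return {dp: c / total for dp, c in counts.items()}


def scm_score(seq, propensity, default=0.0):
    """Composition-weighted sum of dipeptide propensity scores.

    `propensity` maps dipeptides to scores; dipeptides missing from the
    table receive `default` (an assumption made for this sketch).
    """
    comp = dipeptide_composition(seq)
    return sum(frac * propensity.get(dp, default) for dp, frac in comp.items())


if __name__ == "__main__":
    # Made-up propensity table and threshold, for illustration only.
    toy_propensity = {"AK": 600.0, "KA": 300.0}
    threshold = 450.0
    score = scm_score("AKAK", toy_propensity)
    print(score, "positive" if score > threshold else "negative")
```

Because the score is a plain weighted sum, each dipeptide's contribution can be read off directly, which is the kind of interpretability the abstract refers to.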
Affiliation(s)
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Nalini Schaduangrat
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Pietro Lio
  - Department of Computer Science and Technology, University of Cambridge, Cambridgeshire CB3 0FD, U.K.
- Mohammad Ali Moni
  - Artificial Intelligence & Digital Health, School of Health and Rehabilitation Sciences, Faculty of Health and Behavioural Sciences, The University of Queensland, St Lucia QLD 4072, Australia
- Pramote Chumnanpuen
  - Department of Zoology, Faculty of Science, Kasetsart University, Bangkok 10900, Thailand
  - Omics Center for Agriculture, Bioresources, Food, and Health, Kasetsart University (OmiKU), Bangkok 10900, Thailand
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
40
A TastePeptides-Meta system including an umami/bitter classification model Umami_YYDS, a TastePeptidesDB database and an open-source package Auto_Taste_ML. Food Chem 2022; 405:134812. [DOI: 10.1016/j.foodchem.2022.134812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2022] [Revised: 10/25/2022] [Accepted: 10/28/2022] [Indexed: 11/11/2022]
41
Improved prediction and characterization of blood-brain barrier penetrating peptides using estimated propensity scores of dipeptides. J Comput Aided Mol Des 2022; 36:781-796. [DOI: 10.1007/s10822-022-00476-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/03/2022] [Accepted: 09/15/2022] [Indexed: 11/27/2022]
42
Phirom K, Charoenkwan P, Shoombuatong W, Charoenkwan P, Sirichotiyakul S, Tongsong T. DeepThal: A Deep Learning-Based Framework for the Large-Scale Prediction of the α+-Thalassemia Trait Using Red Blood Cell Parameters. J Clin Med 2022; 11:jcm11216305. [PMID: 36362531 PMCID: PMC9654007 DOI: 10.3390/jcm11216305] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/29/2022] [Revised: 10/16/2022] [Accepted: 10/23/2022] [Indexed: 11/16/2022] Open
Abstract
Objectives: To develop a machine learning (ML)-based framework using red blood cell (RBC) parameters for the prediction of the α+-thalassemia trait (α+-thal trait) and to compare the diagnostic performance with a conventional method using a single RBC parameter or a combination of RBC parameters. Methods: A retrospective study was conducted on possible couples at risk for fetus with hemoglobin H (Hb H disease). Subjects with molecularly confirmed normal status (not thalassemia), α+-thal trait, and two-allele α-thalassemia mutation were included. Clinical parameters (age and gender) and RBC parameters (Hb, Hct, MCV, MCH, MCHC, RDW, and RBC count) obtained from their antenatal thalassemia screen were retrieved and analyzed using a machine learning (ML)-based framework and a conventional method. The performance of α+-thal trait prediction was evaluated. Results: In total, 594 cases (female/male: 330/264, mean age: 29.7 ± 6.6 years) were included in the analysis. There were 229 normal controls, 160 cases with the α+-thalassemia trait, and 205 cases in the two-allele α-thalassemia mutation category, respectively. The ML-derived model improved the diagnostic performance, giving a sensitivity of 80% and specificity of 81%. The experimental results indicated that DeepThal achieved a better performance compared with other ML-based methods in terms of the independent test dataset, with an accuracy of 80.77%, sensitivity of 70.59%, and the Matthews correlation coefficient (MCC) of 0.608. Of all the red blood cell parameters, MCH < 28.95 pg as a single parameter had the highest performance in predicting the α+-thal trait with the AUC of 0.857 and 95% CI of 0.816−0.899. The combination model derived from the binary logistic regression analysis exhibited improved performance with the AUC of 0.868 and 95% CI of 0.830−0.906, giving a sensitivity of 80.1% and specificity of 75.1%. 
Conclusions: The performance of DeepThal in terms of the independent test dataset is sufficient to demonstrate that DeepThal is capable of accurately predicting the α+-thal trait. It is anticipated that DeepThal will be a useful tool for the scientific community in the large-scale prediction of the α+-thal trait.
Affiliation(s)
- Krittaya Phirom
  - Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
- Phasit Charoenkwan
  - Modern Management and Information Technology, College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand
- Watshara Shoombuatong
  - Center of Data Mining and Biomedical Informatics, Faculty of Medical Technology, Mahidol University, Bangkok 10700, Thailand
- Pimlak Charoenkwan
  - Department of Pediatrics, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
  - Thalassemia and Hematology Center, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
- Supatra Sirichotiyakul
  - Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
  - Thalassemia and Hematology Center, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
  - Correspondence: (S.S.); (T.T.)
- Theera Tongsong
  - Department of Obstetrics and Gynecology, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
  - Thalassemia and Hematology Center, Faculty of Medicine, Chiang Mai University, Chiang Mai 50200, Thailand
  - Correspondence: (S.S.); (T.T.)
43
PD-BertEDL: An Ensemble Deep Learning Method Using BERT and Multivariate Representation to Predict Peptide Detectability. Int J Mol Sci 2022; 23:ijms232012385. [PMID: 36293242 PMCID: PMC9604182 DOI: 10.3390/ijms232012385] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Revised: 10/11/2022] [Accepted: 10/12/2022] [Indexed: 12/03/2022] Open
Abstract
Peptide detectability is defined as the probability of identifying a peptide from a mixture of standard samples, which is a key step in protein identification and analysis. Exploring effective methods for predicting peptide detectability is helpful for disease treatment and clinical research. However, most existing computational methods for predicting peptide detectability rely on a single type of information. With the increasing complexity of feature representation, it is necessary to explore the influence of multivariate information on peptide detectability. Thus, we propose an ensemble deep learning method, PD-BertEDL. Bidirectional encoder representations from transformers (BERT) is introduced to capture the context information of peptides. Context information, sequence information, and physicochemical information of peptides were combined to construct the multivariate feature space of peptides. We use different deep learning methods to capture the high-quality features of the different categories of peptide information and use an average fusion strategy to integrate the three models' prediction results, to solve the heterogeneity problem and to enhance the robustness and adaptability of the model. The experimental results show that PD-BertEDL is superior to the existing prediction methods; it can effectively predict peptide detectability and provide strong support for protein identification and quantitative analysis, as well as disease treatment.
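An average fusion strategy of the kind mentioned here can be sketched as an element-wise mean over the probabilities produced by several models. This is a generic illustration under that assumption, not the PD-BertEDL implementation: the function name `average_fusion`, the three probability lists, and the 0.5 decision threshold are all hypothetical.

```python
def average_fusion(prob_lists):
    """Element-wise mean of per-sample probabilities from several models.

    `prob_lists` holds one list per model; all lists must have the
    same length (one probability per sample).
    """
    n_models = len(prob_lists)
    if n_models == 0:
        raise ValueError("at least one model output is required")
    if len({len(p) for p in prob_lists}) != 1:
        raise ValueError("all model outputs must cover the same samples")
    return [sum(sample_probs) / n_models for sample_probs in zip(*prob_lists)]


if __name__ == "__main__":
    # Made-up outputs of three models for two peptides.
    context_model = [0.9, 0.2]
    sequence_model = [0.6, 0.4]
    physchem_model = [0.3, 0.6]
    fused = average_fusion([context_model, sequence_model, physchem_model])
    labels = [p >= 0.5 for p in fused]  # hypothetical threshold
    print(fused, labels)
```

Averaging gives each model equal weight, so no single feature view dominates the final prediction.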
44
Cross-attention PHV: Prediction of human and virus protein-protein interactions using cross-attention-based neural networks. Comput Struct Biotechnol J 2022; 20:5564-5573. [PMID: 36249566 PMCID: PMC9546503 DOI: 10.1016/j.csbj.2022.10.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 07/09/2022] [Revised: 10/05/2022] [Accepted: 10/05/2022] [Indexed: 11/30/2022] Open
Abstract
Cross-attention PHV implements two key technologies: cross-attention mechanism and 1D-CNN. It accurately predicts PPIs between human and unknown influenza viruses/SARS-CoV-2. It extracts critical taxonomic and evolutionary differences responsible for PPI prediction.
Viral infections represent a major health concern worldwide. The alarming rate at which SARS-CoV-2 spreads, for example, led to a worldwide pandemic. Viruses incorporate genetic material into the host genome to hijack host cell functions such as the cell cycle and apoptosis. In these viral processes, protein–protein interactions (PPIs) play critical roles. Therefore, the identification of PPIs between humans and viruses is crucial for understanding the infection mechanism and host immune responses to viral infections and for discovering effective drugs. Experimental methods including mass spectrometry-based proteomics and yeast two-hybrid assays are widely used to identify human-virus PPIs, but these experimental methods are time-consuming, expensive, and laborious. To overcome this problem, we developed a novel computational predictor, named cross-attention PHV, by implementing two key technologies of the cross-attention mechanism and a one-dimensional convolutional neural network (1D-CNN). The cross-attention mechanisms were very effective in enhancing prediction and generalization abilities. Application of 1D-CNN to the word2vec-generated feature matrices reduced computational costs, thus extending the allowable length of protein sequences to 9000 amino acid residues. Cross-attention PHV outperformed existing state-of-the-art models using a benchmark dataset and accurately predicted PPIs for unknown viruses. Cross-attention PHV also predicted human–SARS-CoV-2 PPIs with area under the curve values >0.95. The Cross-attention PHV web server and source codes are freely available at https://kurata35.bio.kyutech.ac.jp/Cross-attention_PHV/ and https://github.com/kuratahiroyuki/Cross-Attention_PHV, respectively.
Key Words
- 1D-CNN, One-dimensional-CNN
- AC, Accuracy
- AUC, Area under the curve
- CNN, Convolutional neural network
- Convolutional neural network
- DT, Decision tree
- F1, F1-score
- HV-PPIs, Human-virus PPIs
- HuV-PPI, Human–unknown virus PPI
- Human
- LR, Linear regression
- MCC, Matthews correlation coefficient
- PPIs, Protein-protein interactions
- Protein–protein interaction
- RF, Random forest
- SARS-CoV-2
- SARS-CoV-2, Severe acute respiratory syndrome coronavirus 2
- SN, Sensitivity
- SP, Specificity
- SVM, Support vector machine
- T-SNE, T-distributed stochastic neighbor embedding
- Virus
- W2V, Word2vec
- Word2vec
45
Richter P, Sebald K, Fischer K, Behrens M, Schnieke A, Somoza V. Bitter Peptides YFYPEL, VAPFPEVF, and YQEPVLGPVRGPFPIIV, Released during Gastric Digestion of Casein, Stimulate Mechanisms of Gastric Acid Secretion via Bitter Taste Receptors TAS2R16 and TAS2R38. J Agric Food Chem 2022; 70:11591-11602. [PMID: 36054030 PMCID: PMC9501810 DOI: 10.1021/acs.jafc.2c05228] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 07/24/2022] [Revised: 08/13/2022] [Accepted: 08/15/2022] [Indexed: 05/22/2023]
Abstract
Eating satiating, protein-rich foods is one of the key aspects of modern diet, although a bitter off-taste often limits the application of some proteins and protein hydrolysates, especially in processed foods. Previous studies of our group demonstrated that bitter-tasting food constituents, such as caffeine, stimulate mechanisms of gastric acid secretion as a signal of gastric satiation and a key process of gastric protein digestion via activation of bitter taste receptors (TAS2Rs). Here, we tried to elucidate whether dietary non-bitter-tasting casein is intra-gastrically degraded into bitter peptides that stimulate mechanisms of gastric acid secretion in physiologically achievable concentrations. An in vitro model of gastric digestion was verified by casein-fed pigs, and the peptides resulting from gastric digestion were identified by liquid chromatography-time-of-flight-mass spectrometry. The bitterness of five selected casein-derived peptides was validated by sensory analyses and by an in vitro screening approach based on human gastric parietal cells (HGT-1). For three of these peptides (YFYPEL, VAPFPEVF, and YQEPVLGPVRGPFPIIV), an upregulation of gene expression of TAS2R16 and TAS2R38 was observed. The functional involvement of these TAS2Rs was verified by siRNA knock-down (kd) experiments in HGT-1 cells. This resulted in a reduction of the mean proton secretion promoted by the peptides by up to 86.3 ± 9.9% for TAS2R16kd (p < 0.0001) cells and by up to 62.8 ± 7.0% for TAS2R38kd (p < 0.0001) cells compared with mock-transfected cells.
Affiliation(s)
- Phil Richter
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Straße 34, 85354 Freising, Germany
- Karin Sebald
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Straße 34, 85354 Freising, Germany
- Konrad Fischer
  - Chair of Livestock Biotechnology, TUM School of Life Sciences, Technical University of Munich, Liesel-Beckmann-Straße 1, 85354 Freising, Germany
- Maik Behrens
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Straße 34, 85354 Freising, Germany
- Angelika Schnieke
  - Chair of Livestock Biotechnology, TUM School of Life Sciences, Technical University of Munich, Liesel-Beckmann-Straße 1, 85354 Freising, Germany
- Veronika Somoza
  - Leibniz Institute for Food Systems Biology at the Technical University of Munich, Lise-Meitner-Straße 34, 85354 Freising, Germany
  - Chair of Nutritional Systems Biology, TUM School of Life Sciences, Technical University of Munich, Lise-Meitner-Straße 34, 85354 Freising, Germany
  - Department of Physiological Chemistry, Faculty of Chemistry, University of Vienna, Josef-Holaubek-Platz 2 (UZA II), 1090 Wien, Austria
  - Phone: +49-8161-71-2700
46
A Statistical Analysis of the Sequence and Structure of Thermophilic and Non-Thermophilic Proteins. Int J Mol Sci 2022; 23:ijms231710116. [PMID: 36077513 PMCID: PMC9456548 DOI: 10.3390/ijms231710116] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 07/29/2022] [Revised: 08/29/2022] [Accepted: 08/31/2022] [Indexed: 11/17/2022] Open
Abstract
Thermophilic proteins have various practical applications in theoretical research and in industry. In recent years, the demand for thermophilic proteins on an industrial scale has been increasing; therefore, the engineering of thermophilic proteins has become an active direction in the field of protein engineering. However, the exact mechanism of protein thermostability is not yet known, and knowing its basis is necessary for engineering thermophilic proteins. In order to understand the basis of thermostability in proteins, we have made a statistical analysis of the sequences, secondary structures, hydrogen bonds, salt bridges, DHA (Donor-Hydrogen-Acceptor) angles, and bond lengths of ten pairs of thermophilic proteins and their non-thermophilic orthologs. Our findings suggest that polar amino acids contribute to thermostability in proteins by forming hydrogen bonds and salt bridges, which provide resistance against protein denaturation. Shorter bond lengths and wider DHA angles provide greater bond stability in thermophilic proteins. Moreover, the increased frequency of aromatic amino acids in thermophilic proteins contributes to thermal stability by forming more aromatic interactions. Additionally, the coil, helix, and loop in the secondary structure also contribute to thermostability.
47
Watanabe N, Yamamoto M, Murata M, Vavricka CJ, Ogino C, Kondo A, Araki M. Comprehensive Machine Learning Prediction of Extensive Enzymatic Reactions. J Phys Chem B 2022; 126:6762-6770. [PMID: 36053051 DOI: 10.1021/acs.jpcb.2c03287] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 01/04/2023]
Abstract
New enzyme functions exist within the increasing number of unannotated protein sequences. Novel enzyme discovery is necessary to expand the pathways that can be accessed by metabolic engineering for the biosynthesis of functional compounds. Accordingly, various machine learning models have been developed to predict enzymatic reactions. However, the ability to predict unknown reactions that are not included in the training data has not been clarified. In order to cover uncertain and unknown reactions, a wider range of reaction types must be demonstrated by the models. Here, we establish 16 expanded enzymatic reaction prediction models developed using various machine learning algorithms, including deep neural network. Improvements in prediction performances over that of our previous study indicate that the updated methods are more effective for the prediction of enzymatic reactions. Overall, the deep neural network model trained with combined substrate-enzyme-product information exhibits the highest prediction accuracy with Macro F1 scores up to 0.966 and with robust prediction of unknown enzymatic reactions that are not included in the training data. This model can predict more extensive enzymatic reactions in comparison to previously reported models. This study will facilitate the discovery of new enzymes for the production of useful substances.
Affiliation(s)
- Naoki Watanabe
  - Department of Chemical Science and Engineering, Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501, Japan
- Masaki Yamamoto
  - Graduate School of Medicine, Kyoto University, 54 Kawahara-cho, Shogoin Sakyo-ku, Kyoto 606-8507, Japan
- Masahiro Murata
  - Graduate School of Medicine, Kyoto University, 54 Kawahara-cho, Shogoin Sakyo-ku, Kyoto 606-8507, Japan
- Christopher J Vavricka
  - Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
- Chiaki Ogino
  - Department of Chemical Science and Engineering, Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501, Japan
- Akihiko Kondo
  - Department of Chemical Science and Engineering, Graduate School of Engineering, Kobe University, 1-1 Rokkodai-cho, Nada, Kobe, Hyogo 657-8501, Japan
  - Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
- Michihiro Araki
  - Graduate School of Medicine, Kyoto University, 54 Kawahara-cho, Shogoin Sakyo-ku, Kyoto 606-8507, Japan
  - Graduate School of Science, Technology and Innovation, Kobe University, 1-1 Rokkodai-cho, Nada-ku, Kobe 657-8501, Japan
  - National Institutes of Biomedical Innovation, Health and Nutrition, National Institute of Health and Nutrition, 1-23-1 Toyama, Shinjuku-ku, Tokyo 162-8638, Japan
  - National Cerebral and Cardiovascular Center, 6-1 Kishibe-Shinmachi, Suita, Osaka 564-8565, Japan
|
48
|
BERT-PPII: The Polyproline Type II Helix Structure Prediction Model Based on BERT and Multichannel CNN. BioMed Research International 2022; 2022:9015123. [PMID: 36060139 PMCID: PMC9433275 DOI: 10.1155/2022/9015123] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2022] [Revised: 08/01/2022] [Accepted: 08/03/2022] [Indexed: 11/26/2022]
Abstract
Predicting the polyproline type II (PPII) helix structure is crucially important in many research areas, such as protein folding mechanisms, drug targets, and protein functions. However, many existing PPII helix prediction algorithms encode protein sequence information in only a single way, which leads to insufficient learning of protein sequence features. To improve protein sequence encoding, this paper proposes a BERT-based PPII helix structure prediction algorithm (BERT-PPII), which learns protein sequence information using the BERT model. The BERT model's CLS vector can fuse the information of every amino acid residue in a sample. Thus, we use the CLS vector as a global feature that represents the sample's global contextual information. Because interactions among local amino acid residues in the protein chain have an important influence on the formation of the PPII helix, we use a CNN to extract local residue features, which further enriches the representation of protein sequence samples. In this paper, we fuse the CLS vectors with the CNN local features to improve the performance of PPII structure prediction. Compared with the state-of-the-art PPIIPRED method, the experimental results on the unbalanced dataset show that the proposed method improves accuracy by 1% on the strict dataset and 2% on the less strict dataset. Correspondingly, on the balanced dataset, the AUCs of the proposed method are 0.826 on the strict dataset and 0.785 on the less strict dataset. On the independent test set, the proposed method achieves an AUC of 0.827 on the strict dataset and 0.783 on the less strict dataset. These experimental results show that the proposed BERT-PPII method achieves superior performance in predicting the PPII helix.
|
49
|
Dubovski N, Fierro F, Margulis E, Ben Shoshan-Galeczki Y, Peri L, Niv MY. Taste GPCRs and their ligands. Prog Mol Biol Transl Sci 2022; 193:177-193. [PMID: 36357077 DOI: 10.1016/bs.pmbts.2022.06.008] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 12/29/2022]
Abstract
Taste GPCRs are expressed in taste buds on the tongue and play a key role in food choice and consumption. They are also expressed extra-orally, with various physiological roles that are currently under study. Unraveling the roles of these receptors relies on knowledge of their ligands. Combining sensory, cell-based, and computational approaches has enabled the discovery of numerous agonists and several antagonists. Here we provide a short overview of the taste receptor families, the main recent methods for ligand discovery, and current sources of information about known ligands. Future directions that are likely to impact the taste GPCR field include a focus on ligand interactions with naturally occurring polymorphisms, as well as harnessing the power of cryo-EM and of multiple signaling readout techniques.
Affiliation(s)
- Nitzan Dubovski
- The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Fabrizio Fierro
- The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Eitan Margulis
- The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Yaron Ben Shoshan-Galeczki
- The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Lior Peri
- The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
- Masha Y Niv
- The Institute of Biochemistry, Food Science and Nutrition, Robert H. Smith Faculty of Agriculture, Food and Environment, The Hebrew University of Jerusalem, Rehovot, Israel
|
50
|
FRTpred: A novel approach for accurate prediction of protein folding rate and type. Comput Biol Med 2022; 149:105911. [DOI: 10.1016/j.compbiomed.2022.105911] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2022] [Revised: 07/08/2022] [Accepted: 07/23/2022] [Indexed: 11/20/2022]
|