1
Liu S, Shi T, Yu J, Li R, Lin H, Deng K. Research on Bitter Peptides in the Field of Bioinformatics: A Comprehensive Review. Int J Mol Sci 2024; 25:9844. [PMID: 39337334 PMCID: PMC11432553 DOI: 10.3390/ijms25189844] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/16/2024] [Revised: 09/06/2024] [Accepted: 09/09/2024] [Indexed: 09/30/2024] Open
Abstract
Bitter peptides are small molecular peptides produced by the hydrolysis of proteins under acidic, alkaline, or enzymatic conditions. These peptides can enhance food flavor and offer various health benefits, with attributes such as antihypertensive, antidiabetic, antioxidant, antibacterial, and immune-regulating properties. They show significant potential in the development of functional foods and the prevention and treatment of diseases. This review introduces the diverse sources of bitter peptides and discusses the mechanisms of bitterness generation and their physiological functions in the taste system. Additionally, it emphasizes the application of bioinformatics in bitter peptide research, including the establishment and improvement of bitter peptide databases, the use of quantitative structure-activity relationship (QSAR) models to predict bitterness thresholds, and the latest advancements in classification prediction models built using machine learning and deep learning algorithms for bitter peptide identification. Future research directions include enhancing databases, diversifying models, and applying generative models to advance bitter peptide research towards deepening and discovering more practical applications.
Affiliation(s)
- Hao Lin
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; (S.L.); (T.S.); (J.Y.); (R.L.)
- Kejun Deng
- School of Life Science and Technology, Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; (S.L.); (T.S.); (J.Y.); (R.L.)
2
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time-consuming. Accurately predicting the interaction between drugs and targets will likely change how the drug is discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. In this paper, computational methods, focusing on sequence and structure to study protein-ligand interactions, are examined. Therefore, this paper starts by presenting an overview of the data sets applied in this area, as well as the various approaches applied for representing proteins and ligands. Then, sequence-based and structure-based classification criteria are subsequently utilized to categorize and summarize both the classical machine learning models and deep learning models employed in protein-ligand interaction studies. Moreover, the evaluation methods and interpretability of these models are proposed. Furthermore, delving into the diverse applications of protein-ligand interaction models in drug research is presented. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Kong Meng
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
- Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
3
Dhakal A, McKay C, Tanner JJ, Cheng J. Artificial intelligence in the prediction of protein-ligand interactions: recent advances and future directions. Brief Bioinform 2022; 23:bbab476. [PMID: 34849575 PMCID: PMC8690157 DOI: 10.1093/bib/bbab476] [Citation(s) in RCA: 81] [Impact Index Per Article: 40.5] [Received: 08/05/2021] [Revised: 09/28/2021] [Accepted: 10/15/2021] [Indexed: 12/13/2022] Open
Abstract
New drug production, from target identification to marketing approval, takes over 12 years and can cost around $2.6 billion. Furthermore, the COVID-19 pandemic has unveiled the urgent need for more powerful computational methods for drug discovery. Here, we review the computational approaches to predicting protein-ligand interactions in the context of drug discovery, focusing on methods using artificial intelligence (AI). We begin with a brief introduction to proteins (targets), ligands (e.g. drugs) and their interactions for nonexperts. Next, we review databases that are commonly used in the domain of protein-ligand interactions. Finally, we survey and analyze the machine learning (ML) approaches implemented to predict protein-ligand binding sites, ligand-binding affinity and binding pose (conformation) including both classical ML algorithms and recent deep learning methods. After exploring the correlation between these three aspects of protein-ligand interaction, it has been proposed that they should be studied in unison. We anticipate that our review will aid exploration and development of more accurate ML-based prediction strategies for studying protein-ligand interactions.
Affiliation(s)
- Ashwin Dhakal
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Cole McKay
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, USA
- John J Tanner
- Department of Biochemistry, University of Missouri, Columbia, MO, 65211, USA
- Department of Chemistry, University of Missouri, Columbia, MO, 65211, USA
- Jianlin Cheng
- Department of Electrical Engineering and Computer Science, University of Missouri, Columbia, MO, 65211, USA
4
Santana CA, Silveira SDA, Moraes JPA, Izidoro SC, de Melo-Minardi RC, Ribeiro AJM, Tyzack JD, Borkakoti N, Thornton JM. GRaSP: a graph-based residue neighborhood strategy to predict binding sites. Bioinformatics 2020; 36:i726-i734. [DOI: 10.1093/bioinformatics/btaa805] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Accepted: 09/08/2020] [Indexed: 01/22/2023] Open
Abstract
Motivation
The discovery of protein–ligand-binding sites is a major step for elucidating protein function and for investigating new functional roles. Detecting protein–ligand-binding sites experimentally is time-consuming and expensive. Thus, a variety of in silico methods to detect and predict binding sites was proposed as they can be scalable, fast and present low cost.
Results
We proposed Graph-based Residue neighborhood Strategy to Predict binding sites (GRaSP), a novel residue centric and scalable method to predict ligand-binding site residues. It is based on a supervised learning strategy that models the residue environment as a graph at the atomic level. Results show that GRaSP made compatible or superior predictions when compared with methods described in the literature. GRaSP outperformed six other residue-centric methods, including the one considered as state-of-the-art. Also, our method achieved better results than the method from CAMEO independent assessment. GRaSP ranked second when compared with five state-of-the-art pocket-centric methods, which we consider a significant result, as it was not devised to predict pockets. Finally, our method proved scalable as it took 10–20 s on average to predict the binding site for a protein complex whereas the state-of-the-art residue-centric method takes 2–5 h on average.
Availability and implementation
The source code and datasets are available at https://github.com/charles-abreu/GRaSP.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Charles A Santana
- Department of Biochemistry and Immunology
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- Sabrina de A Silveira
- Department of Computer Science, Universidade Federal de Viçosa, Viçosa 36570-900, Brazil
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- João P A Moraes
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- Sandro C Izidoro
- Institute of Technological Sciences (ICT), Advanced Campus at Itabira, Universidade Federal de Itajubá, Itabira 35903-087, Brazil
- Raquel C de Melo-Minardi
- Department of Biochemistry and Immunology
- Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte 31270-901, Brazil
- António J M Ribeiro
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Jonathan D Tyzack
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Neera Borkakoti
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Janet M Thornton
- European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
5
Predicting binding sites from unbound versus bound protein structures. Sci Rep 2020; 10:15856. [PMID: 32985584 PMCID: PMC7522209 DOI: 10.1038/s41598-020-72906-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 04/21/2020] [Accepted: 07/27/2020] [Indexed: 11/30/2022] Open
Abstract
We present the application of seven binding-site prediction algorithms to a meticulously curated dataset of ligand-bound and ligand-free crystal structures for 304 unique protein sequences (2528 crystal structures). We probe the influence of starting protein structures on the results of binding-site prediction, so the dataset contains a minimum of two ligand-bound and two ligand-free structures for each protein. We use this dataset in a brief survey of five geometry-based, one energy-based, and one machine-learning-based methods: Surfnet, Ghecom, LIGSITEcsc, Fpocket, Depth, AutoSite, and Kalasanty. Distributions of the F scores and Matthew’s correlation coefficients for ligand-bound versus ligand-free structure performance show no statistically significant difference in structure type versus performance for most methods. Only Fpocket showed a statistically significant but low magnitude enhancement in performance for holo structures. Lastly, we found that most methods will succeed on some crystal structures and fail on others within the same protein family, despite all structures being relatively high-quality structures with low structural variation. We expected better consistency across varying protein conformations of the same sequence. Interestingly, the success or failure of a given structure cannot be predicted by quality metrics such as resolution, Cruickshank Diffraction Precision index, or unresolved residues. Cryptic sites were also examined.
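Several abstracts in this list report F scores and Matthews correlation coefficients for residue-level binding-site predictions. As a minimal sketch of how those two metrics are computed from binary labels (this is not the evaluation code of any paper listed here, and the function names and example labels are hypothetical):

```python
import math


def confusion_counts(true_labels, predicted_labels):
    """Count TP, FP, TN, FN for binary labels (1 = binding, 0 = non-binding)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if truth and pred:
            tp += 1
        elif pred:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def f_score(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical per-residue labels for one protein (illustrative only).
true_labels = [1, 1, 1, 0, 0, 0, 0, 0]
predicted_labels = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(true_labels, predicted_labels)
print(f"F1 = {f_score(tp, fp, fn):.3f}, MCC = {mcc(tp, fp, tn, fn):.3f}")
```

Because binding residues are usually a small minority of all residues, MCC is often preferred over accuracy here: it uses all four confusion-matrix cells and stays near zero for a predictor that labels everything as non-binding.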
6
Liu L, Hu X, Feng Z, Wang S, Sun K, Xu S. Recognizing Ion Ligand-Binding Residues by Random Forest Algorithm Based on Optimized Dihedral Angle. Front Bioeng Biotechnol 2020; 8:493. [PMID: 32596216 PMCID: PMC7303464 DOI: 10.3389/fbioe.2020.00493] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 12/20/2019] [Accepted: 04/28/2020] [Indexed: 11/26/2022] Open
Abstract
The prediction of ion ligand–binding residues in protein sequences is a challenging work that contributes to understand the specific functions of proteins in life processes. In this article, we selected binding residues of 14 ion ligands as research objects, including four acid radical ion ligands and 10 metal ion ligands. Based on the amino acid sequence information, we selected the composition and position conservation information of amino acids, the predicted structural information, and physicochemical properties of amino acids as basic feature parameters. We then performed a statistical analysis and reclassification for dihedral angle and proposed new methods on the extraction of feature parameters. The methods mainly included applying information entropy on the extraction of polarization charge and hydrophilic–hydrophobic information of amino acids and using position weight matrices on the extraction of position conservation information. In the prediction model, we used the random forest algorithm and obtained better prediction results than previous works. With the independent test, the Matthew's correlation coefficient and accuracy of 10 metal ion ligand–binding residues were larger than 0.07 and 52%, respectively; the corresponding evaluation values of four acid radical ion ligand–binding residues were larger than 0.15 and 86%, respectively. Further, we classified and combined the phi and psi angles and optimized prediction model for each ion ligand–binding residue.
Affiliation(s)
- Liu Liu
- College of Sciences, Inner Mongolia University of Technology, Hohhot, China
- Xiuzhen Hu
- College of Sciences, Inner Mongolia University of Technology, Hohhot, China
- Zhenxing Feng
- College of Sciences, Inner Mongolia University of Technology, Hohhot, China
- Shan Wang
- College of Sciences, Inner Mongolia University of Technology, Hohhot, China
- Kai Sun
- College of Sciences, Inner Mongolia University of Technology, Hohhot, China
- Shuang Xu
- College of Sciences, Inner Mongolia University of Technology, Hohhot, China
7
Gattani S, Mishra A, Hoque MT. StackCBPred: A stacking based prediction of protein-carbohydrate binding sites from sequence. Carbohydr Res 2019; 486:107857. [DOI: 10.1016/j.carres.2019.107857] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 06/25/2019] [Revised: 10/05/2019] [Accepted: 10/23/2019] [Indexed: 11/26/2022]
8
Ding Y, Tang J, Guo F. Identification of Protein-Ligand Binding Sites by Sequence Information and Ensemble Classifier. J Chem Inf Model 2017; 57:3149-3161. [PMID: 29125297 DOI: 10.1021/acs.jcim.7b00307] [Citation(s) in RCA: 53] [Impact Index Per Article: 7.6] [Indexed: 11/29/2022]
Abstract
Identifying protein-ligand binding sites is an important process in drug discovery and structure-based drug design. Detecting protein-ligand binding sites is expensive and time-consuming by traditional experimental methods. Hence, computational approaches provide many effective strategies to deal with this issue. Recently, lots of computational methods are based on structure information on proteins. However, these methods are limited in the common scenario, where both the sequence of protein target is known and sufficient 3D structure information is available. Studies indicate that sequence-based computational approaches for predicting protein-ligand binding sites are more practical. In this paper, we employ a novel computational model of protein-ligand binding sites prediction, using protein sequence. We apply the Discrete Cosine Transform (DCT) to extract feature from Position-Specific Score Matrix (PSSM). In order to improve the accuracy, Predicted Relative Solvent Accessibility (PRSA) information is also utilized. The predictor of protein-ligand binding sites is built by employing the ensemble weighted sparse representation model with random under-sampling. To evaluate our method, we conduct several comprehensive tests (12 types of ligands testing sets) for predicting protein-ligand binding sites. Results show that our method achieves better Matthew's correlation coefficient (MCC) than other outstanding methods on independent test sets of ATP (0.506), ADP (0.511), AMP (0.393), GDP (0.579), GTP (0.641), Mg2+ (0.317), Fe3+ (0.490) and HEME (0.640). Our proposed method outperforms earlier predictors (the performance of MCC) in 8 of the 12 ligands types.
Affiliation(s)
- Yijie Ding
- School of Computer Science and Technology, Tianjin University, No. 135, Yaguan Road, Tianjin Haihe Education Park, Tianjin 300350, China
- Jijun Tang
- School of Computer Science and Technology, Tianjin University, No. 135, Yaguan Road, Tianjin Haihe Education Park, Tianjin 300350, China
- Department of Computer Science and Engineering, University of South Carolina, Columbia, South Carolina 29208, United States
- Fei Guo
- School of Computer Science and Technology, Tianjin University, No. 135, Yaguan Road, Tianjin Haihe Education Park, Tianjin 300350, China
9
Multi-Class Disease Classification in Brain MRIs Using a Computer-Aided Diagnostic System. Symmetry (Basel) 2017. [DOI: 10.3390/sym9030037] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Indexed: 11/16/2022] Open
10
Broomhead NK, Soliman ME. Can We Rely on Computational Predictions To Correctly Identify Ligand Binding Sites on Novel Protein Drug Targets? Assessment of Binding Site Prediction Methods and a Protocol for Validation of Predicted Binding Sites. Cell Biochem Biophys 2016; 75:15-23. [PMID: 27796788 DOI: 10.1007/s12013-016-0769-y] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 04/18/2016] [Accepted: 10/19/2016] [Indexed: 11/30/2022]
Abstract
In the field of medicinal chemistry there is increasing focus on identifying key proteins whose biochemical functions can firmly be linked to serious diseases. Such proteins become targets for drug or inhibitor molecules that could treat or halt the disease through therapeutic action or by blocking the protein function respectively. The protein must be targeted at the relevant biologically active site for drug or inhibitor binding to be effective. As insufficient experimental data is available to confirm the biologically active binding site for novel protein targets, researchers often rely on computational prediction methods to identify binding sites. Presented herein is a short review on structure-based computational methods that (i) predict putative binding sites and (ii) assess the druggability of predicted binding sites on protein targets. This review briefly covers the principles upon which these methods are based, where they can be accessed and their reliability in identifying the correct binding site on a protein target. Based on this review, we believe that these methods are useful in predicting putative binding sites, but as they do not account for the dynamic nature of protein-ligand binding interactions, they cannot definitively identify the correct site from a ranked list of putative sites. To overcome this shortcoming, we strongly recommend using molecular docking to predict the most likely protein-ligand binding site(s) and mode(s), followed by molecular dynamics simulations and binding thermodynamics calculations to validate the docking results. This protocol provides a valuable platform for experimental and computational efforts to design novel drugs and inhibitors that target disease-related proteins.
Affiliation(s)
- Neal K Broomhead
- Molecular Modelling & Drug Design Research Group, School of Health Sciences, University of KwaZulu-Natal, Westville, Durban, 4001, South Africa
- Mahmoud E Soliman
- Molecular Modelling & Drug Design Research Group, School of Health Sciences, University of KwaZulu-Natal, Westville, Durban, 4001, South Africa
11
Taherzadeh G, Zhou Y, Liew AWC, Yang Y. Sequence-Based Prediction of Protein-Carbohydrate Binding Sites Using Support Vector Machines. J Chem Inf Model 2016; 56:2115-2122. [PMID: 27623166 DOI: 10.1021/acs.jcim.6b00320] [Citation(s) in RCA: 44] [Impact Index Per Article: 5.5] [Indexed: 01/09/2023]
Abstract
Carbohydrate-binding proteins play significant roles in many diseases including cancer. Here, we established a machine-learning-based method (called sequence-based prediction of residue-level interaction sites of carbohydrates, SPRINT-CBH) to predict carbohydrate-binding sites in proteins using support vector machines (SVMs). We found that integrating evolution-derived sequence profiles with additional information on sequence and predicted solvent accessible surface area leads to a reasonably accurate, robust, and predictive method, with area under receiver operating characteristic curve (AUC) of 0.78 and 0.77 and Matthew's correlation coefficient of 0.34 and 0.29, respectively for 10-fold cross validation and independent test without balancing binding and nonbinding residues. The quality of the method is further demonstrated by having statistically significantly more binding residues predicted for carbohydrate-binding proteins than presumptive nonbinding proteins in the human proteome, and by the bias of rare alleles toward predicted carbohydrate-binding sites for nonsynonymous mutations from the 1000 genome project. SPRINT-CBH is available as an online server at http://sparks-lab.org/server/SPRINT-CBH .
Affiliation(s)
- Ghazaleh Taherzadeh
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
- Yaoqi Zhou
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
- Alan Wee-Chung Liew
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia
- Yuedong Yang
- School of Information and Communication Technology and Institute for Glycomics, Griffith University, Parklands Drive, Southport, Queensland 4215, Australia