1. Zhang J, Zhou F, Liang X, Kurgan L. Accurate Prediction of Protein-Binding Residues in Protein Sequences Using SCRIBER. Methods Mol Biol 2025; 2867:247-260. [PMID: 39576586 DOI: 10.1007/978-1-0716-4196-5_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2024]
Abstract
Deciphering molecular-level mechanisms that govern protein-protein interactions (PPIs) relies in part on the accurate prediction of protein-binding partners and protein-binding residues. These predictions can be used to support a wide spectrum of applications that include development of PPI networks and protein docking programs, drug design studies, and investigations of molecular details that underlie certain diseases. Computational methods that predict protein-binding residues offer convenient, inexpensive, and relatively accurate data that can aid these efforts. We introduce and describe a user-friendly webserver for the SCRIBER method that conveniently provides state-of-the-art predictions of protein-binding residues and that minimizes cross-predictions, i.e., incorrect prediction of residues that bind other/non-protein ligands as protein binding. SCRIBER relies on a two-layer architecture that is specifically designed to reduce the cross-predictions. We motivate and explain this predictive architecture. We describe how to use the webserver, interact with its web interface, and collect, read, and understand results generated by SCRIBER. The SCRIBER webserver is available at http://biomine.cs.vcu.edu/servers/SCRIBER/ .
Affiliation(s)
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Feng Zhou
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Xingchen Liang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang, China
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA
2. Cecil AJ, Sogues A, Gurumurthi M, Lane KS, Remaut H, Pak AJ. Molecular dynamics and machine learning stratify motion-dependent activity profiles of S-layer destabilizing nanobodies. PNAS Nexus 2024; 3:pgae538. [PMID: 39660065 PMCID: PMC11631148 DOI: 10.1093/pnasnexus/pgae538] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/15/2024] [Accepted: 11/04/2024] [Indexed: 12/12/2024]
Abstract
Nanobody (Nb)-induced disassembly of surface array protein (Sap) S-layers, a two-dimensional paracrystalline protein lattice from Bacillus anthracis, has been presented as a therapeutic intervention for lethal anthrax infections. However, only a subset of existing Nbs with affinity to Sap exhibit depolymerization activity, suggesting that affinity and epitope recognition are not enough to explain inhibitory activity. In this study, we performed all-atom molecular dynamics simulations of each Nb bound to the Sap binding site and trained a collection of machine learning classifiers to predict whether each Nb induces depolymerization. We used feature importance analysis to filter out unnecessary features and engineered remaining features to regularize the feature landscape and encourage learning of the depolymerization mechanism. We find that, while not enforced in training, a gradient-boosting decision tree is able to reproduce the experimental activities of inhibitory Nbs while maintaining high classification accuracy, whereas neural networks were only able to discriminate between classes. Further feature analysis revealed that inhibitory Nbs restrain Sap motions toward an inhibitory conformational state described by domain-domain clamping and induced twisting of domains normal to the lattice plane. We believe these motions drive Sap lattice depolymerization and can be used as design targets for improved Sap-inhibitory Nbs. Finally, we expect our method of study to apply to S-layers that serve as virulence factors in other pathogens, paving the way forward for Nb therapeutics that target depolymerization mechanisms.
Affiliation(s)
- Adam J Cecil
  - Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, CO 80401, USA
- Adrià Sogues
  - Structural and Molecular Microbiology, VIB-VUB Center for Structural Biology, Pleinlaan 2, 1050 Brussels, Belgium
  - Structural Biology, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Mukund Gurumurthi
  - Quantitative Biosciences and Engineering Program, Colorado School of Mines, Golden, CO 80401, USA
- Kaylee S Lane
  - Computer Science and Software Engineering, Rose-Hulman Institute of Technology, Terre Haute, IN 47803, USA
- Han Remaut
  - Structural and Molecular Microbiology, VIB-VUB Center for Structural Biology, Pleinlaan 2, 1050 Brussels, Belgium
  - Structural Biology, Vrije Universiteit Brussel, Pleinlaan 2, 1050 Brussels, Belgium
- Alexander J Pak
  - Department of Chemical and Biological Engineering, Colorado School of Mines, Golden, CO 80401, USA
  - Quantitative Biosciences and Engineering Program, Colorado School of Mines, Golden, CO 80401, USA
  - Materials Science Program, Colorado School of Mines, Golden, CO 80401, USA
3. Li W, Chen N, Wang J, Luo Y, Liu H, Ding J, Jin Q. Species-specific model based on sequence and structural information for ubiquitination sites prediction. J Mol Biol 2024; 436:168781. [PMID: 39245319 DOI: 10.1016/j.jmb.2024.168781] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/19/2024] [Revised: 08/19/2024] [Accepted: 09/03/2024] [Indexed: 09/10/2024]
Abstract
Ubiquitination is a common post-translational modification of proteins in eukaryotic cells, and it is also an important mechanism for regulating protein function. Computational methods for predicting ubiquitination sites can serve as a cost-effective and time-saving alternative to experimental methods. Existing computational methods often build classifiers based on protein sequence information, physical and chemical properties of amino acids, evolutionary information, and structural parameters. However, structural information about most proteins cannot be found directly in existing databases. In addition, the features of proteins differ among species, and some species have only small numbers of known ubiquitinated proteins. Therefore, it is necessary to develop species-specific models that can be applied to datasets with small sample sizes. To solve these problems, we propose a species-specific model (SSUbi) based on a capsule network, which integrates proteins' sequence and structural information. In this model, the feature extraction module is composed of two sub-modules that extract multi-dimensional features from sequence and structural information, respectively. In each sub-module, a convolution operation is used to extract encoding-dimension features, and a channel attention mechanism is used to extract feature-map-dimension features. After integrating the multi-dimensional features from both types of information, the species-specific capsule network converts the features into capsule vectors and classifies species-specific ubiquitination sites. The experimental results show that SSUbi can effectively improve prediction performance for species with small sample sizes and outperforms other models.
Affiliation(s)
- Weimin Li
  - School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
- Nan Chen
  - School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
- Jie Wang
  - School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
- Yin Luo
  - School of Life Sciences, East China Normal University, Haikou 200062, China
- Huazhong Liu
  - School of Computer Science and Technology, Hainan University, Haikou 570208, China
- Jihong Ding
  - School of Computer Science and Technology, Hainan University, Haikou 570208, China
- Qun Jin
  - Faculty of Human Sciences, Waseda University, Tokorozawa 359-1192, Japan
4. Jia P, Zhang F, Wu C, Li M. A comprehensive review of protein-centric predictors for biomolecular interactions: from proteins to nucleic acids and beyond. Brief Bioinform 2024; 25:bbae162. [PMID: 38739759 PMCID: PMC11089422 DOI: 10.1093/bib/bbae162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/01/2024] [Revised: 02/17/2024] [Accepted: 03/31/2024] [Indexed: 05/16/2024]
Abstract
Proteins interact with diverse ligands to perform a large number of biological functions, such as gene expression and signal transduction. Accurate identification of these protein-ligand interactions is crucial to the understanding of molecular mechanisms and the development of new drugs. However, traditional biological experiments are time-consuming and expensive. With the development of high-throughput technologies, an increasing amount of protein data is available. In the past decades, many computational methods have been developed to predict protein-ligand interactions. Here, we review a comprehensive set of over 160 protein-ligand interaction predictors, which cover protein-protein, protein-nucleic acid, protein-peptide and protein-other ligands (nucleotide, heme, ion) interactions. We have carried out a comprehensive analysis of the above four types of predictors from several significant perspectives, including their inputs, feature profiles, models, availability, etc. The current methods primarily rely on protein sequences, especially utilizing evolutionary information. The significant improvement in predictions is attributed to deep learning methods. Additionally, sequence-based pretrained models and structure-based approaches are emerging as new trends.
Affiliation(s)
- Pengzhen Jia
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Fuhao Zhang
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
  - College of Information Engineering, Northwest A&F University, No. 3 Taicheng Road, Yangling, Shaanxi 712100, China
- Chaojin Wu
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
- Min Li
  - School of Computer Science and Engineering, Central South University, 932 Lushan Road(S), Changsha 410083, China
5. Hosseini S, Golding GB, Ilie L. Seq-InSite: sequence supersedes structure for protein interaction site prediction. Bioinformatics 2024; 40:btad738. [PMID: 38212995 PMCID: PMC10796176 DOI: 10.1093/bioinformatics/btad738] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/16/2023] [Revised: 11/17/2023] [Accepted: 01/10/2024] [Indexed: 01/13/2024]
Abstract
MOTIVATION: Proteins accomplish cellular functions by interacting with each other, which makes the prediction of interaction sites a fundamental problem. As experimental methods are expensive and time-consuming, computational prediction of the interaction sites has been studied extensively. Structure-based programs are the most accurate, while the sequence-based ones are much more widely applicable, as the sequences available outnumber the structures by two orders of magnitude. Ideally, we would like a tool that has the quality of the former and the applicability of the latter.
RESULTS: We provide here the first solution that achieves these two goals. Our new sequence-based program, Seq-InSite, greatly surpasses the performance of sequence-based models, matching the quality of state-of-the-art structure-based predictors, thus effectively superseding the need for models requiring structure. The predictive power of Seq-InSite is illustrated using an analysis of evolutionary conservation for four protein sequences.
AVAILABILITY AND IMPLEMENTATION: Seq-InSite is freely available as a web server at http://seq-insite.csd.uwo.ca/ and as free source code, including trained models and all datasets used for training and testing, at https://github.com/lucian-ilie/Seq-InSite.
Affiliation(s)
- SeyedMohsen Hosseini
  - Department of Computer Science, University of Western Ontario, London, ON N6A 5B7, Canada
- G Brian Golding
  - Department of Biology, McMaster University, Hamilton, ON L8S 4K1, Canada
- Lucian Ilie
  - Department of Computer Science, University of Western Ontario, London, ON N6A 5B7, Canada
6. Zeng X, Meng FF, Li X, Zhong KY, Jiang B, Li Y. GHGPR-PPIS: A graph convolutional network for identifying protein-protein interaction site using heat kernel with Generalized PageRank techniques and edge self-attention feature processing block. Comput Biol Med 2024; 168:107683. [PMID: 37984202 DOI: 10.1016/j.compbiomed.2023.107683] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/21/2023] [Revised: 10/10/2023] [Accepted: 11/06/2023] [Indexed: 11/22/2023]
Abstract
Accurately pinpointing protein-protein interaction sites (PPIS) at the molecular level is important for annotating protein function and understanding the mechanisms underlying various diseases. Numerous computational methods for predicting PPIS have emerged and have reduced the labor and time required by traditional experimental methods. However, the predictive accuracy of these methods has yet to reach the desired level. In this context, we proposed a graph-based computational model called GHGPR-PPIS. The model uses a graph convolutional network with heat kernel (GraphHeat) in conjunction with Generalized PageRank techniques (GHGPR) to predict PPIS. Building on the GHGPR framework, we also devised an edge self-attention feature processing block that further improves the performance of the model. Experiments showed that GHGPR-PPIS surpassed all competing state-of-the-art models when evaluated on the benchmark test set. On two independent test sets and a specific protein chain, GHGPR-PPIS consistently showed better generalization performance and practical applicability than the comparison model, AGAT-PPIS. Lastly, using the t-SNE dimensionality reduction algorithm and a clustering visualization technique, we carried out an interpretability analysis of GHGPR-PPIS by comparing the outputs from different stages of the model.
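As a rough illustration of how heat-kernel weights can be combined with Generalized PageRank propagation, the following minimal sketch computes Z = sum_k gamma_k * A_norm^k * H with gamma_k = exp(-t) * t^k / k!. It is not the GHGPR-PPIS implementation: the function names, the dense list-of-lists representation, the symmetric normalization with self-loops, and the defaults `t=1.0` and `K=10` are all assumptions made for this example, and the edge self-attention block is not modeled.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def heat_kernel_weights(t, K):
    """Truncated heat-kernel coefficients gamma_k = exp(-t) * t^k / k!."""
    return [math.exp(-t) * t ** k / math.factorial(k) for k in range(K + 1)]


def gpr_propagate(adj, features, t=1.0, K=10):
    """Compute Z = sum_{k=0..K} gamma_k * A_norm^k * H (hypothetical helper)."""
    a_norm = normalized_adjacency(adj)
    gammas = heat_kernel_weights(t, K)
    n, d = len(features), len(features[0])
    current = [row[:] for row in features]  # A_norm^0 * H
    out = [[gammas[0] * v for v in row] for row in current]
    for k in range(1, K + 1):
        # One more propagation step: current <- A_norm * current
        current = [[sum(a_norm[i][m] * current[m][j] for m in range(n))
                    for j in range(d)] for i in range(n)]
        for i in range(n):
            for j in range(d):
                out[i][j] += gammas[k] * current[i][j]
    return out


# Toy 3-residue path graph 0-1-2 with a single feature placed on residue 0.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
smoothed = gpr_propagate(path, [[1.0], [0.0], [0.0]])
```

In this toy case the signal diffuses outward from residue 0, so the smoothed value decreases with graph distance from that residue.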
Affiliation(s)
- Xin Zeng
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Fan-Fang Meng
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Xin Li
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Kai-Yang Zhong
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
- Bei Jiang
  - Yunnan Key Laboratory of Screening and Research on Anti-pathogenic Plant Resources from Western Yunnan, Dali University, Dali, 671000, China
- Yi Li
  - College of Mathematics and Computer Science, Dali University, Dali, 671003, China
7. Tanveerul Hassan M, Tayara H, To Chong K. Meta-IL4: An Ensemble Learning Approach for IL-4-Inducing Peptide Prediction. Methods 2023:S1046-2023(23)00113-5. [PMID: 37454743 DOI: 10.1016/j.ymeth.2023.07.002] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 10/18/2022] [Revised: 03/25/2023] [Accepted: 07/10/2023] [Indexed: 07/18/2023]
Abstract
The cytokine interleukin-4 (IL-4) plays an important role in our immune system. IL-4 drives the differentiation of naïve T-helper 0 cells (Th0) into T-helper 2 cells (Th2). The Th2 responses are characterized by the release of IL-4. CD4+ T cells produce the cytokine IL-4 in response to exogenous parasites. IL-4 has a critical role in the growth of CD8+ cells, inflammation, and responses of T-cells. We propose an ensemble model for the prediction of IL-4-inducing peptides. Four feature encodings were extracted to build an efficient predictor: pseudo-amino acid composition, amphiphilic pseudo-amino acid composition, quasi-sequence-order, and Shannon entropy. We developed an ensemble learning model that fuses random forest, extreme gradient boosting, light gradient boosting machine, and extra tree classifiers in the first layer and uses a Gaussian process classifier as the meta-classifier in the second layer. On the benchmark test dataset, the meta-model (Meta-IL4) achieved a Matthews correlation coefficient of 0.793 and outperformed the individual classifiers. The highest accuracy achieved by the Meta-IL4 model is 90.70%. These findings suggest that peptides that induce IL-4 can be predicted with reasonable accuracy. These models could aid in the development of peptides that trigger the appropriate Th2 response.
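Of the four encodings named in the abstract, Shannon entropy is the simplest to illustrate. The sketch below computes the entropy of a peptide's residue composition, H = -sum_i p_i log2 p_i, where p_i is the fraction of residue i in the sequence. This is a minimal sketch under assumptions: the function name `shannon_entropy` is hypothetical, and the exact entropy definition used by Meta-IL4 (for example, windowing or the choice of log base) is not stated in the abstract and may differ.

```python
import math
from collections import Counter


def shannon_entropy(peptide):
    """Shannon entropy (in bits) of the residue composition of a peptide."""
    if not peptide:
        raise ValueError("peptide must be a non-empty string")
    counts = Counter(peptide.upper())
    n = len(peptide)
    # Sum over residues that actually occur, so log2(0) is never evaluated.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# A peptide built from one residue has zero entropy; four distinct,
# equally frequent residues give log2(4) = 2 bits.
low = shannon_entropy("AAAA")
high = shannon_entropy("ACDE")
```

A single scalar like this would typically be concatenated with the other encodings to form the input vector for the first-layer classifiers.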
Affiliation(s)
- Mir Tanveerul Hassan
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, South Korea
- Hilal Tayara
  - School of International Engineering and Science, Jeonbuk National University, Jeonju, South Korea
- Kil To Chong
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, South Korea
  - Advances Electronics and Information Research Centre, Jeonbuk National University, Jeonju, South Korea
8. ACP-ADA: A Boosting Method with Data Augmentation for Improved Prediction of Anticancer Peptides. Int J Mol Sci 2022; 23:ijms232012194. [PMID: 36293050 PMCID: PMC9603247 DOI: 10.3390/ijms232012194] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 08/29/2022] [Revised: 10/08/2022] [Accepted: 10/11/2022] [Indexed: 11/30/2022]
Abstract
Cancer is the second-leading cause of death worldwide, and therapeutic peptides that target and destroy cancer cells have received a great deal of interest in recent years. Traditional wet-lab experiments are expensive and inefficient for identifying novel anticancer peptides (ACPs); therefore, an effective computational approach is needed to recognize ACP candidates before experimental methods are used. In this study, we proposed ACP-ADA, an AdaBoost algorithm with random forest as the base learner, which integrates binary profile features, amino acid index, and amino acid composition into a 210-dimensional feature vector to represent the peptides. Training samples in the feature space were augmented to increase the sample size and further improve the performance of the model in the case of insufficient samples. Furthermore, we used five-fold cross-validation to find model parameters, and the cross-validation results showed that ACP-ADA, using this feature combination with data augmentation, outperforms existing methods on the reported performance metrics. Specifically, ACP-ADA recorded an average accuracy of 86.4% and a Matthews correlation coefficient of 74.01% for dataset ACP740, and 90.83% and 81.65%, respectively, for dataset ACP240; consequently, it can be a very useful tool in drug development and biomedical research.
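The two metrics reported above can be computed directly from a binary confusion matrix. The sketch below uses the standard definitions, accuracy = (TP + TN) / N and MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names and the example counts are hypothetical and are not taken from the ACP-ADA paper. The abstract reports MCC as a percentage, so 74.01% corresponds to 0.7401 on the usual [-1, 1] scale.

```python
import math


def accuracy_from_counts(tp, tn, fp, fn):
    """Fraction of correct predictions in a binary confusion matrix."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Illustrative counts only: 45 true positives, 45 true negatives,
# 5 false positives, 5 false negatives.
acc = accuracy_from_counts(45, 45, 5, 5)
mcc = mcc_from_counts(45, 45, 5, 5)
```

For these balanced illustrative counts the accuracy is 0.9 and the MCC is 0.8, which shows why MCC is usually the lower and stricter of the two figures.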