1. Hu J, Chen KX, Rao B, Ni JY, Thafar MA, Albaradei S, Arif M. Protein-peptide binding residue prediction based on protein language models and cross-attention mechanism. Anal Biochem 2024; 694:115637. [PMID: 39121938 DOI: 10.1016/j.ab.2024.115637] [Received: 04/27/2024] [Revised: 07/28/2024] [Accepted: 08/06/2024] [Indexed: 08/12/2024]
Abstract
Accurate identification of protein-peptide binding residues is essential for understanding protein-peptide interactions and advancing drug discovery. To address this problem, extensive research efforts have been made to design more discriminative feature representations. However, extracting these explicit features usually depends on third-party tools, resulting in low computational efficiency and low predictive performance. In this study, we design an end-to-end deep learning-based method, E2EPep, for protein-peptide binding residue prediction using protein sequence only. E2EPep first employs and fine-tunes two state-of-the-art pre-trained protein language models that extract two different high-latent feature representations, relevant to protein structure and function, from protein sequences. A novel feature fusion module is then designed in E2EPep to fuse and optimize these two feature representations of binding residues. In addition, we have also designed E2EPep+, which integrates the E2EPep and PepBCL models, to improve prediction performance. Experimental results on two independent testing data sets demonstrate that E2EPep and E2EPep+ achieve average AUC values of 0.846 and 0.842 while achieving an average Matthews correlation coefficient that is significantly higher than that of most existing sequence-based methods and comparable to that of the state-of-the-art structure-based predictors. Detailed data analysis shows that the primary strength of E2EPep lies in the effectiveness of its feature representation, which uses a cross-attention mechanism to fuse the embeddings generated by two fine-tuned protein language models. The standalone package of E2EPep and E2EPep+ can be obtained at https://github.com/ckx259/E2EPep.git for academic use only.
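The fusion step described in the abstract can be illustrated with a generic single-head cross-attention between two per-residue embedding matrices. This is a minimal sketch, not the E2EPep implementation: the dimensions, the random projection matrices `w_q`, `w_k`, `w_v`, and the toy embeddings are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)

def cross_attention(query_emb, context_emb, w_q, w_k, w_v):
    """Let residues of one embedding attend to residues of another.

    query_emb:   (L, d_a) per-residue embeddings from model A
    context_emb: (L, d_b) per-residue embeddings from model B
    Returns the fused (L, d_h) representation and the (L, L) attention weights.
    """
    q = query_emb @ w_q
    k = context_emb @ w_k
    v = context_emb @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product scores
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights

# Toy data standing in for two language-model embeddings (hypothetical sizes).
rng = np.random.default_rng(0)
seq_len, dim_a, dim_b, dim_h = 8, 16, 12, 4
emb_a = rng.normal(size=(seq_len, dim_a))
emb_b = rng.normal(size=(seq_len, dim_b))
w_q = rng.normal(size=(dim_a, dim_h))
w_k = rng.normal(size=(dim_b, dim_h))
w_v = rng.normal(size=(dim_b, dim_h))

fused, attn = cross_attention(emb_a, emb_b, w_q, w_k, w_v)
```

In a trained model the projection matrices would be learned, and the fused representation would feed a per-residue classifier.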
Affiliation(s)
- Jun Hu
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China; Center for AI and Computational Biology, Suzhou Institution of Systems Medicine, Suzhou, 215123, China
- Kai-Xin Chen
  - College of Information Engineering, Zhejiang University of Technology, Hangzhou, 310023, China
- Bing Rao
  - School of Information & Electrical Engineering, Hangzhou City University, Hangzhou, 310015, China
- Jing-Yuan Ni
  - NUIST Reading Academy, Nanjing University of Information Science & Technology, Nanjing, 210044, China
- Maha A Thafar
  - Department of Computer Science, College of Computers and Information Technology, Taif University, Taif, 21944, Saudi Arabia
- Somayah Albaradei
  - Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah, Saudi Arabia
- Muhammad Arif
  - College of Science and Engineering, Hamad Bin Khalifa University, Doha, 34110, Qatar
2. Rangra S, Aggarwal KK. Characterization and kinetics of a cathepsin B-inhibiting protein from Musa acuminata Colla peel. Biochimie 2024:S0300-9084(24)00242-6. [PMID: 39461656 DOI: 10.1016/j.biochi.2024.10.016] [Received: 07/10/2024] [Revised: 10/23/2024] [Accepted: 10/24/2024] [Indexed: 10/29/2024]
Abstract
Hyperexpression of cathepsin B caused by an imbalance of endogenous inhibitors is involved in multiple pathologies, making it a key therapeutic target. Protease inhibitors are effective biomolecules that regulate protease activities and are considered potential therapeutic agents in various diseases. Plant protease inhibitors have been reported as effective complementary alternative drugs. A proteinaceous cathepsin B inhibitor (CBI-BP) has been isolated from Musa acuminata Colla (banana) peel with a molecular weight of 27.9 kDa on SDS-PAGE. The purity of the CBI-BP was confirmed on native-PAGE. The isolated CBI-BP showed an IC50 value of 8.14 μg and a Ki value of 10.59 μg (0.19 μM). Cathepsin B inhibition kinetics indicated that CBI-BP follows a mixed type of cathepsin B inhibition. Its inhibition activity was also confirmed by reverse zymography. The inhibitor was stable over pH 2.6-10.0, with maximum activity at pH 7.2, and over 25-100 °C, and it exhibited thermostability for 60 min at 70 °C. MALDI/TOF/MS analysis of CBI-BP showed 40% similarity to the GH18 domain-containing protein (A0A4S8JRM9) from Musa balbisiana. Although in-silico docking studies showed that binding of A0A4S8JRM9 to cathepsin B affects the binding energy of the substrate to cathepsin B, A0A4S8JRM9 has not been reported to have any anti-cathepsin B activity. This suggests that the isolated CBI-BP might be a novel protein with anti-cathepsin B activity. Thus, the isolated CBI-BP may be further explored as a possible anti-cathepsin B drug.
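For readers unfamiliar with mixed-type inhibition, the textbook rate law is v = Vmax·[S] / (α·Km + α'·[S]), with α = 1 + [I]/Ki and α' = 1 + [I]/Ki'. The sketch below evaluates it. Only Ki = 0.19 μM comes from the abstract; Vmax, Km, Ki' and the substrate concentration are hypothetical placeholders, not measured values.

```python
def mixed_inhibition_rate(s, i, vmax, km, ki, ki_prime):
    """Initial rate under mixed-type inhibition (textbook rate law).

    s, i       : substrate and inhibitor concentrations (same units as km, ki)
    ki         : dissociation constant of inhibitor from the free enzyme
    ki_prime   : dissociation constant of inhibitor from the enzyme-substrate complex
    """
    alpha = 1.0 + i / ki
    alpha_prime = 1.0 + i / ki_prime
    return vmax * s / (alpha * km + alpha_prime * s)

# Ki = 0.19 (uM) is taken from the abstract; every other number is hypothetical.
v_uninhibited = mixed_inhibition_rate(s=50.0, i=0.0, vmax=1.0, km=25.0, ki=0.19, ki_prime=0.60)
v_inhibited = mixed_inhibition_rate(s=50.0, i=0.19, vmax=1.0, km=25.0, ki=0.19, ki_prime=0.60)
```

With no inhibitor the expression reduces to the Michaelis-Menten equation. A mixed inhibitor changes both the apparent Km and the apparent Vmax.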
Affiliation(s)
- Sabita Rangra
  - University School of Biotechnology, Guru Gobind Singh Indraprastha University, New Delhi-110078, India
- Kamal Krishan Aggarwal
  - University School of Biotechnology, Guru Gobind Singh Indraprastha University, New Delhi-110078, India
3. Long Y, Donald BR. Predicting Affinity Through Homology (PATH): Interpretable Binding Affinity Prediction with Persistent Homology. bioRxiv 2024:2023.11.16.567384. [PMID: 38014181 PMCID: PMC10680814 DOI: 10.1101/2023.11.16.567384] [Indexed: 11/29/2023]
Abstract
Accurate binding affinity prediction is crucial to structure-based drug design. Recent work used computational topology to obtain an effective representation of protein-ligand interactions. While algorithms using algebraic topology have proven useful in predicting properties of biomolecules, previous algorithms employed uninterpretable machine learning models which failed to explain the underlying geometric and topological features that drive accurate binding affinity prediction. Moreover, they had high computational complexity, which made them intractable for large proteins. We present the fastest known algorithm to compute persistent homology features for protein-ligand complexes using opposition distance, with a runtime that is independent of the protein size. Then, we exploit these features in a novel, interpretable algorithm to predict protein-ligand binding affinity. Our algorithm achieves interpretability through an effective embedding of distances across bipartite matchings of the protein and ligand atoms into real-valued functions by summing Gaussians centered at features constructed by persistent homology. We name these functions internuclear persistent contours (IPCs). Next, we introduce persistence fingerprints, a vector with 10 components that sketches the distances of different bipartite matchings between protein and ligand atoms, refined from IPCs. Let the number of protein atoms in the protein-ligand complex be $n$, the number of ligand atoms be $m$, and $\omega \approx 2.4$ be the matrix multiplication exponent. We show that for any $0 < \varepsilon < 1$, after an $\mathcal{O}(mn \log(mn))$ preprocessing procedure, we can compute an $\varepsilon$-accurate approximation to the persistence fingerprint in $\mathcal{O}(m \log^{6\omega}(m/\varepsilon))$ time, independent of protein size. This is an improvement in time complexity by a factor of $\mathcal{O}((m+n)^3)$ over any previous binding affinity prediction that uses persistent homology.
We show that the representational power of persistence fingerprints generalizes to protein-ligand binding datasets beyond the training dataset. Then, we introduce PATH, Predicting Affinity Through Homology, a two-part algorithm consisting of PATH+ and PATH-. PATH+ is an interpretable, small ensemble of shallow regression trees for binding affinity prediction from persistence fingerprints. We show that despite using 1,400-fold fewer features, PATH+ has comparable performance to a previous state-of-the-art binding affinity prediction algorithm that uses persistent homology. Moreover, PATH+ has the advantage of being interpretable. We visualize the features captured by persistence fingerprints for variant HIV-1 protease complexes and show that persistence fingerprints capture binding-relevant structural mutations. PATH-, in turn, uses regression trees over IPCs to differentiate between binding and decoy complexes. Finally, we benchmarked PATH against established binding affinity prediction algorithms spanning physics-based, knowledge-based, and deep learning methods, revealing that PATH has comparable or better performance with less overfitting than these state-of-the-art methods. The source code for PATH is released open-source as part of the osprey protein design software package.
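The idea of turning a set of features into a real-valued function by summing Gaussians centered at them can be sketched as follows. This is an illustration of that one step only, not the PATH or osprey implementation: the feature values, the bandwidth `sigma`, and the evaluation grid are hypothetical, and in the paper the centers come from persistent homology rather than a hand-written list.

```python
import numpy as np

def gaussian_contour(feature_values, grid, sigma=0.5):
    """Evaluate a sum of unit-height Gaussians, one per feature value, on a grid."""
    centers = np.asarray(feature_values, dtype=float)
    diffs = grid[:, None] - centers[None, :]          # (grid points, features)
    return np.exp(-0.5 * (diffs / sigma) ** 2).sum(axis=1)

# Hypothetical distance-like features (arbitrary units) and an evaluation grid.
features = [2.1, 2.3, 3.8, 5.0]
grid = np.linspace(0.0, 8.0, 81)
contour = gaussian_contour(features, grid)
```

Clusters of nearby features produce tall peaks, so the shape of the curve can be read directly, which is the sense in which such an embedding is interpretable.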
4. Gheeraert A, Bailly T, Ren Y, Hamraoui A, Te J, Vander Meersche Y, Cretin G, Leon Foun Lin R, Gelly JC, Pérez S, Guyon F, Galochkina T. DIONYSUS: a database of protein-carbohydrate interfaces. Nucleic Acids Res 2024:gkae890. [PMID: 39436020 DOI: 10.1093/nar/gkae890] [Received: 07/03/2024] [Revised: 09/03/2024] [Accepted: 09/26/2024] [Indexed: 10/23/2024] Open
Abstract
Protein-carbohydrate interactions govern a wide variety of biological processes and play an essential role in the development of different diseases. Here, we present DIONYSUS, the first database of protein-carbohydrate interfaces annotated according to structural, chemical and functional properties of both proteins and carbohydrates. We provide exhaustive information on the nature of interactions, binding site composition, biological function and specific additional information retrieved from existing databases. The user can easily search the database using protein sequence and structure information or by carbohydrate binding site properties. Moreover, for a given interaction site, the user can compare it with a representative subset of non-covalent protein-carbohydrate interactions to retrieve information on its potential function or specificity. Therefore, DIONYSUS is a source of valuable information for a deeper understanding of general protein-carbohydrate interaction patterns, for annotation of previously unannotated proteins, and for applications such as carbohydrate-based drug design. DIONYSUS is freely available at www.dsimb.inserm.fr/DIONYSUS/.
Affiliation(s)
- Aria Gheeraert
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Thomas Bailly
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Yani Ren
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
  - Université Paris-Saclay, INRAE, MetaGenoPolis, 78350 Jouy-en-Josas, France
- Ali Hamraoui
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
  - Institut de biologie de l'Ecole normale supérieure (IBENS), Ecole normale supérieure, CNRS, INSERM, PSL Universite Paris, 75005 Paris, France
- Julie Te
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Yann Vander Meersche
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Gabriel Cretin
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Ravy Leon Foun Lin
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Jean-Christophe Gelly
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Serge Pérez
  - Centre de Recherches sur les Macromolécules Végétales, University Grenoble Alpes, CNRS, UPR, 5301 Grenoble, France
- Frédéric Guyon
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
- Tatiana Galochkina
  - Université Paris Cité and Université des Antilles and Université de la Réunion, INSERM, BIGR, DSIMB, F-75015 Paris, France
5. Mohamed SF, Narayanan R. Enterobacter cloacae-mediated polymer biodegradation: in-silico analysis predicts broad spectrum degradation potential by Alkane monooxygenase. Biodegradation 2024; 35:969-991. [PMID: 39001975 DOI: 10.1007/s10532-024-10091-4] [Received: 03/19/2024] [Accepted: 07/03/2024] [Indexed: 07/15/2024]
Abstract
Plastic pollution poses a significant environmental challenge. In this study, the strain Enterobacter cloacae O5-E, a bacterium displaying polyethylene-degrading capabilities, was isolated. Over a span of 30 days, analytical techniques including X-ray diffractometry, scanning electron microscopy, optical profilometry, hardness testing and mass spectrometric analysis were employed to examine alterations in the polymer. Results revealed an 11.48% reduction in crystallinity, a 50% decrease in hardness, and a substantial 25-fold increase in surface roughness resulting from the pits and cracks introduced in the polymer by the isolate. Additionally, the presence of degradation by-products, revealed via gas chromatography, confirms the steady progression of degradation. Further, recognizing the pivotal role of alkane monooxygenase in plastic degradation, the study expanded to detect this enzyme in the isolate at the molecular level. Molecular docking studies were conducted to assess the enzyme's affinity for various polymers, demonstrating notable binding capability with most polymers, especially polyurethane (-5.47 kcal/mol). These findings highlight the biodegradation potential of Enterobacter cloacae O5-E and the crucial involvement of alkane monooxygenase in the initial steps of the degradation process, offering a promising avenue to address the global plastic pollution crisis.
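To give the docking energy some intuition, a binding free energy can be converted to an equivalent dissociation constant through ΔG = RT·ln(Kd). The sketch below does this for the reported -5.47 kcal/mol. Treating a docking score as a true binding free energy and using T = 298.15 K are both assumptions; docking scores are only rough estimates.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def dissociation_constant(delta_g_kcal, temperature_k=298.15):
    """Convert a binding free energy (kcal/mol) to Kd (mol/L) via dG = RT ln(Kd)."""
    return math.exp(delta_g_kcal / (R_KCAL * temperature_k))

# -5.47 kcal/mol is the polyurethane docking energy quoted in the abstract.
kd = dissociation_constant(-5.47)
print(f"Kd is roughly {kd * 1e6:.0f} micromolar")
```

Under these assumptions the value corresponds to binding in the high-micromolar range, which is modest affinity.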
Affiliation(s)
- Shafana Farveen Mohamed
  - Department of Genetic Engineering, School of Bioengineering and Faculty of Engineering and Technology, College of Engineering & Technology (CET), SRM Institute of Science and Technology, Kattankulathur, Kanchipuram, Chennai, Tamil Nadu, 603203, India
- Rajnish Narayanan
  - Department of Genetic Engineering, School of Bioengineering and Faculty of Engineering and Technology, College of Engineering & Technology (CET), SRM Institute of Science and Technology, Kattankulathur, Kanchipuram, Chennai, Tamil Nadu, 603203, India
6. Li Y, Nan X, Zhang S, Zhou Q, Lu S, Tian Z. PMSFF: Improved Protein Binding Residues Prediction through Multi-Scale Sequence-Based Feature Fusion Strategy. Biomolecules 2024; 14:1220. [PMID: 39456153 PMCID: PMC11506650 DOI: 10.3390/biom14101220] [Received: 08/15/2024] [Revised: 09/22/2024] [Accepted: 09/24/2024] [Indexed: 10/28/2024] Open
Abstract
Proteins perform different biological functions by binding to various molecules, and this binding is mediated by a few key residues. Accurate prediction of such protein binding residues (PBRs) is crucial for understanding cellular processes and for designing new drugs. Many computational prediction approaches have been proposed to identify PBRs with sequence-based features. However, these approaches face two main challenges: (1) they only concatenate residue feature vectors with a simple sliding window strategy, and (2) it is challenging to find a uniform sliding window size suitable for learning embeddings across different types of PBRs. In this study, we propose a novel framework that can be applied to multiple types of PBR prediction tasks through a Multi-scale Sequence-based Feature Fusion (PMSFF) strategy. First, PMSFF employs a pre-trained language model named ProtT5 to encode amino acid residues in protein sequences. Then, it generates multi-scale residue embeddings by applying multi-size windows to capture effective neighboring residues and multi-size kernels to learn information across different scales. Additionally, the proposed model treats protein sequences as sentences, employing a bidirectional GRU to learn global context. We also collect benchmark datasets encompassing various PBR types and evaluate our PMSFF approach on these datasets. Compared with state-of-the-art methods, PMSFF demonstrates superior performance on most PBR prediction tasks.
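The multi-size window idea can be sketched by pooling each residue's neighborhood at several window sizes and concatenating the results. This is an illustrative sketch, not the PMSFF code: the window sizes are hypothetical, mean pooling stands in for the learned kernels, and a small integer array stands in for ProtT5 embeddings.

```python
import numpy as np

def window_features(embeddings, window_size):
    """Mean-pool each residue's neighborhood of `window_size` (odd), zero-padded at the ends."""
    half = window_size // 2
    padded = np.pad(embeddings, ((half, half), (0, 0)))
    return np.stack([
        padded[i:i + window_size].mean(axis=0) for i in range(len(embeddings))
    ])

def multi_scale_features(embeddings, window_sizes=(3, 5, 7)):
    """Concatenate per-residue features computed at several window sizes."""
    return np.concatenate(
        [window_features(embeddings, w) for w in window_sizes], axis=1
    )

# Toy (6 residues x 2 dims) array standing in for language-model embeddings.
toy_embeddings = np.arange(12.0).reshape(6, 2)
multi_scale = multi_scale_features(toy_embeddings)
```

Each residue ends up with one feature block per window size, so no single window size has to suit every binding type.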
Affiliation(s)
- Yuguang Li
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Xiaofei Nan
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Shoutao Zhang
  - School of Life Sciences, Zhengzhou University, Zhengzhou 450001, China
  - Longhu Laboratory of Advanced Immunology, Zhengzhou 450001, China
- Qinglei Zhou
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
- Shuai Lu
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
  - National Supercomputing Center in Zhengzhou, Zhengzhou University, Zhengzhou 450001, China
- Zhen Tian
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou 450001, China
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou 324003, China
7. Song Y, Yuan Q, Chen S, Zeng Y, Zhao H, Yang Y. Accurately predicting enzyme functions through geometric graph learning on ESMFold-predicted structures. Nat Commun 2024; 15:8180. [PMID: 39294165 PMCID: PMC11411130 DOI: 10.1038/s41467-024-52533-w] [Received: 04/29/2024] [Accepted: 09/11/2024] [Indexed: 09/20/2024] Open
Abstract
Enzymes are crucial in numerous biological processes, with the Enzyme Commission (EC) number being a commonly used method for defining enzyme function. However, current EC number prediction technologies have not fully recognized the importance of enzyme active sites and structural characteristics. Here, we propose GraphEC, a geometric graph learning-based EC number predictor using the ESMFold-predicted structures and a pre-trained protein language model. Specifically, we first construct a model to predict the enzyme active sites, which is utilized to predict the EC number. The prediction is further improved through a label diffusion algorithm by incorporating homology information. In parallel, the optimum pH of enzymes is predicted to reflect the enzyme-catalyzed reactions. Experiments demonstrate the superior performance of our model in predicting active sites, EC numbers, and optimum pH compared to other state-of-the-art methods. Additional analysis reveals that GraphEC is capable of extracting functional information from protein structures, emphasizing the effectiveness of geometric graph learning. This technology can be used to identify unannotated enzyme functions, as well as to predict their active sites and optimum pH, with the potential to advance research in synthetic biology, genomics, and other fields.
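The abstract does not spell out the label diffusion update, so the following is a generic label-propagation sketch rather than GraphEC's algorithm. It shows how homology-like similarity can refine per-protein class scores. The similarity matrix, the initial scores, and the mixing weight `alpha` are all hypothetical.

```python
import numpy as np

def diffuse_labels(similarity, initial_scores, alpha=0.5, n_iter=50):
    """Iterate scores <- alpha * T @ scores + (1 - alpha) * initial_scores,
    where T is the row-normalized similarity matrix."""
    sim = np.asarray(similarity, dtype=float)
    row_sums = sim.sum(axis=1, keepdims=True)
    # Rows with no neighbors keep a zero transition row (no diffusion in).
    transition = np.divide(sim, row_sums, out=np.zeros_like(sim), where=row_sums > 0)
    start = np.asarray(initial_scores, dtype=float)
    scores = start.copy()
    for _ in range(n_iter):
        scores = alpha * (transition @ scores) + (1 - alpha) * start
    return scores

# Three hypothetical proteins; protein 1 is highly similar to protein 0.
similarity = [[0.0, 0.9, 0.0],
              [0.9, 0.0, 0.1],
              [0.0, 0.1, 0.0]]
# Initial scores for two hypothetical EC classes; protein 1 starts undecided.
initial = [[1.0, 0.0],
           [0.5, 0.5],
           [0.0, 1.0]]
diffused = diffuse_labels(similarity, initial)
```

After diffusion, the undecided protein leans toward the class of its closest homolog while retaining part of its own initial prediction.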
Affiliation(s)
- Yidong Song
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong, China
- Qianmu Yuan
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong, China
  - High Performance Computing Department, National Supercomputing Center in Shenzhen, Shenzhen, Guangdong, China
- Sheng Chen
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong, China
- Yuansong Zeng
  - School of Big Data & Software Engineering, Chongqing University, Chongqing, China
- Huiying Zhao
  - Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, Guangdong, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, Guangdong, China
  - Key Laboratory of Machine Intelligence and Advanced Computing (MOE), Guangzhou, China
8. Shafiee S, Fathi A, Taherzadeh G. DP-site: A dual deep learning-based method for protein-peptide interaction site prediction. Methods 2024; 229:17-29. [PMID: 38871095 DOI: 10.1016/j.ymeth.2024.06.001] [Received: 01/31/2024] [Revised: 04/22/2024] [Accepted: 06/01/2024] [Indexed: 06/15/2024] Open
Abstract
BACKGROUND: Protein-peptide interaction prediction is an important topic for several applications, including understanding biological processes, protein function, and abnormal cellular behaviors, as well as drug discovery and treating diseases. Over the years, studies have shown that experimental methods have improved the identification of this bio-molecular interaction. However, predicting protein-peptide interactions using these methods is laborious, time-consuming, dependent on third-party tools, and costly.
METHOD: To address these drawbacks, this study introduces a computational framework called DP-Site. The proposed framework combines a dual pipeline with a combination predictor. Pipeline 1 embeds a deep convolutional neural network for feature extraction and classification. Pipeline 2 includes a deep long short-term memory-based network and a random forest classifier for feature extraction and classification. In this investigation, the evolutionary, structure-based, sequence-based, and physicochemical information of proteins is utilized for identifying protein-peptide interaction at the residue level.
RESULTS: The proposed method is evaluated on both ten-fold cross-validation and independent test sets. The robust and consistent results between cross-validation and independent test sets confirm the ability of the proposed method to predict peptide binding residues in proteins. Moreover, experimental findings demonstrate that DP-Site has significantly outperformed other state-of-the-art sequence-based and structure-based methods. On an independent test set, the proposed method achieves a balance between a specificity of 0.799 and a sensitivity of 0.770, along with the best F-measure of 0.661 and the highest precision of 0.580.
CONCLUSIONS: The outcome of various experiments confirms the proficiency of the proposed method, which outperforms state-of-the-art sequence-based and structure-based methods in terms of the mentioned criteria. DP-Site can be accessed at https://github.com/shafiee 95/shima.shafiee.DP-Site.
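The reported metrics are related by standard definitions, and the F-measure can be checked against the reported precision and sensitivity. The sketch below uses the usual formulas. The confusion-matrix counts in the second call are hypothetical and serve only to show how each metric is derived.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall)

def confusion_metrics(tp, fp, tn, fn):
    """Per-residue binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall on binding residues
    specificity = tn / (tn + fp)   # recall on non-binding residues
    precision = tp / (tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": f_measure(precision, sensitivity),
    }

# Values reported in the abstract: precision 0.580, sensitivity 0.770.
reported_f = f_measure(0.580, 0.770)

# Hypothetical counts, for illustration only.
example = confusion_metrics(tp=77, fp=20, tn=80, fn=23)
```

The computed value agrees with the reported F-measure of 0.661 to within rounding of the inputs.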
Affiliation(s)
- Shima Shafiee
  - Department of Computer Engineering and Information Technology, Razi University, Kermanshah, Iran
- Abdolhossein Fathi
  - Department of Computer Engineering and Information Technology, Razi University, Kermanshah, Iran
- Ghazaleh Taherzadeh
  - Department of Math, Physics, and Computer Science, Wilkes University, Pennsylvania, USA
9. Liu YC, Lin YJ, Chang YY, Chuang CC, Ou YY. Deciphering the Language of Protein-DNA Interactions: A Deep Learning Approach Combining Contextual Embeddings and Multi-Scale Sequence Modeling. J Mol Biol 2024; 436:168769. [PMID: 39214282 DOI: 10.1016/j.jmb.2024.168769] [Received: 06/10/2024] [Revised: 08/01/2024] [Accepted: 08/26/2024] [Indexed: 09/04/2024]
Abstract
Deciphering the mechanisms governing protein-DNA interactions is crucial for understanding key cellular processes and disease pathways. In this work, we present a deep learning approach that significantly advances the computational prediction of DNA-interacting residues from protein sequences. Our method leverages the rich contextual representations learned by pre-trained protein language models, such as ProtTrans, to capture intrinsic biochemical properties and sequence motifs indicative of DNA binding sites. We then integrate these contextual embeddings with a multi-window convolutional neural network architecture, which scans across the sequence at varying window sizes to identify both local and global binding patterns. Comprehensive evaluation on curated benchmark datasets demonstrates the performance of our approach, which achieves an area under the ROC curve (AUC) of 0.89, a substantial improvement over previous state-of-the-art sequence-based predictors. This showcases the potential of pairing advanced representation learning and deep neural network designs for uncovering the complex syntax governing protein-DNA interactions directly from primary sequences. Our work not only provides a robust computational tool for characterizing DNA-binding mechanisms, but also highlights the opportunities at the intersection of language modeling, deep learning, and protein sequence analysis. The code and data for this work are publicly available at https://github.com/B1607/DIRP, facilitating broader adoption and continued development of these techniques.
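The AUC quoted above has a simple interpretation: it is the probability that a randomly chosen binding residue receives a higher score than a randomly chosen non-binding residue. The sketch below computes it directly from that definition. The labels and scores are hypothetical and are not data from the paper.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical per-residue labels (1 = DNA-binding) and predicted scores.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
auc = roc_auc(labels, scores)
```

The pairwise form is quadratic in the number of residues. Library implementations use a sort-based equivalent for large datasets.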
Affiliation(s)
- Yu-Chen Liu
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yi-Jing Lin
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yan-Yun Chang
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Cheng-Che Chuang
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan
- Yu-Yen Ou
  - Department of Computer Science and Engineering, Yuan Ze University, Chung-Li 32003, Taiwan; Graduate Program in Biomedical Informatics, Yuan Ze University, Chung-Li 32003, Taiwan
10. Wang B, Li W. Advances in the Application of Protein Language Modeling for Nucleic Acid Protein Binding Site Prediction. Genes (Basel) 2024; 15:1090. [PMID: 39202449 PMCID: PMC11353971 DOI: 10.3390/genes15081090] [Received: 07/22/2024] [Revised: 08/13/2024] [Accepted: 08/14/2024] [Indexed: 09/03/2024] Open
Abstract
Protein and nucleic acid binding site prediction is a critical computational task that benefits a wide range of biological processes. Previous studies have shown that feature selection holds particular significance for this prediction task, making the generation of more discriminative features a key area of interest for many researchers. Recent progress has shown the power of protein language models in handling protein sequences, in leveraging the strengths of attention networks, and in successful applications to tasks such as protein structure prediction. This naturally raises the question of the applicability of protein language models in predicting protein and nucleic acid binding sites. Various approaches have explored this potential. This paper first describes the development of protein language models. Then, a systematic review of the latest methods for predicting protein and nucleic acid binding sites is conducted by covering benchmark sets, feature generation methods, performance comparisons, and feature ablation studies. These comparisons demonstrate the importance of protein language models for the prediction task. Finally, the paper discusses the challenges of protein and nucleic acid binding site prediction and proposes possible research directions and future trends. The purpose of this survey is to furnish researchers with actionable suggestions for comprehending the methodologies used in predicting protein-nucleic acid binding sites, fostering the creation of protein-centric language models, and tackling real-world obstacles encountered in this field.
Affiliation(s)
- Wenjin Li
  - Institute for Advanced Study, Shenzhen University, Shenzhen 518061, China
11. Jang YJ, Qin QQ, Huang SY, Peter ATJ, Ding XM, Kornmann B. Accurate prediction of protein function using statistics-informed graph networks. Nat Commun 2024; 15:6601. [PMID: 39097570 PMCID: PMC11297950 DOI: 10.1038/s41467-024-50955-0] [Received: 05/17/2023] [Accepted: 07/15/2024] [Indexed: 08/05/2024] Open
Abstract
Understanding protein function is pivotal in comprehending the intricate mechanisms that underlie many crucial biological activities, with far-reaching implications in the fields of medicine, biotechnology, and drug development. However, more than 200 million proteins remain uncharacterized, and computational efforts rely heavily on protein structural information to predict annotations of varying quality. Here, we present a method that utilizes statistics-informed graph networks to predict a protein's functions solely from its sequence. Our method inherently characterizes evolutionary signatures, allowing for a quantitative assessment of the significance of residues that carry out specific functions. The method, PhiGnet, not only demonstrates superior performance compared to alternative approaches but also narrows the sequence-function gap, even in the absence of structural information. Our findings indicate that applying deep learning to evolutionary data can highlight functional sites at the residue level, providing valuable support for interpreting both existing properties and new functionalities of proteins in research and biomedicine.
Affiliation(s)
- Yaan J Jang
- Department of Biochemistry, University of Oxford, Oxford, UK.
- AmoAi Technologies, Oxford, UK.
| | - Qi-Qi Qin
- AmoAi Technologies, Oxford, UK
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Si-Yu Huang
- AmoAi Technologies, Oxford, UK
- Oxford Martin School, University of Oxford, Oxford, UK
- School of Systems Science, Beijing Normal University, Beijing, China
| | | | - Xue-Ming Ding
- School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai, China
| | - Benoît Kornmann
- Department of Biochemistry, University of Oxford, Oxford, UK.
|
12
|
Chen L, Li Q, Nasif KFA, Xie Y, Deng B, Niu S, Pouriyeh S, Dai Z, Chen J, Xie CY. AI-Driven Deep Learning Techniques in Protein Structure Prediction. Int J Mol Sci 2024; 25:8426. [PMID: 39125995 PMCID: PMC11313475 DOI: 10.3390/ijms25158426] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2024] [Revised: 07/29/2024] [Accepted: 07/29/2024] [Indexed: 08/12/2024] Open
Abstract
Protein structure prediction is important for understanding protein function and behavior. This review presents a comprehensive overview of the computational models used to predict protein structure, covering the progression from established protein modeling to state-of-the-art artificial intelligence (AI) frameworks. The paper starts with a brief introduction to protein structures, protein modeling, and AI. The section on established protein modeling discusses homology modeling, ab initio modeling, and threading. The next section covers deep learning-based models and introduces state-of-the-art AI models such as AlphaFold (AlphaFold, AlphaFold2, AlphaFold3), RoseTTAFold, and ProteinBERT. This section also discusses how AI techniques have been integrated into established frameworks like Swiss-Model, Rosetta, and I-TASSER. Model performance is compared using the rankings of CASP14 (Critical Assessment of Structure Prediction) and CASP15. CASP16 is ongoing, and its results are not included in this review. Continuous Automated Model EvaluatiOn (CAMEO) complements the biennial CASP experiment. Template modeling score (TM-score), global distance test total score (GDT_TS), and Local Distance Difference Test (lDDT) score are also discussed. The paper then acknowledges the ongoing difficulties in predicting protein structure and emphasizes the need for additional research on topics such as dynamic protein behavior, conformational changes, and protein-protein interactions. The application section introduces applications in fields such as drug design, industry, education, and novel protein development. In summary, this paper provides a comprehensive overview of the latest advancements in established protein modeling and deep learning-based models for protein structure prediction. It emphasizes the significant advancements achieved by AI and identifies potential areas for further investigation.
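As a rough illustration of one of the scores named in this abstract, the sketch below computes a simplified GDT_TS. It assumes the model and reference structures have already been superimposed once and that per-residue C-alpha distances (in angstroms) are available; the full GDT_TS procedure searches over many superpositions to maximise each fraction, which this sketch does not do. The function name `gdt_ts` and its arguments are illustrative, not taken from the review.

```python
from typing import Sequence


def gdt_ts(distances: Sequence[float],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Simplified GDT_TS on a 0-100 scale.

    distances: per-residue C-alpha distances (angstroms) between model and
    reference after a single, fixed superposition (an assumption of this
    sketch; the real score optimises the superposition per cutoff).
    """
    if not distances:
        raise ValueError("distances must be non-empty")
    n = len(distances)
    # Fraction of residues within each distance cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    # GDT_TS is the average of the four fractions, expressed as a percentage.
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    print(gdt_ts([0.5, 1.5, 3.0, 9.0]))
```

Because the superposition is fixed here, the value is a lower bound on what a full GDT_TS implementation would report for the same pair of structures.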
Affiliation(s)
- Lingtao Chen
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
| | - Qiaomu Li
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
| | - Kazi Fahim Ahmad Nasif
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
| | - Ying Xie
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
| | - Bobin Deng
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
| | - Shuteng Niu
- Department of Computer Science, Bowling Green State University, Bowling Green, OH 43403, USA;
| | - Seyedamin Pouriyeh
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
| | - Zhiyu Dai
- Division of Pulmonary and Critical Care Medicine, John T. Milliken Department of Medicine, Washington University School of Medicine in St. Louis, St. Louis, MO 63110, USA;
| | - Jiawei Chen
- College of Computing, Data Science and Society, University of California, Berkeley, CA 94720, USA;
| | - Chloe Yixin Xie
- College of Computing and Software Engineering, Kennesaw State University, Marietta, GA 30060, USA; (L.C.); (Q.L.); (K.F.A.N.); (Y.X.); (B.D.); (S.P.)
|
13
|
Mustafov D, Siddiqui SS, Kukol A, Lambrou GI, Shagufta, Ahmad I, Braoudaki M. MicroRNA-Dependent Mechanisms Underlying the Function of a β-Amino Carbonyl Compound in Glioblastoma Cells. ACS Omega 2024; 9:31789-31802. [PMID: 39072119 PMCID: PMC11270567 DOI: 10.1021/acsomega.4c02991] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2024] [Revised: 06/10/2024] [Accepted: 06/18/2024] [Indexed: 07/30/2024]
Abstract
Glioblastoma (GB) is an aggressive brain malignancy characterized by its invasive nature. Current treatment has limited effectiveness, resulting in poor patients' prognoses. β-Amino carbonyl (β-AC) compounds have gained attention due to their potential anticancerous properties. In vitro assays were performed to evaluate the effects of an in-house synthesized β-AC compound, named SHG-8, upon GB cells. Small RNA sequencing (sRNA-seq) and biocomputational analyses investigated the effects of SHG-8 upon the miRNome and its bioavailability within the human body. SHG-8 exhibited significant cytotoxicity and inhibition of cell migration and proliferation in U87MG and U251MG GB cells. GB cells treated with the compound released significant amounts of reactive oxygen species (ROS). Annexin V and acridine orange/ethidium bromide staining also demonstrated that the compound led to apoptosis. sRNA-seq revealed a shift in microRNA (miRNA) expression profiles upon SHG-8 treatment and significant upregulation of miR-3648 and downregulation of miR-7973. Real-time polymerase chain reaction (RT-qPCR) demonstrated a significant downregulation of CORO1C, an oncogene and a player in the Wnt/β-catenin pathway. In silico analysis indicated SHG-8's potential to cross the blood-brain barrier. We concluded that SHG-8's inhibitory effects on GB cells may involve the deregulation of various miRNAs and the inhibition of CORO1C.
Affiliation(s)
- Denis Mustafov
- School of Life and Medical Sciences, University of Hertfordshire, Hatfield, AL10 9AB, United Kingdom
- College of Health, Medicine and Life Sciences, Brunel University London, Uxbridge UB8 3PH, United Kingdom
| | - Shoib S. Siddiqui
- School of Life and Medical Sciences, University of Hertfordshire, Hatfield, AL10 9AB, United Kingdom
| | - Andreas Kukol
- School of Life and Medical Sciences, University of Hertfordshire, Hatfield, AL10 9AB, United Kingdom
| | - George I. Lambrou
- Choremeio Research Laboratory, First Department of Pediatrics, School of Medicine, National and Kapodistrian University of Athens, Athens, Greece, Thivon and Levadeias 8, Goudi, 11527 Athens, Greece
- University Research Institute of Maternal and Child Health and Precision Medicine, National and Kapodistrian University of Athens, Thivon and Levadeias 8, 11527 Athens, Greece
| | - Shagufta
- Department of Biotechnology, School of Arts and Sciences, American University of Ras Al Khaimah, Ras Al Khaimah, United Arab Emirates
| | - Irshad Ahmad
- Department of Biotechnology, School of Arts and Sciences, American University of Ras Al Khaimah, Ras Al Khaimah, United Arab Emirates
| | - Maria Braoudaki
- School of Life and Medical Sciences, University of Hertfordshire, Hatfield, AL10 9AB, United Kingdom
- University Research Institute of Maternal and Child Health and Precision Medicine, National and Kapodistrian University of Athens, Thivon and Levadeias 8, 11527 Athens, Greece
|
14
|
Jeevan K, Palistha S, Tayara H, Chong KT. PUResNetV2.0: a deep learning model leveraging sparse representation for improved ligand binding site prediction. J Cheminform 2024; 16:66. [PMID: 38849917 PMCID: PMC11157904 DOI: 10.1186/s13321-024-00865-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/29/2024] [Accepted: 05/27/2024] [Indexed: 06/09/2024] Open
Abstract
Accurate ligand binding site prediction (LBSP) within proteins is essential for drug discovery. We developed ProteinUNetResNetV2.0 (PUResNetV2.0), leveraging sparse representation of protein structures to improve LBSP accuracy. Our training dataset included protein complexes from 4729 protein families. Evaluations on benchmark datasets showed that PUResNetV2.0 achieved an 85.4% Distance Center Atom (DCA) success rate and a 74.7% F1 Score on the Holo801 dataset, outperforming existing methods. However, its performance in specific cases, such as RNA, DNA, peptide-like ligand, and ion binding site prediction, was limited due to constraints in our training data. Our findings underscore the potential of sparse representation in LBSP, especially for oligomeric structures, suggesting PUResNetV2.0 as a promising tool for computational drug discovery.
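The two metrics quoted in this abstract can be sketched as follows. The abstract does not define them, so this is an assumption-laden illustration: DCA is taken to be the distance from a predicted pocket center to the nearest ligand atom, a prediction counts as a success when that distance is at most 4 Å (a commonly used threshold, assumed here), and F1 is computed from true-positive, false-positive, and false-negative counts. All function names and the example coordinates are hypothetical.

```python
import math
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float, float]


def dca(center: Point, ligand_atoms: Iterable[Point]) -> float:
    """Distance from a predicted pocket center to the closest ligand atom."""
    return min(math.dist(center, atom) for atom in ligand_atoms)


def dca_success_rate(cases: Sequence[Tuple[Point, Sequence[Point]]],
                     threshold: float = 4.0) -> float:
    """Fraction of (center, ligand atoms) pairs with DCA <= threshold.

    The 4.0 angstrom default is an assumed convention, not a value taken
    from the abstract.
    """
    if not cases:
        raise ValueError("cases must be non-empty")
    hits = sum(dca(center, atoms) <= threshold for center, atoms in cases)
    return hits / len(cases)


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from confusion counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    cases = [
        ((0.0, 0.0, 0.0), [(3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]),  # hit
        ((0.0, 0.0, 0.0), [(5.0, 0.0, 0.0)]),                    # miss
    ]
    print(dca_success_rate(cases), f1_score(tp=8, fp=2, fn=2))
```

Whether F1 is computed per residue, per atom, or per grid point depends on the benchmark protocol, which the abstract does not specify.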
Affiliation(s)
- Kandel Jeevan
- Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju, 54896, South Korea
| | - Shrestha Palistha
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
| | - Hilal Tayara
- School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, South Korea.
| | - Kil T Chong
- Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju, 54896, South Korea.
- Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea.
- School of International Engineering and Science, Jeonbuk National University, Jeonju, 54896, South Korea.
- Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju, 54896, South Korea.
|
15
|
Zhang B, Hou Z, Yang Y, Wong KC, Zhu H, Li X. SOFB is a comprehensive ensemble deep learning approach for elucidating and characterizing protein-nucleic-acid-binding residues. Commun Biol 2024; 7:679. [PMID: 38830995 PMCID: PMC11148103 DOI: 10.1038/s42003-024-06332-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/23/2023] [Accepted: 05/15/2024] [Indexed: 06/05/2024] Open
Abstract
Proteins and nucleic acids are essential components of living organisms that interact in critical cellular processes. Accurate prediction of nucleic-acid-binding residues in proteins can contribute to a better understanding of protein function. However, the discrepancy between protein sequence information and obtained structural and functional data renders most current computational models ineffective. Therefore, it is vital to design computational models based on protein sequence information to identify nucleic-acid-binding sites in proteins. Here, we implement SOFB, an ensemble deep learning method for identifying nucleic-acid-binding residues on proteins. SOFB characterizes protein sequences by learning the semantics of biological dynamics contexts, and then uses an ensemble deep learning-based sequence network to learn feature representation and classification by explicitly modeling dynamic semantic information. The language learning model, which is transferred from natural language to biological language, captures the underlying relationships of protein sequences, and the ensemble deep learning-based sequence network, consisting of different convolutional layers together with Bi-LSTM, refines various features for optimal performance. Meanwhile, to address the class imbalance issue, we adopt ensemble learning to train multiple models and then combine them. Our experimental results on several DNA/RNA nucleic-acid-binding residue datasets demonstrate that our proposed model outperforms other state-of-the-art methods. In addition, we conduct an interpretability analysis of the identified nucleic-acid-binding residue sequences based on the attention weights of the language learning model, revealing novel insights into the dynamic semantic information that supports the identified nucleic-acid-binding residues.
SOFB is available at https://github.com/Encryptional/SOFB and https://figshare.com/articles/online_resource/SOFB_figshare_rar/25499452.
Affiliation(s)
- Bin Zhang
- School of Artificial Intelligence, Jilin University, Changchun, China
| | - Zilong Hou
- School of Artificial Intelligence, Jilin University, Changchun, China
| | - Yuning Yang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Canada
| | - Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Hong Kong, Hong Kong SAR
| | - Haoran Zhu
- School of Artificial Intelligence, Jilin University, Changchun, China.
| | - Xiangtao Li
- School of Artificial Intelligence, Jilin University, Changchun, China.
|
16
|
Xia Y, Pan X, Shen HB. A comprehensive survey on protein-ligand binding site prediction. Curr Opin Struct Biol 2024; 86:102793. [PMID: 38447285 DOI: 10.1016/j.sbi.2024.102793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2023] [Revised: 02/18/2024] [Accepted: 02/18/2024] [Indexed: 03/08/2024]
Abstract
Protein-ligand binding site prediction is critical for protein function annotation and drug discovery. Biological experiments are time-consuming and require significant equipment, materials, and labor resources. Developing accurate and efficient computational methods for protein-ligand interaction prediction is essential. Here, we summarize the key challenges associated with ligand binding site (LBS) prediction and introduce recently published methods from their input features, computational algorithms, and ligand types. Furthermore, we investigate the specificity of allosteric site identification as a particular LBS type. Finally, we discuss the prospective directions for machine learning-based LBS prediction in the near future.
Affiliation(s)
- Ying Xia
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China
| | - Xiaoyong Pan
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
| | - Hong-Bin Shen
- Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai 200240, China.
|
17
|
Zhang H, Fan H, Wang J, Hou T, Saravanan KM, Xia W, Kan HW, Li J, Zhang JZH, Liang X, Chen Y. Revolutionizing GPCR-ligand predictions: DeepGPCR with experimental validation for high-precision drug discovery. Brief Bioinform 2024; 25:bbae281. [PMID: 38864340 PMCID: PMC11167311 DOI: 10.1093/bib/bbae281] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/28/2024] [Revised: 05/05/2024] [Accepted: 05/29/2024] [Indexed: 06/13/2024] Open
Abstract
G-protein coupled receptors (GPCRs), crucial in various diseases, are the targets of over 40% of approved drugs. However, the reliable acquisition of experimental GPCR structures is hindered by their lipid-embedded conformations. Traditional protein-ligand interaction models falter on GPCR-drug interactions because the available structures are limited and of low quality. Generalized models, trained on soluble protein-ligand pairs, are also inadequate. To address these issues, we developed two models, DeepGPCR_BC for binary classification and DeepGPCR_RG for affinity prediction. These models use non-structural GPCR-ligand interaction data, leveraging graph convolutional networks and mol2vec techniques to represent binding pockets and ligands as graphs. This approach significantly speeds up predictions while preserving critical physical-chemical and spatial information. In independent tests, DeepGPCR_BC surpassed Autodock Vina and Schrödinger Dock with an area under the curve of 0.72, accuracy of 0.68 and true positive rate of 0.73, whereas DeepGPCR_RG demonstrated a Pearson correlation of 0.39 and root mean squared error of 1.34. We applied these models to screen drug candidates for GPR35 (Q9HC97), yielding promising results with three (F545-1970, K297-0698, S948-0241) out of eight candidates. Furthermore, we also successfully obtained six active inhibitors for GLP-1R. Our GPCR-specific models pave the way for efficient and accurate large-scale virtual screening, potentially revolutionizing drug discovery in the GPCR field.
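For readers less familiar with the evaluation metrics reported in this abstract, the following is a minimal sketch of how accuracy, true positive rate, Pearson correlation, and root mean squared error are conventionally computed. It uses the standard textbook definitions; the function names and any inputs are illustrative and are not taken from the DeepGPCR code or data.

```python
import math
from typing import Sequence, Tuple


def accuracy_and_tpr(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Accuracy and true positive rate (recall) from confusion counts."""
    total = tp + fp + tn + fn
    if total == 0 or tp + fn == 0:
        raise ValueError("need at least one sample and one positive")
    return (tp + tn) / total, tp / (tp + fn)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean squared error between predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - o) ** 2 for p, o in zip(predicted, observed))
    return math.sqrt(squared / len(predicted))
```

The classification metrics apply to a binary classifier such as DeepGPCR_BC, while Pearson correlation and RMSE apply to a regression model such as DeepGPCR_RG; RMSE is expressed in the same units as the predicted affinity.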
Affiliation(s)
- Haiping Zhang
- Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
| | - Hongjie Fan
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
| | - Jixia Wang
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
| | - Tao Hou
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
| | - Konda Mani Saravanan
- Department of Biotechnology, Bharath Institute of Higher Education and Research, Agharam Road 173, Selaiyur, Chennai, Tamil Nadu 600073, India
| | - Wei Xia
- Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
| | - Hei Wun Kan
- Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
| | - Junxin Li
- Shenzhen Laboratory of Human Antibody Engineering, Institute of Biomedicine and Biotechnology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
| | - John Z H Zhang
- Faculty of Synthetic Biology and Institute of Synthetic Biology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, No. 1068 Xueyuan Boulevard, Nanshan District, Shenzhen 518055, Guangdong Province, China
| | - Xinmiao Liang
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
| | - Yang Chen
- Ganjiang Chinese Medicine Innovation Center, Xinqizhou East Road 888, Ganjiang New Area, Nanchang 330000, China
- CAS Key Laboratory of Separation Science for Analytical Chemistry, Dalian Institute of Chemical Physics, Chinese Academy of Sciences, No. 457 Zhongshan Road, Dalian 116023, China
|
18
|
Wei H, Wang W, Peng Z, Yang J. Q-BioLiP: A Comprehensive Resource for Quaternary Structure-based Protein-ligand Interactions. Genomics, Proteomics & Bioinformatics 2024; 22:qzae001. [PMID: 38862427 PMCID: PMC11423850 DOI: 10.1093/gpbjnl/qzae001] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/06/2023] [Revised: 11/12/2023] [Accepted: 12/03/2023] [Indexed: 06/13/2024]
Abstract
Since its establishment in 2013, BioLiP has become one of the widely used resources for protein-ligand interactions. Nevertheless, several known issues occurred with it over the past decade. For example, the protein-ligand interactions are represented in the form of single chain-based tertiary structures, which may be inappropriate as many interactions involve multiple protein chains (known as quaternary structures). We sought to address these issues, resulting in Q-BioLiP, a comprehensive resource for quaternary structure-based protein-ligand interactions. The major features of Q-BioLiP include: (1) representing protein structures in the form of quaternary structures rather than single chain-based tertiary structures; (2) pairing DNA/RNA chains properly rather than separation; (3) providing both experimental and predicted binding affinities; (4) retaining both biologically relevant and irrelevant interactions to alleviate the wrong justification of ligands' biological relevance; and (5) developing a new quaternary structure-based algorithm for the modelling of protein-ligand complex structure. With these new features, Q-BioLiP is expected to be a valuable resource for studying biomolecule interactions, including protein-small molecule interaction, protein-metal ion interaction, protein-peptide interaction, protein-protein interaction, protein-DNA/RNA interaction, and RNA-small molecule interaction. Q-BioLiP is freely available at https://yanglab.qd.sdu.edu.cn/Q-BioLiP/.
Affiliation(s)
- Hong Wei
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| | - Wenkai Wang
- School of Mathematical Sciences, Nankai University, Tianjin 300071, China
| | - Zhenling Peng
- MOE Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
| | - Jianyi Yang
- MOE Frontiers Science Center for Nonlinear Expectations, Research Center for Mathematics and Interdisciplinary Sciences, Shandong University, Qingdao 266237, China
|
19
|
Yin S, Mi X, Shukla D. Leveraging machine learning models for peptide-protein interaction prediction. RSC Chem Biol 2024; 5:401-417. [PMID: 38725911 PMCID: PMC11078210 DOI: 10.1039/d3cb00208j] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/27/2023] [Accepted: 02/07/2024] [Indexed: 05/12/2024] Open
Abstract
Peptides play a pivotal role in a wide range of biological activities by participating in up to 40% of protein-protein interactions in cellular processes. They also demonstrate remarkable specificity and efficacy, making them promising candidates for drug development. However, predicting peptide-protein complexes by traditional computational approaches, such as docking and molecular dynamics simulations, remains a challenge due to the high computational cost, the flexible nature of peptides, and the limited structural information on peptide-protein complexes. In recent years, the surge of available biological data has given rise to an increasing number of machine learning models for predicting peptide-protein interactions. These models offer efficient solutions to the challenges associated with traditional computational approaches. Furthermore, they offer enhanced accuracy, robustness, and interpretability in their predictive outcomes. This review presents a comprehensive overview of machine learning and deep learning models that have emerged in recent years for the prediction of peptide-protein interactions.
Affiliation(s)
- Song Yin
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign Urbana 61801 Illinois USA
| | - Xuenan Mi
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign Urbana IL 61801 USA
| | - Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign Urbana 61801 Illinois USA
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign Urbana IL 61801 USA
- Department of Bioengineering, University of Illinois Urbana-Champaign Urbana IL 61801 USA
|
20
|
Shanthappa PM, Suravajhala R, Kumar G, Melethadathil N. Computational exploration of novel antimicrobial modalities targeting fucose-binding lectins and ribosomes in Mycobacterium smegmatis using tRNA-encoded peptides. J Biomol Struct Dyn 2024:1-13. [PMID: 38676533 DOI: 10.1080/07391102.2024.2335555] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 03/19/2024] [Indexed: 04/29/2024]
Abstract
tRNA-Encoded Peptides (tREPs), encoded by small open reading frames (smORFs) within tRNA genes, have recently emerged as a new class of functional peptides exhibiting antiparasitic activity. The discovery of tREPs has led to a re-evaluation of the role of tRNAs in biology and has expanded our understanding of the genetic code. This presents an immense, unexplored potential in the realm of tRNA-peptide interactions, paving the way for groundbreaking discoveries and innovative applications in various biological functions. This study explores the antimicrobial potential of tREPs against protein targets by employing a computational method that uses verified data sources and highly recognized predictive algorithms to provide a sorted list of likely antimicrobial peptides, which were then filtered for toxicity, cell permeability, allergenicity and half-life. These peptides were then docked with screened protein targets and computationally validated using molecular dynamics (MD) simulations for 150 ns, and the binding free energy was estimated. The peptides Pep2 (VVLWRKPRVRKTG) and Pep6 (HRLRLRRRKPWW) exhibited good binding affinities of -110.5 +/- 2.5 and -129.0 +/- 3.9, respectively, with RMSD values of 0.4 and 0.25 nm against the fucose-binding lectin (7NEF) and the 30S ribosome of Mycobacterium smegmatis (5O5J) protein targets. The 7NEF-Pep2 and 5O5J-Pep6 complexes showed strongly negative binding free energies of -52.55 kcal/mol and -55.52 kcal/mol, respectively, as calculated by Molecular Mechanics Poisson-Boltzmann Surface Area (MMPBSA). Thus, the tREP-derived peptides designed as part of this study provide novel approaches for potential antibacterial therapeutic modalities. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Pallavi M Shanthappa
- Department of Computer Science, School of Computing, Amrita Vishwa Vidyapeetham, Mysuru, India
| | | | - Geetha Kumar
- School of Biotechnology, Amrita Vishwa Vidyapeetham, Amritapuri, India
|
21
|
Krishna R, Wang J, Ahern W, Sturmfels P, Venkatesh P, Kalvet I, Lee GR, Morey-Burrows FS, Anishchenko I, Humphreys IR, McHugh R, Vafeados D, Li X, Sutherland GA, Hitchcock A, Hunter CN, Kang A, Brackenbrough E, Bera AK, Baek M, DiMaio F, Baker D. Generalized biomolecular modeling and design with RoseTTAFold All-Atom. Science 2024; 384:eadl2528. [PMID: 38452047 DOI: 10.1126/science.adl2528] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/09/2023] [Accepted: 02/27/2024] [Indexed: 03/09/2024]
Abstract
Deep-learning methods have revolutionized protein structure prediction and design but are presently limited to protein-only systems. We describe RoseTTAFold All-Atom (RFAA), which combines a residue-based representation of amino acids and DNA bases with an atomic representation of all other groups to model assemblies that contain proteins, nucleic acids, small molecules, metals, and covalent modifications, given their sequences and chemical structures. By fine-tuning on denoising tasks, we developed RFdiffusion All-Atom (RFdiffusionAA), which builds protein structures around small molecules. Starting from random distributions of amino acid residues surrounding target small molecules, we designed and experimentally validated, through crystallography and binding measurements, proteins that bind the cardiac disease therapeutic digoxigenin, the enzymatic cofactor heme, and the light-harvesting molecule bilin.
Affiliation(s)
- Rohith Krishna
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Jue Wang
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Woody Ahern
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98105, USA
| | - Pascal Sturmfels
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98105, USA
| | - Preetham Venkatesh
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, WA 98105, USA
| | - Indrek Kalvet
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105, USA
| | - Gyu Rie Lee
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105, USA
| | | | - Ivan Anishchenko
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Ian R Humphreys
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Ryan McHugh
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Graduate Program in Biological Physics, Structure and Design, University of Washington, Seattle, WA 98105, USA
| | - Dionne Vafeados
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Xinting Li
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | | | - Andrew Hitchcock
- School of Biosciences, University of Sheffield, Sheffield S10 2TN, UK
| | - C Neil Hunter
- School of Biosciences, University of Sheffield, Sheffield S10 2TN, UK
| | - Alex Kang
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Evans Brackenbrough
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Asim K Bera
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - Minkyung Baek
- School of Biological Sciences, Seoul National University, Seoul 08826, Republic of Korea
| | - Frank DiMaio
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
| | - David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Howard Hughes Medical Institute, University of Washington, Seattle, WA 98105, USA
| |
|
22
|
MacGowan SA, Madeira F, Britto-Borges T, Barton GJ. A unified analysis of evolutionary and population constraint in protein domains highlights structural features and pathogenic sites. Commun Biol 2024; 7:447. [PMID: 38605212 PMCID: PMC11009406 DOI: 10.1038/s42003-024-06117-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2023] [Accepted: 03/27/2024] [Indexed: 04/13/2024] Open
Abstract
Protein evolution is constrained by structure and function, creating patterns in residue conservation that are routinely exploited to predict structure and other features. Similar constraints should affect variation across individuals, but it is only with the growth of human population sequencing that this has been tested at scale. Now, human population constraint has established applications in pathogenicity prediction, but it has not yet been explored for structural inference. Here, we map 2.4 million population variants to 5885 protein families and quantify residue-level constraint with a new Missense Enrichment Score (MES). Analysis of 61,214 structures from the PDB spanning 3661 families shows that missense depleted sites are enriched in buried residues or those involved in small-molecule or protein binding. MES is complementary to evolutionary conservation and a combined analysis allows a new classification of residues according to a conservation plane. This approach finds functional residues that are evolutionarily diverse, which can be related to specificity, as well as family-wide conserved sites that are critical for folding or function. We also find a possible contrast between lethal and non-lethal pathogenic sites, and a surprising clinical variant hot spot at a subset of missense enriched positions.
Affiliation(s)
- Stuart A MacGowan
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, Scotland, UK
| | - Fábio Madeira
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, Scotland, UK
- European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
| | - Thiago Britto-Borges
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, Scotland, UK
- Section of Bioinformatics and Systems Cardiology, Department of Internal Medicine III and Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg University Hospital, Heidelberg, Germany
| | - Geoffrey J Barton
- Division of Computational Biology, School of Life Sciences, University of Dundee, Dow Street, Dundee DD1 5EH, Scotland, UK
| |
|
23
|
Li X, Shen C, Zhu H, Yang Y, Wang Q, Yang J, Huang N. A High-Quality Data Set of Protein-Ligand Binding Interactions Via Comparative Complex Structure Modeling. J Chem Inf Model 2024; 64:2454-2466. [PMID: 38181418 DOI: 10.1021/acs.jcim.3c01170] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/07/2024]
Abstract
High-quality protein-ligand complex structures provide the basis for understanding the nature of noncovalent binding interactions at the atomic level and enable structure-based drug design. However, experimentally determined complex structures are scarce compared with the vast chemical space. In this study, we addressed this issue by constructing the BindingNet data set via comparative complex structure modeling, which contains 69,816 modeled high-quality protein-ligand complex structures with experimental binding affinity data. BindingNet provides valuable insights into protein-ligand interactions, allowing visual inspection and interpretation of the structure-activity relationships of structural analogues. It can also be used for evaluating machine-learning-based scoring functions. Our results indicate that machine learning models trained on BindingNet could reduce the bias caused by buried solvent-accessible surface area that we previously found for models trained on the PDBbind data set. We also discuss strategies to improve BindingNet and its potential use for benchmarking molecular docking methods and ligand binding free energy calculation approaches. BindingNet complements PDBbind in constructing a sufficient and unbiased protein-ligand binding data set and is freely available at http://bindingnet.huanglab.org.cn.
Affiliation(s)
- Xuelian Li
- National Institute of Biological Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Cheng Shen
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Hui Zhu
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
| | - Yujian Yang
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Qing Wang
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Jincai Yang
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
| | - Niu Huang
- National Institute of Biological Sciences, Chinese Academy of Medical Sciences & Peking Union Medical College, Beijing 100730, China
- National Institute of Biological Sciences, 7 Science Park Road, Zhongguancun Life Science Park, Beijing 102206, China
- Tsinghua Institute of Multidisciplinary Biomedical Research, Tsinghua University, Beijing 102206, China
| |
|
24
|
Zhao Y, Yang Z, Wang L, Zhang Y, Lin H, Wang J. Predicting Protein Functions Based on Heterogeneous Graph Attention Technique. IEEE J Biomed Health Inform 2024; 28:2408-2415. [PMID: 38319781 DOI: 10.1109/jbhi.2024.3357834] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/08/2024]
Abstract
In bioinformatics, protein function prediction stands as a fundamental area of research and plays a crucial role in addressing various biological challenges, such as the identification of potential targets for drug discovery and the elucidation of disease mechanisms. However, known functional annotation databases usually provide positive experimental annotations that proteins carry out a given function, and rarely record negative experimental annotations that proteins do not carry out a given function. Therefore, existing computational methods based on deep learning models focus on these positive annotations for prediction and ignore these scarce but informative negative annotations, leading to an underestimation of precision. To address this issue, we introduce a deep learning method that utilizes a heterogeneous graph attention technique. The method first constructs a heterogeneous graph that covers the protein-protein interaction network, ontology structure, and positive and negative annotation information. Then, it learns embedding representations of proteins and ontology terms by using the heterogeneous graph attention technique. Finally, it leverages these learned representations to reconstruct the positive protein-term associations and score unobserved functional annotations. It can enhance the predictive performance by incorporating these known limited negative annotations into the constructed heterogeneous graph. Experimental results on three species (i.e., Human, Mouse, and Arabidopsis) demonstrate that our method can achieve better performance in predicting new protein annotations than state-of-the-art methods.
|
25
|
Singh RK, Chaurasiya AK, Kumar A. Ab initio modeling of human IRS1 protein to find novel target to dock with drug MH to mitigate T2DM diabetes by insulin signaling. 3 Biotech 2024; 14:108. [PMID: 38476643 PMCID: PMC10925585 DOI: 10.1007/s13205-024-03955-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/03/2023] [Accepted: 02/03/2024] [Indexed: 03/14/2024] Open
Abstract
IRS1 is a cytoplasmic adaptor protein involved in cellular growth, glucose metabolism, proliferation, and differentiation. The highly disordered insulin receptor substrate 1 (IRS1) protein sequence (mol. wt. 131,590.97 Da) was used to build a model by ab initio modeling with the I-TASSER tool, and Discovery Studio/DogSite Server was used to decipher a novel active site. The constructed protein model has been deposited under PMDB ID PM0082210. The GRAVY index of the IRS1 model (-0.675) indicated surface protein-water interaction. The ProtParam instability index (75.22) indicated disorder combined with loops owing to prolines and glycines. After refinement, the Ramachandran plot showed that 88% of amino acids were in the allowed region and only 0.5% in the disallowed region. The novel IRS1 model has 10 α-helices, 22 β-sheets, 20 β-hairpins, 5 β-bulges, 47 strands, 105 β-turns, and 8 γ-turns. Docking of IRS1 with the drug MH showed interactions of Ser-70, Thr-18, and Pro-69 through C-H bonds; of Gln-71 and Glu-113 through hydrogen bonds; and of both Glu-114 and Glu-113 through salt bridges. The RMSD fluctuation stayed within the permissible 1.0-1.5 Å range between 20 and 45 ns in the simulations of IRS1 and the IRS1-met complex, confirming that both systems were stable throughout the simulation. The RMSF results showed that, except at positions 57 and 114, binding of the drug had no severe effect on the flexibility of IRS1 and the IRS1-met complex. The radius of gyration (RoG), a measure of compactness and rigidity, showed little change in the IRS1 protein. The SASA values showed no significant fluctuation between IRS1 and the drug MH, meaning that the ligand (drug) and the IRS1 receptor form a stable structure. The hydrogen bond strength of IRS1 and IRS1-met was 81.2 and 76.4, respectively, which suggested a stable interaction.
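The GRAVY index reported in this abstract is the grand average of hydropathy: the sum of per-residue hydropathy values divided by the sequence length. The following is a minimal Python sketch, assuming the standard Kyte-Doolittle scale; the function name `gravy` and the example peptide are hypothetical and are not taken from the paper.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Return the mean Kyte-Doolittle hydropathy of a protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


if __name__ == "__main__":
    peptide = "MKSPEDRR"  # hypothetical example, not the IRS1 sequence
    print(f"GRAVY = {gravy(peptide):.3f}")
```

A negative value, such as the -0.675 reported for the IRS1 model, means the sequence is hydrophilic on average, which is consistent with the abstract's reading of favorable protein-water interaction.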
Affiliation(s)
- Ritika Kumari Singh
- School of Biotechnology, Institute of Science, BHU, Varanasi, Uttar Pradesh 221005 India
| | | | - Arvind Kumar
- School of Biotechnology, Institute of Science, BHU, Varanasi, Uttar Pradesh 221005 India
| |
|
26
|
Roche R, Moussad B, Shuvo MH, Tarafder S, Bhattacharya D. EquiPNAS: improved protein-nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks. Nucleic Acids Res 2024; 52:e27. [PMID: 38281252 PMCID: PMC10954458 DOI: 10.1093/nar/gkae039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 12/22/2023] [Accepted: 01/11/2024] [Indexed: 01/30/2024] Open
Abstract
Protein language models (pLMs) trained on a large corpus of protein sequences have shown unprecedented scalability and broad generalizability in a wide range of predictive modeling tasks, but their power has not yet been harnessed for predicting protein-nucleic acid binding sites, critical for characterizing the interactions between proteins and nucleic acids. Here, we present EquiPNAS, a new pLM-informed E(3) equivariant deep graph neural network framework for improved protein-nucleic acid binding site prediction. By combining the strengths of pLM and symmetry-aware deep graph learning, EquiPNAS consistently outperforms the state-of-the-art methods for both protein-DNA and protein-RNA binding site prediction on multiple datasets across a diverse set of predictive modeling scenarios ranging from using experimental input to AlphaFold2 predictions. Our ablation study reveals that the pLM embeddings used in EquiPNAS are sufficiently powerful to dramatically reduce the dependence on the availability of evolutionary information without compromising on accuracy, and that the symmetry-aware nature of the E(3) equivariant graph-based neural architecture offers remarkable robustness and performance resilience. EquiPNAS is freely available at https://github.com/Bhattacharya-Lab/EquiPNAS.
Affiliation(s)
- Rahmatullah Roche
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
| | - Bernard Moussad
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
| | - Md Hossain Shuvo
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
| | - Sumit Tarafder
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
| | | |
|
27
|
Williams TJ, Allen MA, Ray AE, Benaud N, Chelliah DS, Albanese D, Donati C, Selbmann L, Coleine C, Ferrari BC. Novel endolithic bacteria of phylum Chloroflexota reveal a myriad of potential survival strategies in the Antarctic desert. Appl Environ Microbiol 2024; 90:e0226423. [PMID: 38372512 PMCID: PMC10952385 DOI: 10.1128/aem.02264-23] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 01/02/2024] [Indexed: 02/20/2024] Open
Abstract
The ice-free McMurdo Dry Valleys of Antarctica are dominated by nutrient-poor mineral soil and rocky outcrops. The principal habitat for microorganisms is within rocks (endolithic). In this environment, microorganisms are provided with protection against sub-zero temperatures, rapid thermal fluctuations, extreme dryness, and ultraviolet and solar radiation. Endolithic communities include lichen, algae, fungi, and a diverse array of bacteria. Chloroflexota is among the most abundant bacterial phyla present in these communities. Among the Chloroflexota are four novel classes of bacteria, here named Candidatus Spiritibacteria class. nov. (=UBA5177), Candidatus Martimicrobia class. nov. (=UBA4733), Candidatus Tarhunnaeia class. nov. (=UBA6077), and Candidatus Uliximicrobia class. nov. (=UBA2235). We retrieved 17 high-quality metagenome-assembled genomes (MAGs) that represent these four classes. Based on genome predictions, all these bacteria are inferred to be aerobic heterotrophs that encode enzymes for the catabolism of diverse sugars. These and other organic substrates are likely derived from lichen, algae, and fungi, as metabolites (including photosynthate), cell wall components, and extracellular matrix components. The majority of MAGs encode the capacity for trace gas oxidation using high-affinity uptake hydrogenases, which could provide energy and metabolic water required for survival and persistence. Furthermore, some MAGs encode the capacity to couple the energy generated from H2 and CO oxidation to support carbon fixation (atmospheric chemosynthesis). All encode mechanisms for the detoxification and efflux of heavy metals. Certain MAGs encode features that indicate possible interactions with other organisms, such as Tc-type toxin complexes, hemolysins, and macroglobulins.
IMPORTANCE: The ice-free McMurdo Dry Valleys of Antarctica are the coldest and most hyperarid desert on Earth. It is, therefore, the closest analog to the surface of the planet Mars. Bacteria and other microorganisms survive by inhabiting airspaces within rocks (endolithic). We identify four novel classes of phylum Chloroflexota, and, based on interrogation of 17 metagenome-assembled genomes, we predict specific metabolic and physiological adaptations that facilitate the survival of these bacteria in this harsh environment, including oxidation of trace gases and the utilization of nutrients (including sugars) derived from lichen, algae, and fungi. We propose that such adaptations allow these endolithic bacteria to eke out an existence in this cold and extremely dry habitat.
Affiliation(s)
- Timothy J Williams
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, New South Wales, Australia
| | - Michelle A Allen
- School of Biological, Earth and Environmental Sciences, UNSW Sydney, Sydney, New South Wales, Australia
| | - Angelique E Ray
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, New South Wales, Australia
| | - Nicole Benaud
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, New South Wales, Australia
| | - Devan S Chelliah
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, New South Wales, Australia
| | - Davide Albanese
- Research and Innovation Center, Fondazione Edmund Mach, San Michele all'Adige, Italy
| | - Claudio Donati
- Research and Innovation Center, Fondazione Edmund Mach, San Michele all'Adige, Italy
| | - Laura Selbmann
- Department of Ecological and Biological Sciences, University of Tuscia, Largo dell'Università, Viterbo, Italy
- Mycological Section, Italian Antarctic National Museum (MNA), Genova, Italy
| | - Claudia Coleine
- Department of Ecological and Biological Sciences, University of Tuscia, Largo dell'Università, Viterbo, Italy
| | - Belinda C Ferrari
- School of Biotechnology and Biomolecular Sciences, UNSW Sydney, Sydney, New South Wales, Australia
| |
|
28
|
Zhang Y, Li S, Meng K, Sun S. Machine Learning for Sequence and Structure-Based Protein-Ligand Interaction Prediction. J Chem Inf Model 2024; 64:1456-1472. [PMID: 38385768 DOI: 10.1021/acs.jcim.3c01841] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/23/2024]
Abstract
Developing new drugs is too expensive and time-consuming. Accurately predicting the interactions between drugs and targets is likely to change how drugs are discovered. Machine learning-based protein-ligand interaction prediction has demonstrated significant potential. This paper examines computational methods that use sequence and structure to study protein-ligand interactions. It starts with an overview of the data sets applied in this area and of the various approaches for representing proteins and ligands. Sequence-based and structure-based classification criteria are then used to categorize and summarize both the classical machine learning models and the deep learning models employed in protein-ligand interaction studies. The evaluation methods and interpretability of these models are also presented, followed by the diverse applications of protein-ligand interaction models in drug research. Lastly, the current challenges and future directions in this field are addressed.
Affiliation(s)
- Yunjiang Zhang
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shuyuan Li
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Kong Meng
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| | - Shaorui Sun
- Beijing Key Laboratory for Green Catalysis and Separation, The Faculty of Environment and Life, Beijing University of Technology, Beijing 100124, P. R. China
| |
|
29
|
Yin S, Mi X, Shukla D. Leveraging Machine Learning Models for Peptide-Protein Interaction Prediction. arXiv 2024:arXiv:2310.18249v2. [PMID: 37961736 PMCID: PMC10635286] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2023]
Abstract
Peptides play a pivotal role in a wide range of biological activities by participating in up to 40% of protein-protein interactions in cellular processes. They also demonstrate remarkable specificity and efficacy, making them promising candidates for drug development. However, predicting peptide-protein complexes by traditional computational approaches, such as docking and molecular dynamics simulations, remains a challenge due to the high computational cost, the flexible nature of peptides, and the limited structural information on peptide-protein complexes. In recent years, the surge of available biological data has given rise to an increasing number of machine learning models for predicting peptide-protein interactions. These models offer efficient solutions to the challenges associated with traditional computational approaches. Furthermore, they offer enhanced accuracy, robustness, and interpretability in their predictive outcomes. This review presents a comprehensive overview of machine learning and deep learning models that have emerged in recent years for the prediction of peptide-protein interactions.
Affiliation(s)
- Song Yin
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- These authors contributed to the work equally
| | - Xuenan Mi
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- These authors contributed to the work equally
| | - Diwakar Shukla
- Department of Chemical and Biomolecular Engineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Center for Biophysics and Quantitative Biology, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
- Department of Bioengineering, University of Illinois Urbana-Champaign, Urbana, IL 61801, United States
| |
|
30
|
Birchfield AS, McIntosh CA. Expression and Purification of Cp3GT: Structural Analysis and Modeling of a Key Plant Flavonol-3-O Glucosyltransferase from Citrus paradisi. BIOTECH 2024; 13:4. [PMID: 38390907 PMCID: PMC10885057 DOI: 10.3390/biotech13010004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/21/2023] [Revised: 12/15/2023] [Accepted: 01/09/2024] [Indexed: 02/24/2024] Open
Abstract
Glycosyltransferases (GTs) are pivotal enzymes in the biosynthesis of various biological molecules. This study focuses on the scale-up, expression, and purification of a plant flavonol-specific 3-O glucosyltransferase (Cp3GT), a key enzyme from Citrus paradisi, for structural analysis and modeling. The challenges associated with recombinant protein production in Pichia pastoris, such as proteolytic degradation, were addressed through the optimization of culture conditions and purification processes. The purification strategy employed affinity, anion exchange, and size exclusion chromatography, leading to greater than 95% homogeneity for Cp3GT. In silico modeling, using D-I-TASSER and COFACTOR integrated with the AlphaFold2 pipeline, provided insights into the structural dynamics of Cp3GT and its ligand binding sites, offering predictions for enzyme-substrate interactions. These models were compared to experimentally derived structures, enhancing understanding of the enzyme's functional mechanisms. The findings present a comprehensive approach to produce a highly purified Cp3GT which is suitable for crystallographic studies and to shed light on the structural basis of flavonol specificity in plant GTs. The significant implications of these results for synthetic biology and enzyme engineering in pharmaceutical applications are also considered.
Affiliation(s)
- Aaron S Birchfield
- Department of Biological Sciences, East Tennessee State University, P.O. Box 70703, Johnson City, TN 37614, USA
| | - Cecilia A McIntosh
- Department of Biological Sciences, East Tennessee State University, P.O. Box 70703, Johnson City, TN 37614, USA
| |
|
31
|
Lu JB, Ren PP, Li Q, He F, Xu ZT, Wang SN, Chen JP, Li JM, Zhang CX. The evolution and functional divergence of 10 Apolipoprotein D-like genes in Nilaparvata lugens. INSECT SCIENCE 2024; 31:91-105. [PMID: 37334667 DOI: 10.1111/1744-7917.13216] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2022] [Revised: 04/16/2023] [Accepted: 04/22/2023] [Indexed: 06/20/2023]
Abstract
Apolipoprotein D (ApoD), a member of the lipocalin superfamily of proteins, is involved in lipid transport and stress resistance. Whereas only a single copy of the ApoD gene is found in humans and some other vertebrates, there are typically several ApoD-like genes in insects. To date, there have been relatively few studies that have examined the evolution and functional differentiation of ApoD-like genes in insects, particularly hemi-metabolous insects. In this study, we identified 10 ApoD-like genes (NlApoD1-10) with distinct spatiotemporal expression patterns in Nilaparvata lugens (BPH), which is an important pest of rice. NlApoD1-10 were found to be distributed on 3 chromosomes in a tandem array of NlApoD1/2, NlApoD3-5, and NlApoD7/8, and show sequence and gene structural divergence in the coding regions, indicating that multiple gene duplication events occurred during evolution. Phylogenetic analysis revealed that NlApoD1-10 can be clustered into 5 clades, with NlApoD3-5 and NlApoD7/8 potentially evolving exclusively in the Delphacidae family. Functional screening using an RNA interference approach revealed that only NlApoD2 was essential for BPH development and survival, whereas NlApoD4/5 are highly expressed in testes, and might play roles in reproduction. Moreover, stress response analysis revealed that NlApoD3-5/9, NlApoD3-5, and NlApoD9 were up-regulated after treatment with lipopolysaccharide, H2O2, and ultraviolet-C, respectively, indicating their potential roles in stress resistance.
Affiliation(s)
- Jia-Bao Lu
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Key Laboratory of Biotechnology in Plant Protection of MARA and Zhejiang Province, Institute of Plant Virology, Ningbo University, Ningbo, Zhejiang Province, China
| | - Peng-Peng Ren
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Key Laboratory of Biotechnology in Plant Protection of MARA and Zhejiang Province, Institute of Plant Virology, Ningbo University, Ningbo, Zhejiang Province, China
| | - Qiao Li
- Technology Center of Wuhan Customs District, Hubei, China
- Institute of Insect Science, Zhejiang University, Hangzhou, China
| | - Fang He
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Key Laboratory of Biotechnology in Plant Protection of MARA and Zhejiang Province, Institute of Plant Virology, Ningbo University, Ningbo, Zhejiang Province, China
| | - Zhong-Tian Xu
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Key Laboratory of Biotechnology in Plant Protection of MARA and Zhejiang Province, Institute of Plant Virology, Ningbo University, Ningbo, Zhejiang Province, China
| | - Sai-Nan Wang
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Key Laboratory of Biotechnology in Plant Protection of MARA and Zhejiang Province, Institute of Plant Virology, Ningbo University, Ningbo, Zhejiang Province, China
| | - Jian-Ping Chen
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Key Laboratory of Biotechnology in Plant Protection of MARA and Zhejiang Province, Institute of Plant Virology, Ningbo University, Ningbo, Zhejiang Province, China
| | - Jun-Min Li
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Key Laboratory of Biotechnology in Plant Protection of MARA and Zhejiang Province, Institute of Plant Virology, Ningbo University, Ningbo, Zhejiang Province, China
| | - Chuan-Xi Zhang
- State Key Laboratory for Managing Biotic and Chemical Threats to the Quality and Safety of Agro-products, Key Laboratory of Biotechnology in Plant Protection of MARA and Zhejiang Province, Institute of Plant Virology, Ningbo University, Ningbo, Zhejiang Province, China
- Institute of Insect Science, Zhejiang University, Hangzhou, China
| |
|
32
|
Zhang J, Basu S, Kurgan L. HybridDBRpred: improved sequence-based prediction of DNA-binding amino acids using annotations from structured complexes and disordered proteins. Nucleic Acids Res 2024; 52:e10. [PMID: 38048333 PMCID: PMC10810184 DOI: 10.1093/nar/gkad1131] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Accepted: 11/10/2023] [Indexed: 12/06/2023] Open
Abstract
Current predictors of DNA-binding residues (DBRs) from protein sequences belong to two distinct groups: those trained on binding annotations extracted from structured protein-DNA complexes (structure-trained) and those trained on intrinsically disordered proteins (disorder-trained). We complete the first empirical analysis of predictive performance across the structure- and disorder-annotated proteins for a representative collection of ten predictors. The majority of the structure-trained tools perform well on the structure-annotated proteins while doing relatively poorly on the disorder-annotated proteins, and vice versa. Several methods make accurate predictions for the structure-annotated proteins or the disorder-annotated proteins, but none is highly accurate for both annotation types. Moreover, most predictors make excessive cross-predictions for the disorder-annotated proteins, where residues that interact with non-DNA ligand types are predicted as DBRs. Motivated by these results, we design, validate and deploy an innovative meta-model, hybridDBRpred, that uses a deep transformer network to combine predictions generated by the three best current predictors. HybridDBRpred provides accurate predictions and low levels of cross-predictions across the two annotation types, and is statistically more accurate than each of the ten tools and baseline meta-predictors that rely on averaging and logistic regression. We deploy hybridDBRpred as a convenient web server at http://biomine.cs.vcu.edu/servers/hybridDBRpred/ and provide the corresponding source code at https://github.com/jianzhang-xynu/hybridDBRpred.
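The abstract compares hybridDBRpred against baseline meta-predictors, one of which relies on averaging. The sketch below illustrates only that averaging baseline, not the transformer-based meta-model itself. It assumes that each input predictor emits one propensity score in [0, 1] per residue; the function name, the 0.5 threshold, and the example scores are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def average_meta_predictor(
    per_tool_scores: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Tuple[List[float], List[int]]:
    """Average per-residue scores across tools and binarize the consensus."""
    lengths = {len(scores) for scores in per_tool_scores}
    if len(lengths) != 1:
        raise ValueError("need at least one tool and equal-length score lists")
    # zip(*...) walks the sequence residue by residue across all tools.
    consensus = [fmean(column) for column in zip(*per_tool_scores)]
    labels = [int(score >= threshold) for score in consensus]
    return consensus, labels


if __name__ == "__main__":
    # Hypothetical scores from three predictors for a 3-residue protein.
    scores = [
        [0.9, 0.2, 0.4],
        [0.8, 0.1, 0.6],
        [0.7, 0.3, 0.8],
    ]
    consensus, labels = average_meta_predictor(scores)
    print(consensus, labels)
```

Plain averaging weights every tool equally at every residue. A learned meta-model, such as the logistic regression baseline or the transformer network named in the abstract, can instead weight tools differently depending on context.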
Affiliation(s)
- Jian Zhang
  - School of Computer and Information Technology, Xinyang Normal University, Xinyang 464000, PR China
- Sushmita Basu
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
- Lukasz Kurgan
  - Department of Computer Science, Virginia Commonwealth University, Richmond, VA 23284, USA
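The HybridDBRpred abstract compares its meta-model against baseline meta-predictors that rely on averaging. As a rough illustration of what such an averaging baseline could look like, the sketch below averages per-residue scores from three hypothetical predictors and thresholds the result. The function name, the example scores, and the 0.5 threshold are assumptions made for illustration; this is not the authors' implementation.

```python
from statistics import mean


def average_meta_predictor(score_lists, threshold=0.5):
    """Average per-residue scores from several predictors, then threshold.

    score_lists: one list of per-residue scores per predictor (same length).
    threshold: hypothetical cut-off for calling a residue DNA-binding.
    """
    if not score_lists:
        raise ValueError("need at least one predictor")
    length = len(score_lists[0])
    if any(len(scores) != length for scores in score_lists):
        raise ValueError("all predictors must score the same residues")
    # zip(*...) groups the scores that different predictors gave one residue.
    averaged = [mean(column) for column in zip(*score_lists)]
    labels = [int(score >= threshold) for score in averaged]
    return averaged, labels


# Made-up scores for a four-residue sequence from three predictors.
scores_a = [0.9, 0.2, 0.6, 0.1]
scores_b = [0.7, 0.4, 0.3, 0.1]
scores_c = [0.8, 0.3, 0.9, 0.4]
averaged, labels = average_meta_predictor([scores_a, scores_b, scores_c])
print(labels)
```

A learned meta-model replaces the fixed, equal-weight average with a trained combination of the same inputs, which is the kind of baseline the abstract reports outperforming.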
|
33
|
Zhang C, Zhang X, Freddolino P, Zhang Y. BioLiP2: an updated structure database for biologically relevant ligand-protein interactions. Nucleic Acids Res 2024; 52:D404-D412. [PMID: 37522378 PMCID: PMC10767969 DOI: 10.1093/nar/gkad630] [Citation(s) in RCA: 9] [Impact Index Per Article: 9.0] [Received: 06/01/2023] [Revised: 07/03/2023] [Accepted: 07/17/2023] [Indexed: 08/01/2023] Open
Abstract
With the progress of structural biology, the Protein Data Bank (PDB) has witnessed rapid accumulation of experimentally solved protein structures. Since many structures are determined with purification and crystallization additives that are unrelated to a protein's in vivo function, it is nontrivial to identify the subset of protein-ligand interactions that are biologically relevant. We developed the BioLiP2 database (https://zhanggroup.org/BioLiP) to extract biologically relevant protein-ligand interactions from the PDB database. BioLiP2 assesses the functional relevance of the ligands by geometric rules and experimental literature validations. The ligand binding information is further enriched with other function annotations, including Enzyme Commission numbers, Gene Ontology terms, catalytic sites, and binding affinities collected from other databases and a manual literature survey. Compared to its predecessor BioLiP, BioLiP2 offers significantly greater coverage of nucleic acid-protein interactions, and interactions involving large complexes that are unavailable in PDB format. BioLiP2 also integrates cutting-edge structural alignment algorithms with state-of-the-art structure prediction techniques, which for the first time enables composite protein structure and sequence-based searching and significantly enhances the usefulness of the database in structure-based function annotations. With these new developments, BioLiP2 will continue to be an important and comprehensive database for docking, virtual screening, and structure-based protein function analyses.
Affiliation(s)
- Chengxin Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Xi Zhang
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Peter L Freddolino
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, MI 48109, USA
  - Department of Computer Science, School of Computing, National University of Singapore, 117417, Singapore
  - Cancer Science Institute of Singapore, National University of Singapore, 117599, Singapore
  - Department of Biochemistry, Yong Loo Lin School of Medicine, National University of Singapore, 117596, Singapore
|
34
|
Roy BG, Choi J, Fuchs MF. Predictive Modeling of Proteins Encoded by a Plant Virus Sheds a New Light on Their Structure and Inherent Multifunctionality. Biomolecules 2024; 14:62. [PMID: 38254661 PMCID: PMC10813169 DOI: 10.3390/biom14010062] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/29/2023] [Revised: 12/29/2023] [Accepted: 12/30/2023] [Indexed: 01/24/2024] Open
Abstract
Plant virus genomes encode proteins that are involved in replication, encapsidation, cell-to-cell, and long-distance movement, avoidance of host detection, counter-defense, and transmission from host to host, among other functions. Even though the multifunctionality of plant viral proteins is well documented, contemporary functional repertoires of individual proteins are incomplete. However, these can be enhanced by modeling tools. Here, predictive modeling of proteins encoded by the two genomic RNAs, i.e., RNA1 and RNA2, of grapevine fanleaf virus (GFLV) and their satellite RNAs by a suite of protein prediction software confirmed not only previously validated functions (suppressor of RNA silencing [VSR], viral genome-linked protein [VPg], protease [Pro], symptom determinant [Sd], homing protein [HP], movement protein [MP], coat protein [CP], and transmission determinant [Td]) and previously identified putative functions (helicase [Hel] and RNA-dependent RNA polymerase [Pol]), but also predicted novel functions with varying levels of confidence. These include a T3/T7-like RNA polymerase domain for protein 1AVSR, a short-chain reductase for protein 1BHel/VSR, a parathyroid hormone family domain for protein 1EPol/Sd, overlapping domains of unknown function and an ABC transporter domain for protein 2BMP, and DNA topoisomerase domains, transcription factor FBXO25 domain, or DNA Pol subunit cdc27 domain for the satellite RNA protein. Structural predictions for proteins 2AHP/Sd, 2BMP, and 3A? had low confidence, while predictions for proteins 1AVSR, 1BHel*/VSR, 1CVPg, 1DPro, 1EPol*/Sd, and 2CCP/Td retained higher confidence in at least one prediction. This research provided new insights into the structure and functions of GFLV proteins and their satellite protein. Future work is needed to validate these findings.
Affiliation(s)
- Brandon G. Roy
  - Plant Pathology and Plant-Microbe Biology Section, School of Integrative Plant Science, Cornell University, 15 Castle Creek Drive, Geneva, NY 14456, USA; (J.C.); (M.F.F.)
|
35
|
Shenoy A, Kalakoti Y, Sundar D, Elofsson A. M-Ionic: prediction of metal-ion-binding sites from sequence using residue embeddings. Bioinformatics 2024; 40:btad782. [PMID: 38175787 PMCID: PMC10792727 DOI: 10.1093/bioinformatics/btad782] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2023] [Revised: 12/20/2023] [Indexed: 01/06/2024] Open
Abstract
MOTIVATION: Understanding metal-protein interaction can provide structural and functional insights into cellular processes. As the number of protein sequences increases, developing fast yet precise computational approaches to predict and annotate metal-binding sites becomes imperative. Quick and resource-efficient pre-trained protein language model (pLM) embeddings have successfully predicted binding sites from protein sequences despite not using structural or evolutionary features (multiple sequence alignments). Using residue-level embeddings from the pLMs, we have developed a sequence-based method (M-Ionic) to identify metal-binding proteins and predict residues involved in metal binding. RESULTS: On independent validation of recent proteins, M-Ionic reports an area under the receiver operating characteristic curve (AUROC) of 0.83 (recall = 84.6%) in distinguishing metal-binding from non-binding proteins, compared to an AUROC of 0.74 (recall = 61.8%) for the next best method. In addition to comparable performance to the state-of-the-art method for identifying metal-binding residues (Ca2+, Mg2+, Mn2+, Zn2+), M-Ionic provides binding probabilities for six additional ions (i.e. Cu2+, PO43-, SO42-, Fe2+, Fe3+, Co2+). We show that the pLM embedding of a single residue contains sufficient information about its neighbours to predict its binding properties. AVAILABILITY AND IMPLEMENTATION: M-Ionic can be used on your protein of interest using a Google Colab Notebook (https://bit.ly/40FrRbK). The GitHub repository (https://github.com/TeamSundar/m-ionic) contains all code and data.
Affiliation(s)
- Aditi Shenoy
  - Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, Solna 17121, Sweden
- Yogesh Kalakoti
  - Department of Biochemical Engineering & Biotechnology, Indian Institute of Technology (IIT) Delhi, New Delhi 110016, India
- Durai Sundar
  - Department of Biochemical Engineering & Biotechnology, Indian Institute of Technology (IIT) Delhi, New Delhi 110016, India
  - Yardi School of Artificial Intelligence, Indian Institute of Technology (IIT) Delhi, New Delhi 110016, India
- Arne Elofsson
  - Science for Life Laboratory and Department of Biochemistry and Biophysics, Stockholm University, Solna 17121, Sweden
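The M-Ionic abstract reports its protein-level performance as an AUROC. For readers unfamiliar with the metric, the sketch below computes AUROC through its rank interpretation: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The labels and scores are made-up values, not data from the paper.

```python
def auroc(labels, scores):
    """AUROC as the probability that a positive outranks a negative.

    labels: 1 for binding, 0 for non-binding.
    scores: predicted binding scores, one per label.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive correctly ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up example: three binding and three non-binding proteins.
example_labels = [1, 1, 0, 0, 1, 0]
example_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
example_auroc = auroc(example_labels, example_scores)
print(round(example_auroc, 3))
```

The pairwise loop is quadratic in the number of examples, which is fine for illustration; library implementations typically sort the scores once instead.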
|
36
|
Peng Y, Xiang M, Fan T, Zhong X, Dai A, Feng J, Guan P, Gong J, Li J, Wang Y. A Novel COCH p.D544Vfs*3 Variant Associated with DFNA9 Sensorineural Hearing Loss Causes Pathological Multimeric Cochlin Formation. Life (Basel) 2023; 14:33. [PMID: 38255649 PMCID: PMC10817332 DOI: 10.3390/life14010033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/08/2023] [Revised: 12/16/2023] [Accepted: 12/19/2023] [Indexed: 01/24/2024] Open
Abstract
COCH (coagulation factor C homology) is one of the most frequently mutated genes in autosomal dominant non-syndromic hearing loss. Variants in COCH can cause DFNA9, which is characterized by late-onset hearing loss with variable degrees of vestibular dysfunction. In this study, we report a Chinese family with a novel COCH variant (c.1687delA) causing p.D544Vfs*3 in cochlin. Comprehensive audiometric tests and vestibular function assessments were performed to acquire the phenotypic profile of the subjects. Next-generation sequencing was conducted, and segregation analysis was carried out using Sanger sequencing. The proband presented mild vestibular symptoms and normal functional assessment results in almost every test, while the variant co-segregated with hearing impairment in the pedigree. The variant was located beyond the vWFA2 domain and was predicted, via molecular modeling analysis, to affect the post-translational cleavage of cochlin. Notably, in an overexpression study using transiently transfected HEK 293T cells, we found that the p.D544Vfs*3 variant increased the formation of multimeric cochlin. Our results enrich the spectrum of DFNA9-linked pathological COCH variants and suggest that variants causing cochlin multimerization could be related to DFNA9 with sensorineural hearing loss rather than serious vestibular symptoms.
Affiliation(s)
- Yingqiu Peng
  - ENT Institute and Department of Otorhinolaryngology, EYE & ENT Hospital, Fudan University, Shanghai 200031, China
  - NHC Key Laboratory of Hearing Medicine, Fudan University, Shanghai 200031, China
- Mengya Xiang
  - ENT Institute and Department of Otorhinolaryngology, EYE & ENT Hospital, Fudan University, Shanghai 200031, China
  - NHC Key Laboratory of Hearing Medicine, Fudan University, Shanghai 200031, China
- Ting Fan
  - ENT Institute and Department of Otorhinolaryngology, EYE & ENT Hospital, Fudan University, Shanghai 200031, China
  - NHC Key Laboratory of Hearing Medicine, Fudan University, Shanghai 200031, China
- Xiaofang Zhong
  - Clinical Laboratory Center, Children’s Hospital of Fudan University, Shanghai 201102, China
- Aqiang Dai
  - ENT Institute and Department of Otorhinolaryngology, EYE & ENT Hospital, Fudan University, Shanghai 200031, China
- Jialing Feng
  - ENT Institute and Department of Otorhinolaryngology, EYE & ENT Hospital, Fudan University, Shanghai 200031, China
- Pengfei Guan
  - ENT Institute and Department of Otorhinolaryngology, EYE & ENT Hospital, Fudan University, Shanghai 200031, China
  - NHC Key Laboratory of Hearing Medicine, Fudan University, Shanghai 200031, China
- Jiamin Gong
  - ENT Institute and Department of Otorhinolaryngology, EYE & ENT Hospital, Fudan University, Shanghai 200031, China
- Jian Li
  - Clinical Laboratory Center, Children’s Hospital of Fudan University, Shanghai 201102, China
- Yunfeng Wang
  - ENT Institute and Department of Otorhinolaryngology, EYE & ENT Hospital, Fudan University, Shanghai 200031, China
  - NHC Key Laboratory of Hearing Medicine, Fudan University, Shanghai 200031, China
|
37
|
Guth FM, Lindner F, Rydzek S, Peil A, Friedrich S, Hauer B, Hahn F. Rieske Oxygenase-Catalyzed Oxidative Late-Stage Functionalization during Complex Antifungal Polyketide Biosynthesis. ACS Chem Biol 2023; 18:2450-2456. [PMID: 37948749 DOI: 10.1021/acschembio.3c00498] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2023]
Abstract
Rieske oxygenases (ROs) from natural product biosynthetic pathways are a poorly studied group of enzymes with significant potential as oxidative functionalization biocatalysts. A study on the ROs JerL, JerP, and AmbP from the biosynthetic pathways of jerangolid A and ambruticin VS-3 is described. Their activity was successfully reconstituted using whole-cell bioconversion systems coexpressing the ROs and their respective natural flavin-dependent reductase (FDR) partners. Feeding authentic biosynthetic intermediates and synthetic surrogates to these strains confirmed the involvement of the ROs in hydroxymethylpyrone and dihydropyran formation and revealed crucial information about the RO's substrate specificity. The pronounced dependence of JerL and JerP on the presence of a methylenolether allowed the precise temporal assignment of RO catalysis to the ultimate steps of jerangolid biosynthesis. JerP and AmbP stand out among the biosynthetic ROs studied so far for their ability to catalyze clean tetrahydropyran desaturation without further functionalizing the formed electron-rich double bonds. This work highlights the remarkable ability of ROs to highly selectively oxidize complex molecular scaffolds.
Affiliation(s)
- Florian M Guth
  - Professur für Organische Chemie (Lebensmittelchemie), Fakultät für Biologie, Chemie und Geowissenschaften, Department of Chemistry, Universität Bayreuth, 95447 Bayreuth, Germany
- Frederick Lindner
  - Professur für Organische Chemie (Lebensmittelchemie), Fakultät für Biologie, Chemie und Geowissenschaften, Department of Chemistry, Universität Bayreuth, 95447 Bayreuth, Germany
- Simon Rydzek
  - Professur für Organische Chemie (Lebensmittelchemie), Fakultät für Biologie, Chemie und Geowissenschaften, Department of Chemistry, Universität Bayreuth, 95447 Bayreuth, Germany
- Andreas Peil
  - Professur für Organische Chemie (Lebensmittelchemie), Fakultät für Biologie, Chemie und Geowissenschaften, Department of Chemistry, Universität Bayreuth, 95447 Bayreuth, Germany
- Steffen Friedrich
  - Professur für Organische Chemie (Lebensmittelchemie), Fakultät für Biologie, Chemie und Geowissenschaften, Department of Chemistry, Universität Bayreuth, 95447 Bayreuth, Germany
- Bernhard Hauer
  - Institute of Technical Biochemistry, Universität Stuttgart, Allmandring 31, 70569 Stuttgart, Germany
- Frank Hahn
  - Professur für Organische Chemie (Lebensmittelchemie), Fakultät für Biologie, Chemie und Geowissenschaften, Department of Chemistry, Universität Bayreuth, 95447 Bayreuth, Germany
|
38
|
Cong H, Liu H, Cao Y, Liang C, Chen Y. Protein-protein interaction site prediction by model ensembling with hybrid feature and self-attention. BMC Bioinformatics 2023; 24:456. [PMID: 38053020 DOI: 10.1186/s12859-023-05592-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2022] [Accepted: 11/30/2023] [Indexed: 12/07/2023] Open
Abstract
BACKGROUND: Protein-protein interactions (PPIs) are crucial in various biological functions and cellular processes. Thus, many computational approaches have been proposed to predict PPI sites. Although significant progress has been made, these methods still have limitations in encoding the characteristics of each amino acid in sequences. Many feature extraction methods rely on the sliding window technique, which simply merges all the features of residues into a vector. The importance of some key residues may be weakened in the feature vector, leading to poor performance. RESULTS: We propose a novel sequence-based method for PPI site prediction. The new network model, PPINet, contains multiple feature processing paths. For a residue, PPINet extracts the features of the targeted residue and its context separately. These two types of features are processed by two paths in the network and combined to form a protein representation, where the two types of features are of relatively equal importance. The model ensembling technique is applied to make use of more features. The base models are trained with different features and then ensembled via stacking. In addition, a data balancing strategy is presented, by which our model achieves a significant improvement on highly unbalanced data. CONCLUSION: The proposed method is evaluated on a fused dataset constructed from Dset_186, Dset_72, and PDBset_164, as well as the public Dset_448 dataset. It outperforms current state-of-the-art methods. In the most important metrics, AUPRC and recall, it surpasses the second-best method on the latter dataset by 6.9% and 4.7%, respectively. We also demonstrate that the improvement is essentially due to the ensemble model and, especially, the hybrid feature. We share our code for reproducibility and future research at https://github.com/CandiceCong/StackingPPINet.
Affiliation(s)
- Hanhan Cong
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Hong Liu
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
  - Shandong Provincial Key Laboratory for Novel Distributed Computer Software Technology, Jinan, China
- Yi Cao
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
- Cheng Liang
  - School of Information Science and Engineering, Shandong Normal University, Jinan, China
- Yuehui Chen
  - School of Information Science and Engineering, University of Jinan, Jinan, China
  - Shandong Provincial Key Laboratory of Network Based Intelligent Computing, Jinan, China
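The PPINet abstract contrasts the sliding window technique, which merges the features of a residue and its neighbours into one vector, with handling the target residue and its context separately. The sketch below illustrates that contrast on toy data. The function names, the zero padding at sequence ends, and the one-dimensional features are assumptions made for illustration; the code does not reproduce the PPINet architecture.

```python
def sliding_window_features(residue_features, half_width):
    """Merge each residue's features with its neighbours into one flat vector."""
    n = len(residue_features)
    pad = [0.0] * len(residue_features[0])  # zero padding past the ends (assumed)
    windows = []
    for i in range(n):
        merged = []
        for j in range(i - half_width, i + half_width + 1):
            merged.extend(residue_features[j] if 0 <= j < n else pad)
        windows.append(merged)
    return windows


def target_and_context(residue_features, half_width):
    """Return (target, context) pairs so each part can be processed separately."""
    n = len(residue_features)
    pad = [0.0] * len(residue_features[0])
    pairs = []
    for i in range(n):
        context = []
        for j in range(i - half_width, i + half_width + 1):
            if j == i:
                continue  # the target residue is kept out of its own context
            context.extend(residue_features[j] if 0 <= j < n else pad)
        pairs.append((list(residue_features[i]), context))
    return pairs


# Toy sequence of three residues with a single feature each.
features = [[1.0], [2.0], [3.0]]
windows = sliding_window_features(features, half_width=1)
pairs = target_and_context(features, half_width=1)
print(windows[1], pairs[1])
```

In the merged window the target residue occupies only one slot among many, whereas the split form lets a model give the target and its context comparable weight, which is the motivation the abstract describes.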
|
39
|
Kumar A, Hooda P, Puri A, Khatter R, Al-Dosari MS, Sinha N, Parvez MK, Sehgal D. Methotrexate, an anti-inflammatory drug, inhibits Hepatitis E viral replication. J Enzyme Inhib Med Chem 2023; 38:2280500. [PMID: 37975328 PMCID: PMC11003484 DOI: 10.1080/14756366.2023.2280500] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/01/2023] [Accepted: 10/30/2023] [Indexed: 11/19/2023] Open
Abstract
Hepatitis E Virus (HEV) is a positive-sense RNA virus with a 7.2 kb genome. The HEV genome consists of three open reading frames (ORF1-3). Of these, ORF1 codes for the enzymes methyltransferase (Mtase), papain-like cysteine protease (PCP), RNA helicase, and RNA-dependent RNA polymerase (RdRp). Given the unavailability of a vaccine or effective drug against HEV, and considering the side effects associated with the off-label use of ribavirin (RBV) and pegylated interferons, an alternative approach that modulates specific viral enzymes is required to prevent infection. HEV helicase is involved in unwinding double-stranded RNA, RNA processing, transcriptional regulation, and pre-mRNA processing. Therefore, we screened FDA-approved compounds from the ZINC15 database against the modelled 3D structure of HEV helicase and found that methotrexate and compound A (Pubchem ID BTB07890) inhibit the NTPase and dsRNA unwinding activity, leading to inhibition of HEV RNA replication. These findings may be further validated by in vivo studies.
Affiliation(s)
- Akash Kumar
  - Department of Life Sciences, Virology lab, Shiv Nadar Institution of Eminence, Greater Noida, India
- Preeti Hooda
  - Department of Life Sciences, Virology lab, Shiv Nadar Institution of Eminence, Greater Noida, India
- Anindita Puri
  - Department of Life Sciences, Virology lab, Shiv Nadar Institution of Eminence, Greater Noida, India
- Radhika Khatter
  - Department of Life Sciences, Virology lab, Shiv Nadar Institution of Eminence, Greater Noida, India
- Mohammed S. Al-Dosari
  - Department of Pharmacognosy, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Neha Sinha
  - Department of Infectious Diseases and Microbiology, School of Public Health, University of Pittsburgh, Pittsburgh, PA, USA
- Mohammad K. Parvez
  - Department of Pharmacognosy, College of Pharmacy, King Saud University, Riyadh, Saudi Arabia
- Deepak Sehgal
  - Department of Life Sciences, Virology lab, Shiv Nadar Institution of Eminence, Greater Noida, India
|
40
|
Fang Y, Jiang Y, Wei L, Ma Q, Ren Z, Yuan Q, Wei DQ. DeepProSite: structure-aware protein binding site prediction using ESMFold and pretrained language model. Bioinformatics 2023; 39:btad718. [PMID: 38015872 PMCID: PMC10723037 DOI: 10.1093/bioinformatics/btad718] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2023] [Revised: 11/04/2023] [Accepted: 11/27/2023] [Indexed: 11/30/2023] Open
Abstract
MOTIVATION: Identifying the functional sites of a protein, such as the binding sites of proteins, peptides, or other biological components, is crucial for understanding related biological processes and drug design. However, existing sequence-based methods have limited predictive accuracy, as they only consider sequence-adjacent contextual features and lack structural information. RESULTS: In this study, DeepProSite is presented as a new framework for identifying protein binding sites that utilizes protein structure and sequence information. DeepProSite first generates protein structures from ESMFold and sequence representations from pretrained language models. It then uses a Graph Transformer and formulates binding site prediction as graph node classification. In predicting protein-protein/peptide binding sites, DeepProSite outperforms state-of-the-art sequence- and structure-based methods on most metrics. Moreover, DeepProSite maintains its performance when predicting unbound structures, in contrast to competing structure-based prediction methods. DeepProSite is also extended to the prediction of binding sites for nucleic acids and other ligands, verifying its generalization capability. Finally, an online server for predicting multiple types of residues is established as the implementation of the proposed DeepProSite. AVAILABILITY AND IMPLEMENTATION: The datasets and source codes can be accessed at https://github.com/WeiLab-Biology/DeepProSite. The proposed DeepProSite can be accessed at https://inner.wei-group.net/DeepProSite/.
Affiliation(s)
- Yitian Fang
  - State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China
  - Peng Cheng Laboratory, Shenzhen 518055, China
- Yi Jiang
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Leyi Wei
  - School of Software, Shandong University, Jinan, Shandong 250100, China
- Qin Ma
  - Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA
- Qianmu Yuan
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, Shanghai-Islamabad-Belgrade Joint Innovation Center on Antibacterial Resistances, Joint International Research Laboratory of Metabolic & Developmental Sciences and School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200040, China
  - Peng Cheng Laboratory, Shenzhen 518055, China
|
41
|
Leemann M, Sagasta A, Eberhardt J, Schwede T, Robin X, Durairaj J. Automated benchmarking of combined protein structure and ligand conformation prediction. Proteins 2023; 91:1912-1924. [PMID: 37885318 DOI: 10.1002/prot.26605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/11/2023] [Revised: 09/15/2023] [Accepted: 09/21/2023] [Indexed: 10/28/2023]
Abstract
The prediction of protein-ligand complexes (PLC), using both experimental and predicted structures, is an active and important area of research, underscored by the inclusion of the Protein-Ligand Interaction category in the latest round of the Critical Assessment of Protein Structure Prediction experiment, CASP15. The prediction task in CASP15 consisted of predicting both the three-dimensional structure of the receptor protein and the position and conformation of the ligand. This paper addresses the challenges of, and proposes solutions for, devising automated benchmarking techniques for PLC prediction. The reliability of experimentally solved PLC as ground truth reference structures is assessed using various validation criteria. The similarity of a PLC to previously released complexes is employed to judge PLC diversity and the difficulty of a PLC as a prediction target. We show that the commonly used PDBBind time-split test set is inappropriate for comprehensive PLC evaluation, with state-of-the-art tools showing conflicting results on a more representative and high-quality dataset constructed for benchmarking purposes. We also show that redocking on crystal structures is a much simpler task than docking into predicted protein models, as demonstrated by the two PLC-prediction-specific scoring metrics created. Finally, we introduce a fully automated pipeline that predicts PLC and evaluates the accuracy of the protein structure, ligand pose, and protein-ligand interactions.
Affiliation(s)
- Michèle Leemann
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Ander Sagasta
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Jerome Eberhardt
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Torsten Schwede
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Xavier Robin
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
- Janani Durairaj
  - Biozentrum, University of Basel, Basel, Switzerland
  - SIB Swiss Institute of Bioinformatics, Basel, Switzerland
|
42
|
Ochoa R, Fox T. Assessing the fast prediction of peptide conformers and the impact of non-natural modifications. J Mol Graph Model 2023; 125:108608. [PMID: 37659134 DOI: 10.1016/j.jmgm.2023.108608] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2023] [Revised: 08/17/2023] [Accepted: 08/18/2023] [Indexed: 09/04/2023]
Abstract
We present an assessment of different approaches to predict peptide structures using modeling tools. Several small molecule, protein, and peptide-focused methodologies were used for the fast prediction of conformers for peptides shorter than 30 amino acids. We assessed the effect of including restraints based on annotated or predicted secondary structure motifs. A number of peptides in bound conformations and in solution were collected to compare the tools. In addition, we studied the impact of changing single amino acids to non-natural residues using molecular dynamics simulations. Deep learning methods such as AlphaFold2, or the combination of physics-based approaches with secondary structure information, produce the most accurate results for natural sequences. In the case of peptides with non-natural modifications, modeling the peptide containing natural amino acids first and then modifying and simulating the peptide using benchmarked force fields is a recommended pipeline. The results can guide the modeling of oligopeptides for drug discovery projects.
Affiliation(s)
- Rodrigo Ochoa
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
- Thomas Fox
  - Medicinal Chemistry, Boehringer Ingelheim Pharma GmbH & Co KG, 88397 Biberach/Riss, Germany
|
43
|
Ribeiro AJM, Riziotis IG, Borkakoti N, Thornton JM. Enzyme function and evolution through the lens of bioinformatics. Biochem J 2023; 480:1845-1863. [PMID: 37991346 PMCID: PMC10754289 DOI: 10.1042/bcj20220405] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/20/2023] [Revised: 11/09/2023] [Accepted: 11/14/2023] [Indexed: 11/23/2023]
Abstract
Enzymes have been shaped by evolution over billions of years to catalyse the chemical reactions that support life on earth. Dispersed in the literature, or organised in online databases, knowledge about enzymes can be structured in distinct dimensions, either related to their quality as biological macromolecules, such as their sequence and structure, or related to their chemical functions, such as the catalytic site, kinetics, mechanism, and overall reaction. The evolution of enzymes can only be understood when each of these dimensions is considered. In addition, many of the properties of enzymes only make sense in the light of evolution. We start this review by outlining the main paradigms of enzyme evolution, including gene duplication and divergence, convergent evolution, and evolution by recombination of domains. In the second part, we overview the current collective knowledge about enzymes, as organised by different types of data and collected in several databases. We also highlight some increasingly powerful computational tools that can be used to close gaps in understanding, in particular for types of data that require laborious experimental protocols. We believe that recent advances in protein structure prediction will be a powerful catalyst for the prediction of binding, mechanism, and ultimately, chemical reactions. A comprehensive mapping of enzyme function and evolution may be attainable in the near future.
Affiliation(s)
- Antonio J. M. Ribeiro
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Ioannis G. Riziotis
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Neera Borkakoti
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
- Janet M. Thornton
  - European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, U.K.
|
44
|
Wang Y, Xia Y, Yan J, Yuan Y, Shen HB, Pan X. ZeroBind: a protein-specific zero-shot predictor with subgraph matching for drug-target interactions. Nat Commun 2023; 14:7861. [PMID: 38030641 PMCID: PMC10687269 DOI: 10.1038/s41467-023-43597-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2023] [Accepted: 11/13/2023] [Indexed: 12/01/2023] Open
Abstract
Existing drug-target interaction (DTI) prediction methods generally fail to generalize well to novel (unseen) proteins and drugs. In this study, we propose ZeroBind, a protein-specific meta-learning framework with subgraph matching for predicting protein-drug interactions from their structures. During the meta-training process, ZeroBind formulates the training of each protein-specific model as a separate learning task, and each task uses graph neural networks (GNNs) to learn the protein graph embedding and the molecular graph embedding. Inspired by the fact that molecules bind to a binding pocket in a protein rather than to the whole protein, ZeroBind introduces a weakly supervised subgraph information bottleneck (SIB) module to recognize the maximally informative and compressive subgraphs in protein graphs as potential binding pockets. In addition, ZeroBind trains the models of individual proteins as multiple tasks, whose importance is automatically learned with a task-adaptive self-attention module to make final predictions. The results show that ZeroBind achieves superior performance on DTI prediction over existing methods, especially for unseen proteins and drugs, and performs well after fine-tuning for proteins or drugs with a few known binding partners.
Affiliation(s)
- Yuxuan Wang
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Ying Xia
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Junchi Yan
  - Department of Computer Science and Engineering, and MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, 200240, China
- Ye Yuan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Hong-Bin Shen
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
- Xiaoyong Pan
  - Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, 200240, China
|
45
|
Chandra A, Sharma A, Dehzangi I, Tsunoda T, Sattar A. PepCNN deep learning tool for predicting peptide binding residues in proteins using sequence, structural, and language model features. Sci Rep 2023; 13:20882. [PMID: 38016996 PMCID: PMC10684570 DOI: 10.1038/s41598-023-47624-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 09/07/2023] [Accepted: 11/16/2023] [Indexed: 11/30/2023] Open
Abstract
Protein-peptide interactions play a crucial role in various cellular processes and are implicated in abnormal cellular behaviors leading to diseases such as cancer. Therefore, understanding these interactions is vital for both functional genomics and drug discovery efforts. Despite a significant increase in the availability of protein-peptide complexes, experimental methods for studying these interactions remain laborious, time-consuming, and expensive. Computational methods offer a complementary approach but often fall short in terms of prediction accuracy. To address these challenges, we introduce PepCNN, a deep learning-based prediction model that incorporates structural and sequence-based information from primary protein sequences. By utilizing a combination of half-sphere exposure, position-specific scoring matrices from a multiple-sequence alignment tool, and embeddings from a pre-trained protein language model, PepCNN outperforms state-of-the-art methods in terms of specificity, precision, and AUC. The PepCNN software and datasets are publicly available at https://github.com/abelavit/PepCNN.git.
Affiliation(s)
- Abel Chandra
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
- Alok Sharma
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
  - Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
- Iman Dehzangi
  - Department of Computer Science, Rutgers University, Camden, NJ, USA
  - Center for Computational and Integrative Biology, Rutgers University, Camden, USA
- Tatsuhiko Tsunoda
  - Laboratory for Medical Science Mathematics, Department of Biological Sciences, School of Science, The University of Tokyo, Tokyo, Japan
  - Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan
  - Laboratory for Medical Science Mathematics, Department of Computational Biology and Medical Sciences, Graduate School of Frontier Sciences, The University of Tokyo, Tokyo, Japan
- Abdul Sattar
  - Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, Australia
|
46
|
Pang M, He W, Lu X, She Y, Xie L, Kong R, Chang S. CoDock-Ligand: combined template-based docking and CNN-based scoring in ligand binding prediction. BMC Bioinformatics 2023; 24:444. [PMID: 37996806 PMCID: PMC10668353 DOI: 10.1186/s12859-023-05571-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 09/21/2023] [Accepted: 11/16/2023] [Indexed: 11/25/2023] Open
Abstract
For ligand binding prediction, it is crucial for molecular docking programs to integrate template-based modeling with a precise scoring function. Here, we propose CoDock-Ligand, a docking method that combines template-based modeling with GNINA, a convolutional neural network-based scoring function, for ligand binding prediction in CASP15. Among the 21 targets, we obtained successful predictions in the top 5 submissions for 14 targets and partially successful predictions for 4 targets. In particular, for the most complicated target, H1114, which contains 56 metal cofactors and small molecules, our docking method successfully predicted the binding of most ligands. Analysis of the failed systems showed that the predicted receptor protein presented conformational changes in the backbone and side chains of the binding site residues, which may cause large structural deviations in the ligand binding prediction. In summary, our hybrid docking scheme was efficiently adapted to the ligand binding prediction challenges in CASP15.
Affiliation(s)
- Mingwei Pang
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, Jiangsu, China
- Wangqiu He
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, Jiangsu, China
- Xufeng Lu
  - Primary Biotechnology Inc., Changzhou, 213125, Jiangsu, China
- Yuting She
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, Jiangsu, China
- Liangxu Xie
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, Jiangsu, China
- Ren Kong
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, Jiangsu, China
- Shan Chang
  - Institute of Bioinformatics and Medical Engineering, School of Electrical and Information Engineering, Jiangsu University of Technology, Changzhou, 213001, Jiangsu, China
|
47
|
Liu Y, Tian B. Protein-DNA binding sites prediction based on pre-trained protein language model and contrastive learning. Brief Bioinform 2023; 25:bbad488. [PMID: 38171929 PMCID: PMC10782905 DOI: 10.1093/bib/bbad488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2023] [Revised: 09/28/2023] [Accepted: 11/30/2023] [Indexed: 01/05/2024] Open
Abstract
Protein-DNA interaction is critical for life activities such as replication, transcription and splicing. Identifying protein-DNA binding residues is essential for modeling their interaction and for downstream studies. However, developing accurate and efficient computational methods for this task remains challenging. Improvements in this area have the potential to drive novel applications in biotechnology and drug design. In this study, we propose a novel approach called Contrastive Learning And Pre-trained Encoder (CLAPE), which combines a pre-trained protein language model and the contrastive learning method to predict DNA binding residues. We trained the CLAPE-DB model on the protein-DNA binding sites dataset and evaluated the model's performance and generalization ability through various experiments. The results showed that the area under the ROC curve of the CLAPE-DB model on the two benchmark datasets reached 0.871 and 0.881, respectively, indicating superior performance compared to other existing models. CLAPE-DB showed better generalization ability and was specific to DNA-binding sites. In addition, we trained CLAPE on different protein-ligand binding sites datasets, demonstrating that CLAPE is a general framework for binding site prediction. To support the scientific community, the benchmark datasets and code are freely available at https://github.com/YAndrewL/clape.
Affiliation(s)
- Yufan Liu
  - MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China
- Boxue Tian
  - MOE Key Laboratory of Bioinformatics, State Key Laboratory of Molecular Oncology, School of Pharmaceutical Sciences, Tsinghua University, Beijing, 100084, China
|
48
|
Bag SS, Sinha S, Dutta S, Baishya HJ, Paul S. Targeting the SARS-CoV-2 RNA-dependent RNA polymerase (RdRp) with synthetic/designer unnatural nucleoside analogs: an in silico study. J Mol Model 2023; 29:366. [PMID: 37950101 DOI: 10.1007/s00894-023-05767-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/16/2023] [Accepted: 10/25/2023] [Indexed: 11/12/2023]
Abstract
CONTEXT Since the outbreak of COVID-19 in December 2019, it has developed into a pandemic affecting all countries and millions of people around the globe. Until now, there is no medicine available to contain the spread of the virus. As an aid to drug discovery, molecular docking and molecular dynamics tools have been applied extensively. In silico studies have made possible the rapid screening of potential molecules as possible inhibitors/drugs against the targeted proteins. As a continuation of our drug discovery research, we have carried out molecular docking studies of our 12 reported unnatural nucleosides and 14 designer Avigan analogs with SARS-CoV-2 RNA-dependent RNA polymerase (RdRp), which we report herein. The same calculation was also carried out on 11 known/under-trial/commercial nucleoside drug molecules for a comparison of the binding interactions in the catalytic site of RdRp. The docking results and binding efficiencies of our reported nucleosides and designer nucleosidic analogs were compared with the binding energies of commercially available drugs such as remdesivir and favipiravir. Furthermore, we evaluated the protein-drug binding efficiency and stability of the best-docked molecules by molecular dynamics (MD) studies. From our study, we have found that a few of our proposed drugs show promising binding efficiency at the catalytic pocket of SARS-CoV-2 RdRp and can be promising RdRp inhibitor drug candidates. Hence, this study will be of importance for making progress toward developing successful nucleoside-based drugs and for conducting antiviral tests in the wet lab to understand their efficacy against COVID-19.
METHOD All the docking studies were carried out with AutoDock 4.2, AutoDock Vina and Molegro Virtual Docker. Following the docking studies, the MD simulations were carried out following the standard protocol with GROMACS ver. 2019.6, applying the CHARMM36 all-atom biomolecular force field. The drug-protein interaction was studied using the Biovia Discovery Studio suite, Ligplot software, and Protein-Ligand Interaction Profiler (PLIP).
Affiliation(s)
- Subhendu Sekhar Bag
  - Chemical Biology/Genomics Laboratory, Department of Chemistry, Indian Institute of Technology Guwahati, Guwahati, Assam, India, 781039
  - Centre for the Environment, Indian Institute of Technology Guwahati, Guwahati, Assam, India, 781039
- Sayantan Sinha
  - Centre for the Environment, Indian Institute of Technology Guwahati, Guwahati, Assam, India, 781039
- Soumya Dutta
  - Chemical Biology/Genomics Laboratory, Department of Chemistry, Indian Institute of Technology Guwahati, Guwahati, Assam, India, 781039
- Hirak Jyoti Baishya
  - Chemical Biology/Genomics Laboratory, Department of Chemistry, Indian Institute of Technology Guwahati, Guwahati, Assam, India, 781039
- Suravi Paul
  - Chemical Biology/Genomics Laboratory, Department of Chemistry, Indian Institute of Technology Guwahati, Guwahati, Assam, India, 781039
|
49
|
Zhao Z, Bourne PE. How Ligands Interact with the Kinase Hinge. ACS Med Chem Lett 2023; 14:1503-1508. [PMID: 37974950 PMCID: PMC10641887 DOI: 10.1021/acsmedchemlett.3c00212] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/17/2023] [Accepted: 10/03/2023] [Indexed: 11/19/2023] Open
Abstract
ATP-competitive kinase inhibitors form hydrogen bond interactions with the kinase hinge region at the adenine binding site. Thus, it is crucial to explore hinge-ligand recognition as part of a rational drug design strategy. Here, harnessing known ligand-bound kinase structures and experimental assay resources, we first created a kinase structure-assay database (KSAD) containing 2705 nM ligand-bound kinase complexes. Then, using KSAD, we systematically investigate hinge-ligand binding patterns using interaction fingerprints, thereby delineating 15 different hydrogen-bond interaction modes. We believe these results will be valuable for de novo drug design and/or scaffold hopping of kinase-targeted drugs.
Affiliation(s)
- Zheng Zhao
  - School of Data Science and Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia 22904, United States
- Philip E. Bourne
  - School of Data Science and Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia 22904, United States
|
50
|
Rodrigues CHM, Ascher DB. CSM-Potential2: A comprehensive deep learning platform for the analysis of protein interacting interfaces. Proteins 2023. [PMID: 37870486 DOI: 10.1002/prot.26615] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 06/16/2023] [Revised: 10/04/2023] [Accepted: 10/05/2023] [Indexed: 10/24/2023]
Abstract
Proteins are molecular machines that participate in virtually all essential biological functions within the cell, which are tightly related to their 3D structure. The importance of understanding the protein structure-function relationship is highlighted by the exponential growth of experimental structures, which has been greatly expanded by recent breakthroughs in protein structure prediction, most notably RoseTTAFold and AlphaFold2. These advances have prompted the development of several computational approaches that leverage these data sources to explore potential biological interactions. However, most methods are limited to the analysis of single types of interactions, such as protein-protein or protein-ligand interactions, and their complexity limits their usability to expert users. Here we report CSM-Potential2, a deep learning platform for the analysis of binding interfaces on protein structures. In addition to prediction of protein-protein interaction binding sites and classification of biological ligands, our new platform incorporates prediction of interactions with nucleic acids at the residue level and allows for ligand transplantation based on sequence and structure similarity to experimentally determined structures. We anticipate our platform to be a valuable resource that provides easy access to a range of state-of-the-art methods to expert and non-expert users for the study of biological interactions. Our tool is freely available as an easy-to-use web server and API at https://biosig.lab.uq.edu.au/csm_potential.
Affiliation(s)
- Carlos H M Rodrigues
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
- David B Ascher
  - Computational Biology and Clinical Informatics, Baker Heart and Diabetes Institute, Melbourne, Victoria, Australia
  - School of Chemistry and Molecular Biosciences, University of Queensland, Brisbane, Queensland, Australia
|