1
Zhang L, Liu T. PDNAPred: Interpretable prediction of protein-DNA binding sites based on pre-trained protein language models. Int J Biol Macromol 2024:136147. [PMID: 39357703 DOI: 10.1016/j.ijbiomac.2024.136147] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/28/2024] [Revised: 09/11/2024] [Accepted: 09/27/2024] [Indexed: 10/04/2024]
Abstract
Protein-DNA interactions play critical roles in various biological processes and are essential for drug discovery. However, traditional experimental methods are labor-intensive and unable to keep pace with the increasing volume of protein sequences, leading to a substantial number of proteins lacking DNA-binding annotations. Therefore, developing an efficient computational method to identify protein-DNA binding sites is crucial. Unfortunately, most existing computational methods rely on manually selected features or protein structure information, making these methods inapplicable to large-scale prediction tasks. In this study, we introduced PDNAPred, a sequence-based method that combines two pre-trained protein language models with a designed CNN-GRU network to identify DNA-binding sites. Additionally, to tackle the issue of imbalanced dataset samples, we employed focal loss. Our comprehensive experiments demonstrated that PDNAPred significantly improved the accuracy of DNA-binding site prediction, outperforming existing state-of-the-art sequence-based methods. Remarkably, PDNAPred also achieved results comparable to advanced structure-based methods. The designed CNN-GRU network enhances its capability to detect DNA-binding sites accurately. Furthermore, we validated the versatility of PDNAPred by training it on RNA-binding site datasets, showing its potential as a general framework for amino acid binding site prediction. Finally, we conducted model interpretability analysis to elucidate the reasons behind PDNAPred's outstanding performance.
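The focal loss mentioned in this abstract can be sketched as follows. This is a minimal illustration of the general binary focal loss, not the PDNAPred implementation: the function names and the values `gamma=2.0` and `alpha=0.25` are assumptions chosen for the example, and the abstract does not state which settings the authors used.

```python
import math

def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-7):
    """Focal loss for one residue.

    p: predicted probability that the residue is a binding site.
    y: true label (1 = binding, 0 = non-binding).
    gamma, alpha: illustrative values, not taken from the paper.
    """
    p = min(max(p, eps), 1.0 - eps)          # avoid log(0)
    p_t = p if y == 1 else 1.0 - p           # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma shrinks the loss of well-classified residues
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over all residues of a sequence."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted cross-entropy. Larger `gamma` values reduce the contribution of the many easy non-binding residues, which is why the loss suits imbalanced data.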
Affiliation(s)
- Lingrong Zhang
  - College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
- Taigang Liu
  - College of Information Technology, Shanghai Ocean University, Shanghai 201306, China
2
Zheng M, Sun G, Li X, Fan Y. EGPDI: identifying protein-DNA binding sites based on multi-view graph embedding fusion. Brief Bioinform 2024; 25:bbae330. [PMID: 38975896 PMCID: PMC11229037 DOI: 10.1093/bib/bbae330] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/20/2024] [Revised: 06/08/2024] [Accepted: 06/26/2024] [Indexed: 07/09/2024]
Abstract
Mechanisms of protein-DNA interactions are involved in a wide range of biological activities and processes. Accurately identifying binding sites between proteins and DNA is crucial for analyzing genetic material, exploring protein functions, and designing novel drugs. In recent years, several computational methods have been proposed as alternatives to time-consuming and expensive traditional experiments. However, accurately predicting protein-DNA binding sites still remains a challenge. Existing computational methods often rely on handcrafted features and a single-model architecture, leaving room for improvement. We propose a novel computational method, called EGPDI, based on multi-view graph embedding fusion. This approach involves the integration of Equivariant Graph Neural Networks (EGNN) and Graph Convolutional Networks II (GCNII), independently configured to profoundly mine the global and local node embedding representations. An advanced gated multi-head attention mechanism is subsequently employed to capture the attention weights of the dual embedding representations, thereby facilitating the integration of node features. Besides, extra node features from protein language models are introduced to provide more structural information. To our knowledge, this is the first time that multi-view graph embedding fusion has been applied to the task of protein-DNA binding site prediction. The results of five-fold cross-validation and independent testing demonstrate that EGPDI outperforms state-of-the-art methods. Further comparative experiments and case studies also verify the superiority and generalization ability of EGPDI.
Affiliation(s)
- Mengxin Zheng
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Guicong Sun
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Xueping Li
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
- Yongxian Fan
  - School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin 541004, China
3
Roche R, Moussad B, Shuvo MH, Tarafder S, Bhattacharya D. EquiPNAS: improved protein-nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks. Nucleic Acids Res 2024; 52:e27. [PMID: 38281252 PMCID: PMC10954458 DOI: 10.1093/nar/gkae039] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/13/2023] [Revised: 12/22/2023] [Accepted: 01/11/2024] [Indexed: 01/30/2024]
Abstract
Protein language models (pLMs) trained on a large corpus of protein sequences have shown unprecedented scalability and broad generalizability in a wide range of predictive modeling tasks, but their power has not yet been harnessed for predicting protein-nucleic acid binding sites, critical for characterizing the interactions between proteins and nucleic acids. Here, we present EquiPNAS, a new pLM-informed E(3) equivariant deep graph neural network framework for improved protein-nucleic acid binding site prediction. By combining the strengths of pLM and symmetry-aware deep graph learning, EquiPNAS consistently outperforms the state-of-the-art methods for both protein-DNA and protein-RNA binding site prediction on multiple datasets across a diverse set of predictive modeling scenarios ranging from using experimental input to AlphaFold2 predictions. Our ablation study reveals that the pLM embeddings used in EquiPNAS are sufficiently powerful to dramatically reduce the dependence on the availability of evolutionary information without compromising on accuracy, and that the symmetry-aware nature of the E(3) equivariant graph-based neural architecture offers remarkable robustness and performance resilience. EquiPNAS is freely available at https://github.com/Bhattacharya-Lab/EquiPNAS.
Affiliation(s)
- Rahmatullah Roche
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Bernard Moussad
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Md Hossain Shuvo
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
- Sumit Tarafder
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, USA
4
Roche R, Moussad B, Shuvo MH, Tarafder S, Bhattacharya D. EquiPNAS: improved protein-nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks. bioRxiv 2023:2023.09.14.557719. [PMID: 37745556 PMCID: PMC10515942 DOI: 10.1101/2023.09.14.557719] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 09/26/2023]
Abstract
Protein language models (pLMs) trained on a large corpus of protein sequences have shown unprecedented scalability and broad generalizability in a wide range of predictive modeling tasks, but their power has not yet been harnessed for predicting protein-nucleic acid binding sites, critical for characterizing the interactions between proteins and nucleic acids. Here we present EquiPNAS, a new pLM-informed E(3) equivariant deep graph neural network framework for improved protein-nucleic acid binding site prediction. By combining the strengths of pLM and symmetry-aware deep graph learning, EquiPNAS consistently outperforms the state-of-the-art methods for both protein-DNA and protein-RNA binding site prediction on multiple datasets across a diverse set of predictive modeling scenarios ranging from using experimental input to AlphaFold2 predictions. Our ablation study reveals that the pLM embeddings used in EquiPNAS are sufficiently powerful to dramatically reduce the dependence on the availability of evolutionary information without compromising on accuracy, and that the symmetry-aware nature of the E(3) equivariant graph-based neural architecture offers remarkable robustness and performance resilience. EquiPNAS is freely available at https://github.com/Bhattacharya-Lab/EquiPNAS.
Affiliation(s)
- Rahmatullah Roche
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Bernard Moussad
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Md Hossain Shuvo
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Sumit Tarafder
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
- Debswapna Bhattacharya
  - Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States of America
5
Konc J, Janežič D. Protein binding sites for drug design. Biophys Rev 2022; 14:1413-1421. [PMID: 36532870 PMCID: PMC9734416 DOI: 10.1007/s12551-022-01028-3] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/12/2022] [Accepted: 12/01/2022] [Indexed: 12/13/2022]
Abstract
Drug development is a lengthy and challenging process that can be accelerated at early stages by new mathematical approaches and modern computers. To address this important issue, we are developing new mathematical solutions for the detection and characterization of protein binding sites that are important for new drug development. In this review, we present algorithms based on graph theory combined with molecular dynamics simulations that we have developed for studying biological target proteins to provide important data for optimizing the early stages of new drug development. A particular focus is the development of new protein binding site prediction algorithms (ProBiS) and new web tools for modeling pharmaceutically interesting molecules: the ProBiS Tools (algorithm, database, web server), which have evolved into a full-fledged graphical tool for studying proteins in the proteome. ProBiS differs from other structural algorithms in that it can align proteins with different folds without prior knowledge of the binding sites. It allows detection of similar binding sites and can predict molecular ligands of various types of pharmaceutical interest that could be advanced to drugs to treat a disease, based on the entire Protein Data Bank (PDB) and AlphaFold database, including proteins not yet in the PDB. All ProBiS Tools are freely available to the academic community at http://insilab.org and https://probis.nih.gov.
Affiliation(s)
- Janez Konc
  - Theory Department, National Institute of Chemistry, Hajdrihova 19, SI-1001 Ljubljana, Slovenia
- Dušanka Janežič
  - Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, SI-6000 Koper, Slovenia
6
Mansoor M, Nauman M, Rehman HU, Omar M. Gene Ontology Capsule GAN: an improved architecture for protein function prediction. PeerJ Comput Sci 2022; 8:e1014. [PMID: 36092003 PMCID: PMC9454774 DOI: 10.7717/peerj-cs.1014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/11/2022] [Accepted: 05/31/2022] [Indexed: 06/15/2023]
Abstract
Proteins are the core of all functions pertaining to living things. They consist of an extended amino acid chain folding into a three-dimensional shape that dictates their behavior. Currently, convolutional neural networks (CNNs) have been pivotal in predicting protein functions based on protein sequences. While it is a technology crucial to the niche, the computation cost and translational invariance associated with CNN make it impossible to detect spatial hierarchies between complex and simpler objects. Therefore, this research utilizes capsule networks to capture spatial information as opposed to CNNs. Since capsule networks focus on hierarchical links, they have a lot of potential for solving structural biology challenges. In comparison to the standard CNNs, our results exhibit an improvement in accuracy. Gene Ontology Capsule GAN (GOCAPGAN) achieved an F1 score of 82.6%, a precision score of 90.4% and recall score of 76.1%.
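The three scores reported in this abstract are related by the standard definition of F1 as the harmonic mean of precision and recall. The short check below is only an illustrative sketch of that formula applied to the reported figures; the function and variable names are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values as reported in the abstract above
reported_precision = 0.904
reported_recall = 0.761

f1 = f1_score(reported_precision, reported_recall)
print(f"{f1:.3f}")  # prints 0.826
```

The result agrees with the reported F1 score of 82.6%.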
Affiliation(s)
- Musadaq Mansoor
  - National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
- Mohammad Nauman
  - National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
- Hafeez Ur Rehman
  - National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
- Maryam Omar
  - National University of Computer and Emerging Sciences, Islamabad, Peshawar, KPK, Pakistan
7
Yuan Q, Chen S, Rao J, Zheng S, Zhao H, Yang Y. AlphaFold2-aware protein-DNA binding site prediction using graph transformer. Brief Bioinform 2022; 23:6509729. [PMID: 35039821 DOI: 10.1093/bib/bbab564] [Citation(s) in RCA: 34] [Impact Index Per Article: 17.0] [Received: 10/09/2021] [Revised: 11/24/2021] [Accepted: 12/09/2021] [Indexed: 12/13/2022]
Abstract
Protein-DNA interactions play crucial roles in biological systems, and identifying protein-DNA binding sites is the first step toward a mechanistic understanding of various biological activities (such as transcription and repair) and the design of novel drugs. Accurately identifying DNA-binding residues from protein sequence alone remains a challenging task. Currently, most existing sequence-based methods only consider contextual features of the sequential neighbors, which limits their ability to capture spatial information. Based on the recent breakthrough in protein structure prediction by AlphaFold2, we propose an accurate predictor, GraphSite, for identifying DNA-binding residues based on the structural models predicted by AlphaFold2. Here, we convert the binding site prediction problem into a graph node classification task and employ a transformer-based variant model to take the protein structural information into account. By leveraging predicted protein structures and graph transformer, GraphSite substantially improves over the latest sequence-based and structure-based methods. The algorithm is further confirmed on the independent test set of 181 proteins, where GraphSite surpasses the state-of-the-art structure-based method by 16.4% in area under the precision-recall curve and 11.2% in Matthews correlation coefficient. We provide the datasets, the predicted structures and the source codes along with the pre-trained models of GraphSite at https://github.com/biomed-AI/GraphSite. The GraphSite web server is freely available at https://biomed.nscc-gz.cn/apps/GraphSite.
Affiliation(s)
- Qianmu Yuan
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Sheng Chen
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Jiahua Rao
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Shuangjia Zheng
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Huiying Zhao
  - Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou 510000, China
- Yuedong Yang
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
  - Key Laboratory of Machine Intelligence and Advanced Computing of MOE, Sun Yat-sen University, Guangzhou 510000, China
8
Zhang Z, Zhao Y, Wang J, Guo M. DeepRCI: predicting ATP-binding proteins using the residue-residue contact information. IEEE J Biomed Health Inform 2021; 26:2822-2829. [PMID: 34941538 DOI: 10.1109/jbhi.2021.3137840] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
Adenosine-5'-triphosphate (ATP) is a direct energy source for various activities of tissues and cells in the body. The release of energy from ATP requires the assistance of ATP-binding proteins. Therefore, the identification of ATP-binding proteins is of great significance for research on organisms. Several methods for predicting ATP-binding proteins exist, but their accuracy is low, so their predictions are unreliable. Here, we designed a novel method called DeepRCI (based on Deep convolutional neural network and Residue-residue Contact Information) for predicting ATP-binding proteins. DeepRCI achieved an accuracy of 93.61% on the test set, which was a significant improvement over the state-of-the-art methods.
9
Li S, Cai C, Gong J, Liu X, Li H. A fast protein binding site comparison algorithm for proteome-wide protein function prediction and drug repurposing. Proteins 2021; 89:1541-1556. [PMID: 34245187 DOI: 10.1002/prot.26176] [Citation(s) in RCA: 7] [Impact Index Per Article: 2.3] [Received: 02/09/2021] [Revised: 06/26/2021] [Accepted: 06/30/2021] [Indexed: 01/18/2023]
Abstract
The expansion of three-dimensional protein structures and enhanced computing power have significantly facilitated our understanding of protein sequence/structure/function relationships. A challenge in structural genomics is to predict the function of uncharacterized proteins. Protein function deconvolution based on global sequence or structural homology is impracticable when a protein relates to no other proteins with known function, and in such cases, functional relationships can be established by detecting their local ligand binding site similarity. Here, we introduce a sequence order-independent comparison algorithm, PocketShape, for structural proteome-wide exploration of protein functional sites by fully considering the geometry of the backbones, orientation of the sidechains, and physicochemical properties of the pocket-lining residues. PocketShape is efficient in distinguishing similar from dissimilar ligand binding site pairs, retrieving 99.3% of the similar pairs while rejecting 100% of the dissimilar pairs on a dataset containing 1538 binding site pairs. This method successfully classifies 83 enzyme structures with diverse functions into 12 clusters, in close agreement with the Structural Classification of Proteins classification. PocketShape also achieves better performance than other methods in protein profiling based on experimental data. Potential new applications for representative SARS-CoV-2 drugs Remdesivir and 11a are predicted. The high accuracy and time-efficient characteristics of PocketShape make it a promising complementary tool for proteome-wide protein function inference and drug repurposing studies.
Affiliation(s)
- Shiliang Li
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Chaoqian Cai
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China
- Jiayu Gong
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China
- Xiaofeng Liu
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
- Honglin Li
  - State Key Laboratory of Bioreactor Engineering, Shanghai Key Laboratory of New Drug Design, School of Pharmacy, East China University of Science and Technology, Shanghai, China
  - School of Information Science and Engineering, East China University of Science and Technology, Shanghai, China
  - Research and Development Department, Jiangzhong Pharmaceutical Co., Ltd., Nanchang, China
10
Kralj S, Hodošček M, Podobnik B, Kunej T, Bren U, Janežič D, Konc J. Molecular Dynamics Simulations Reveal Interactions of an IgG1 Antibody With Selected Fc Receptors. Front Chem 2021; 9:705931. [PMID: 34277572 PMCID: PMC8283507 DOI: 10.3389/fchem.2021.705931] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/06/2021] [Accepted: 06/24/2021] [Indexed: 11/24/2022]
Abstract
In a survey of novel interactions between an IgG1 antibody and different Fcγ receptors (FcγR), molecular dynamics simulations were performed of interactions of monoclonal antibody involved complexes with FcγRs. Free energy simulations were also performed of isolated wild-type and substituted Fc regions bound to FcγRs with the aim of assessing their relative binding affinities. Two different free energy calculation methods, Molecular Mechanical/Generalized Born Molecular Volume (MM/GBMV) and Bennett Acceptance Ratio (BAR), were used to evaluate the known effector substitution G236A that is known to selectively increase antibody dependent cellular phagocytosis. The obtained results for the MM/GBMV binding affinity between different FcγRs are in good agreement with previous experiments, and those obtained using the BAR method for the complete antibody and the Fc-FcγR simulations show increased affinity across all FcγRs when binding to the substituted antibody. The FcγRIIa, a key determinant of antibody agonistic efficacy, shows a 10-fold increase in binding affinity, which is also consistent with the published experimental results. Novel interactions between the Fab region of the antibody and the FcγRs were discovered with this in silico approach, and provide insights into the antibody-FcγR binding mechanism and show promise for future improvements of therapeutic antibodies for preclinical studies of biological drugs.
Affiliation(s)
- Sebastjan Kralj
  - Theory Department, National Institute of Chemistry, Ljubljana, Slovenia
  - Laboratory of Physical Chemistry and Chemical Thermodynamics, Faculty of Chemistry and Chemical Engineering, University of Maribor, Maribor, Slovenia
- Milan Hodošček
  - Theory Department, National Institute of Chemistry, Ljubljana, Slovenia
- Barbara Podobnik
  - Biologics Technical Development Mengeš, Technical Research and Development Novartis, Lek Pharmaceuticals d.d., Mengeš, Slovenia
- Tanja Kunej
  - Department of Animal Science, Biotechnical Faculty, University of Ljubljana, Ljubljana, Slovenia
- Urban Bren
  - Laboratory of Physical Chemistry and Chemical Thermodynamics, Faculty of Chemistry and Chemical Engineering, University of Maribor, Maribor, Slovenia
- Dušanka Janežič
  - Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Koper, Slovenia
- Janez Konc
  - Theory Department, National Institute of Chemistry, Ljubljana, Slovenia
  - Laboratory of Physical Chemistry and Chemical Thermodynamics, Faculty of Chemistry and Chemical Engineering, University of Maribor, Maribor, Slovenia
11
CAVIAR: a method for automatic cavity detection, description and decomposition into subcavities. J Comput Aided Mol Des 2021; 35:737-750. [PMID: 34050420 DOI: 10.1007/s10822-021-00390-w] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 02/23/2021] [Accepted: 05/11/2021] [Indexed: 10/21/2022]
Abstract
The accurate description of protein binding sites is essential to the determination of similarity and the application of machine learning methods to relate the binding sites to observed functions. This work describes CAVIAR, a new open source tool for generating descriptors for binding sites, using protein structures in PDB and mmCIF format as well as trajectory frames from molecular dynamics simulations as input. The applicability of CAVIAR descriptors is showcased by computing machine learning predictions of binding site ligandability. The method can also automatically assign subcavities, even in the absence of a bound ligand. The defined subpockets mimic the empirical definitions used in medicinal chemistry projects. It is shown that the experimental binding affinity scales relatively well with the number of subcavities filled by the ligand, with compounds binding to more than three subcavities having nanomolar or better affinities to the target. The CAVIAR descriptors and methods can be used in any machine learning-based investigations of problems involving binding sites, from protein engineering to hit identification. The full software code is available on GitHub and a conda package is hosted on Anaconda cloud.
12
Zhang Z, Wang J, Liu J. DeepRTCP: Predicting ATP-Binding Cassette Transporters Based on 1-Dimensional Convolutional Network. Front Cell Dev Biol 2021; 8:614080. [PMID: 33598454 PMCID: PMC7882686 DOI: 10.3389/fcell.2020.614080] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/05/2020] [Accepted: 12/24/2020] [Indexed: 11/13/2022]
Abstract
ATP-binding cassette (ABC) transporters help cells absorb nutrients and excrete harmful substances. They play a vital role in the transmembrane transport of macromolecules. Therefore, the identification of ABC transporters is of great significance for biological research. This paper introduces a novel method called DeepRTCP. DeepRTCP uses a deep convolutional neural network and a feature that combines tripeptide composition based on a reduced amino acid alphabet with PSSM to recognize ABC transporters. We constructed a dataset named ABC_2020. It contains the latest ABC transporters downloaded from UniProt. We performed 10-fold cross-validation on DeepRTCP, and the average accuracy of DeepRTCP was 95.96%. Compared with the state-of-the-art method for predicting ABC transporters, DeepRTCP improved the accuracy by 9.29%. It is anticipated that DeepRTCP can be used as an effective ABC transporter classifier that provides reliable guidance for research on ABC transporters.
Affiliation(s)
- Zhaoxi Zhang
  - School of Computer Science, Inner Mongolia University, Hohhot, China
- Juan Wang
  - School of Computer Science, Inner Mongolia University, Hohhot, China
  - State Key Laboratories of Reproductive Regulation & Breeding of Grassland Livestock, Hohhot, China
- Jiameng Liu
  - School of Computer Science, Inner Mongolia University, Hohhot, China
13
Eguida M, Rognan D. A Computer Vision Approach to Align and Compare Protein Cavities: Application to Fragment-Based Drug Design. J Med Chem 2020; 63:7127-7142. [DOI: 10.1021/acs.jmedchem.0c00422] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 02/06/2023]
Affiliation(s)
- Merveille Eguida
  - UMR 7200 CNRS-Université de Strasbourg, Laboratoire d’Innovation Thérapeutique, 67400 Illkirch, France
- Didier Rognan
  - UMR 7200 CNRS-Université de Strasbourg, Laboratoire d’Innovation Thérapeutique, 67400 Illkirch, France
14
In Silico Laboratory: Tools for Similarity-Based Drug Discovery. Methods Mol Biol 2019. [PMID: 31773644 DOI: 10.1007/978-1-0716-0163-1_1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
Abstract
Computational methods that predict and evaluate binding of ligands to receptors implicated in different pathologies have become crucial in modern drug design and discovery. Here, we describe protocols for using the recently developed package of computational tools for similarity-based drug discovery. The ProBiS stand-alone program and web server allow superimposition of protein structures against large protein databases and predict ligands based on detected binding site similarities. GenProBiS allows mapping of human somatic missense mutations related to cancer and non-synonymous single nucleotide polymorphisms and subsequent visual exploration of specific interactions in connection to these mutations. We describe protocols for using LiSiCA, a fast ligand-based virtual screening software that enables easy screening of large databases containing billions of small molecules. Finally, we show the use of BoBER, a web interface that enables user-friendly access to a large database of bioisosteric and scaffold hopping replacements.
15
Zhang F, Song H, Zeng M, Li Y, Kurgan L, Li M. DeepFunc: A Deep Learning Framework for Accurate Prediction of Protein Functions from Protein Sequences and Interactions. Proteomics 2019; 19:e1900019. [PMID: 30941889 DOI: 10.1002/pmic.201900019] [Citation(s) in RCA: 52] [Impact Index Per Article: 10.4] [Received: 01/12/2019] [Revised: 03/18/2019] [Indexed: 01/06/2023]
Abstract
Annotation of protein functions plays an important role in understanding life at the molecular level. High-throughput sequencing produces massive numbers of raw protein sequences, and only about 1% of them have been manually annotated with functions. Experimental annotations of functions are expensive, time-consuming and do not keep up with the rapid growth of the sequence numbers. This motivates the development of computational approaches that predict protein functions. A novel deep learning framework, DeepFunc, is proposed which accurately predicts protein functions from protein sequence- and network-derived information. More precisely, DeepFunc uses a long and sparse binary vector to encode information concerning domains, families, and motifs collected from the InterPro tool that is associated with the input protein sequence. This vector is processed with two neural layers to obtain a low-dimensional vector which is combined with topological information extracted from protein-protein interactions (PPIs) and functional linkages. The combined information is processed by a deep neural network that predicts protein functions. DeepFunc is empirically and comparatively tested on a benchmark testing dataset and the Critical Assessment of protein Function Annotation algorithms (CAFA) 3 dataset. The experimental results demonstrate that DeepFunc outperforms current methods on the testing dataset and that it secures the highest Fmax = 0.54 and AUC = 0.94 on the CAFA3 dataset.
Affiliation(s)
- Fuhao Zhang
- School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China
- Hong Song
- School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China
- Min Zeng
- School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China
- Yaohang Li
- School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China; Department of Computer Science, Old Dominion University, Norfolk, VA, 23529, USA
- Lukasz Kurgan
- Department of Computer Science, Virginia Commonwealth University, Richmond, VA, 23284, USA
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha, 410083, P. R. China
16
Kulmanov M, Khan MA, Hoehndorf R, Wren J. DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier. Bioinformatics 2018; 34:660-668. [PMID: 29028931 PMCID: PMC5860606 DOI: 10.1093/bioinformatics/btx624] [Citation(s) in RCA: 212] [Impact Index Per Article: 35.3] [Received: 05/10/2017] [Accepted: 09/27/2017] [Indexed: 12/29/2022] Open
Abstract
Motivation: A large number of protein sequences are becoming available through the application of novel high-throughput sequencing technologies. Experimental functional characterization of these proteins is time-consuming and expensive, and is often only done rigorously for few selected model organisms. Computational function prediction approaches have been suggested to fill this gap. The functions of proteins are classified using the Gene Ontology (GO), which contains over 40 000 classes. Additionally, proteins have multiple functions, making function prediction a large-scale, multi-class, multi-label problem.
Results: We have developed a novel method to predict protein function from sequence. We use deep learning to learn features from protein sequences as well as a cross-species protein–protein interaction network. Our approach specifically outputs information in the structure of the GO and utilizes the dependencies between GO classes as background information to construct a deep learning model. We evaluate our method using the standards established by the Computational Assessment of Function Annotation (CAFA) and demonstrate a significant improvement over baseline methods such as BLAST, in particular for predicting cellular locations.
Availability and implementation: Web server: http://deepgo.bio2vec.net, Source code: https://github.com/bio-ontology-research-group/deepgo
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Maxat Kulmanov
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
- Mohammed Asif Khan
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences & Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal 23955-6900, Kingdom of Saudi Arabia
17
Ehrt C, Brinkjost T, Koch O. A benchmark driven guide to binding site comparison: An exhaustive evaluation using tailor-made data sets (ProSPECCTs). PLoS Comput Biol 2018; 14:e1006483. [PMID: 30408032 PMCID: PMC6224041 DOI: 10.1371/journal.pcbi.1006483] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 01/02/2018] [Accepted: 09/02/2018] [Indexed: 11/24/2022] Open
Abstract
The automated comparison of protein-ligand binding sites provides useful insights into yet unexplored site similarities. Various stages of computational and chemical biology research can benefit from this knowledge. The search for putative off-targets and the establishment of polypharmacological effects by comparing binding sites led to promising results for numerous projects. Although many cavity comparison methods are available, a comprehensive analysis to guide the choice of a tool for a specific application is wanting. Moreover, the broad variety of binding site modeling approaches, comparison algorithms, and scoring metrics impedes this choice. Herein, we aim to elucidate strengths and weaknesses of binding site comparison methodologies. A detailed benchmark study is the only possibility to rationalize the selection of appropriate tools for different scenarios. Specific evaluation data sets were developed to shed light on multiple aspects of binding site comparison. An assembly of all applied benchmark sets (ProSPECCTs–Protein Site Pairs for the Evaluation of Cavity Comparison Tools) is made available for the evaluation and optimization of further and still emerging methods. The results indicate the importance of such analyses to facilitate the choice of a methodology that complies with the requirements of a specific scientific challenge.
Binding site similarities are useful in the context of promiscuity prediction, drug repurposing, the analysis of protein-ligand and protein-protein complexes, function prediction, and further fields of general interest in chemical biology and biochemistry. Many years of research have led to the development of a multitude of methods for binding site analysis and comparison. On the one hand, their availability supports research. On the other hand, the huge number of methods hampers the efficient selection of a specific tool. Our research is dedicated to the analysis of different cavity comparison tools. We use several binding site data sets to establish guidelines which can be applied to ensure a successful application of comparison methods by circumventing potential pitfalls.
Affiliation(s)
- Christiane Ehrt
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Dortmund, Germany
- Tobias Brinkjost
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Dortmund, Germany
- Department of Computer Science, TU Dortmund University, Dortmund, Germany
- Oliver Koch
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Dortmund, Germany
18
Ahmed MS, Shahjaman M, Kabir E, Kamruzzaman M. Structure modeling to function prediction of Uncharacterized Human Protein C15orf41. Bioinformation 2018; 14:206-212. [PMID: 30108417 PMCID: PMC6077826 DOI: 10.6026/97320630014206] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/17/2018] [Revised: 04/29/2018] [Accepted: 04/29/2018] [Indexed: 01/18/2023] Open
Abstract
Dyserythropoietic anemia is a genetic disorder of erythropoiesis characterized by morphological abnormalities of erythroblasts. It is caused by mutation of the human gene C15orf41. The uncharacterized C15orf41 protein is involved in the formation of a functional complex structure. The protein is thermostable, unstable and acidic. It is associated with a TPD (Treponema pallidum) domain (residues 135 to 265) and three PTM sites: K50 (acetylation), T114 (phosphorylation) and K176 (ubiquitination). C15orf41 is paralogous to isoform-1 (gi|194018542|) and open reading frame isoform-CRA_c (gi|119612744|) of Homo sapiens located on chromosome 15. It interacts with the human ATP (adenosine triphosphate) binding domain 4 (ATPBD4), with a similarity score of 0.725 according to protein-protein interaction (PPI) network analysis. These data provide valuable insights towards the functional characterization of the human gene C15orf41.
Affiliation(s)
- Md. Shakil Ahmed
- Department of Statistics, University of Rajshahi, Rajshahi-6205, Bangladesh
- Md. Shahjaman
- Department of Statistics, Begum Rokeya University, Rangpur-5400, Bangladesh
- Enamul Kabir
- School of Agricultural, Computational and Environmental Sciences, University of Southern Queensland, Australia
- Md. Kamruzzaman
- Data Science for Knowledge Creation Research Center, Seoul National University, Korea
19
Flores-Bautista E, Cronick CL, Fersaca AR, Martinez-Nuñez MA, Perez-Rueda E. Functional Prediction of Hypothetical Transcription Factors of Escherichia coli K-12 Based on Expression Data. Comput Struct Biotechnol J 2018; 16:157-166. [PMID: 30050664 PMCID: PMC6055005 DOI: 10.1016/j.csbj.2018.03.003] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/10/2017] [Revised: 03/08/2018] [Accepted: 03/20/2018] [Indexed: 11/29/2022] Open
Abstract
The repertoire of 304 DNA-binding transcription factors (TFs) in Escherichia coli K-12 has been described recently, with 196 TFs experimentally characterized and 108 proteins predicted by sequence comparisons. Based on 303 expression profile patterns retrieved from the Colombos database, 12 clusters were identified, including hypothetical and experimentally characterized TFs, using a spectral clustering algorithm based on a 3NN graph built using 14 principal components that represent 65% of the variance of the expression data. In a subsequent step, clusters were characterized in terms of their associated overrepresented functions, based on KEGG, Supfam annotations and Pfam assignments among other functional categories using an enrichment test, reinforcing the notion that the identified clusters are functionally similar among them. Based on these data, we identified 12 clusters in which hypothetical and known TFs share similar regulatory and physiological functions, such as module associations of toxin-antitoxin (TA) systems with DNA repair mechanisms, amino acid biosynthesis, and carbon metabolism/transport, among others. This analysis has increased our knowledge about gene regulation in E. coli K-12 and can be further expanded to other organisms.
Affiliation(s)
- Emanuel Flores-Bautista
- Facultad de Ingenieria Química, Universidad Autónoma de Yucatán, Mexico; Laboratorio de Ecogenómica, Unidad Académica de Ciencias y Tecnología de Yucatán, Facultad de Ciencias, UNAM, Mérida, Yucatán, Mexico
- Mario Alberto Martinez-Nuñez
- Laboratorio de Ecogenómica, Unidad Académica de Ciencias y Tecnología de Yucatán, Facultad de Ciencias, UNAM, Mérida, Yucatán, Mexico
- Ernesto Perez-Rueda
- Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, Unidad Académica Yucatán, C.P. 97302 Mérida, Yucatán, Mexico; Departamento de Ingenieria Celular y Biocatálisis, Instituto de Biotecnología, UNAM, Cuernavaca C.P. 62210, Morelos, Mexico
20
Gladovic M, Spaninger E, Bren U. Nucleic Bases Alkylation with Acrylonitrile and Cyanoethylene Oxide: A Computational Study. Chem Res Toxicol 2018; 31:97-104. [DOI: 10.1021/acs.chemrestox.7b00268] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/20/2022]
Affiliation(s)
- Martin Gladovic
- Faculty of Chemistry and Chemical Technology, University of Maribor, Smetanova 17, SI-2000 Maribor, Slovenia
- Faculty of Chemistry and Chemical Technology, University of Ljubljana, Vecna pot 113, SI-1000 Ljubljana, Slovenia
- Eva Spaninger
- Faculty of Chemistry and Chemical Technology, University of Maribor, Smetanova 17, SI-2000 Maribor, Slovenia
- Urban Bren
- Faculty of Chemistry and Chemical Technology, University of Maribor, Smetanova 17, SI-2000 Maribor, Slovenia
- National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
21
ProBiS tools (algorithm, database, and web servers) for predicting and modeling of biologically interesting proteins. Progress in Biophysics and Molecular Biology 2017; 128:24-32. [PMID: 28212856 DOI: 10.1016/j.pbiomolbio.2017.02.005] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Received: 05/26/2016] [Revised: 12/14/2016] [Accepted: 02/07/2017] [Indexed: 01/30/2023]
Abstract
ProBiS (Protein Binding Sites) Tools consist of an algorithm, a database, and web servers for prediction of binding sites and protein ligands based on the detection of structurally similar binding sites in the Protein Data Bank. In this article, we review the operations that ProBiS Tools perform, provide comments on the evolution of the tools, and give some implementation details. We review some of their applications to biologically interesting proteins. ProBiS Tools are freely available at http://probis.cmm.ki.si and http://probis.nih.gov.
22
Štular T, Lešnik S, Rožman K, Schink J, Zdouc M, Ghysels A, Liu F, Aldrich CC, Haupt VJ, Salentin S, Daminelli S, Schroeder M, Langer T, Gobec S, Janežič D, Konc J. Discovery of Mycobacterium tuberculosis InhA Inhibitors by Binding Sites Comparison and Ligands Prediction. J Med Chem 2016; 59:11069-11078. [PMID: 27936766 DOI: 10.1021/acs.jmedchem.6b01277] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.8] [Indexed: 01/06/2023]
Abstract
Drug discovery is usually focused on a single protein target; in this process, existing compounds that bind to related proteins are often ignored. We describe the ProBiS plugin, an extension of our earlier ProBiS-ligands approach, which for a given protein structure allows prediction of its binding sites and, for each binding site, the ligands from similar binding sites in the Protein Data Bank. We developed a new database of precalculated binding site comparisons of about 290,000 proteins to allow fast prediction of binding sites in existing proteins. The plugin enables advanced viewing of predicted binding sites, ligands' poses, and their interactions in three-dimensional graphics. Using the InhA query protein, an enoyl reductase enzyme in the Mycobacterium tuberculosis fatty acid biosynthesis pathway, we predicted its possible ligands and assessed their inhibitory activity experimentally. This resulted in three previously unrecognized inhibitors with novel scaffolds, demonstrating the plugin's utility in the early drug discovery process.
Affiliation(s)
- Tanja Štular
- National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
- Samo Lešnik
- National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia
- Kaja Rožman
- Faculty of Pharmacy, University of Ljubljana, Aškerčeva cesta 7, SI-1000 Ljubljana, Slovenia
- Julia Schink
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, SI-6000 Koper, Slovenia
- Mitja Zdouc
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, SI-6000 Koper, Slovenia
- An Ghysels
- Center for Molecular Modeling, Ghent University, Technologiepark 903, 9052 Zwijnaarde, Belgium
- Feng Liu
- AAT Bioquest, Inc., 520 Mercury Drive, Sunnyvale, California 94085, United States
- Courtney C Aldrich
- Department of Medicinal Chemistry, University of Minnesota, 308 Harvard Street Southeast, Minneapolis, Minnesota 55455, United States
- V Joachim Haupt
- Biotechnology Center (BIOTEC), Technische Universität Dresden, 01307 Dresden, Germany
- Sebastian Salentin
- Biotechnology Center (BIOTEC), Technische Universität Dresden, 01307 Dresden, Germany
- Simone Daminelli
- Biotechnology Center (BIOTEC), Technische Universität Dresden, 01307 Dresden, Germany
- Michael Schroeder
- Biotechnology Center (BIOTEC), Technische Universität Dresden, 01307 Dresden, Germany
- Thierry Langer
- Department of Pharmaceutical Chemistry, Faculty of Life Sciences, University of Vienna, Althanstrasse 14, A-1090 Vienna, Austria
- Stanislav Gobec
- Faculty of Pharmacy, University of Ljubljana, Aškerčeva cesta 7, SI-1000 Ljubljana, Slovenia
- Dušanka Janežič
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, SI-6000 Koper, Slovenia
- Janez Konc
- National Institute of Chemistry, Hajdrihova 19, SI-1000 Ljubljana, Slovenia; Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, SI-6000 Koper, Slovenia
23
Structure to function of an α-glucan metabolic pathway that promotes Listeria monocytogenes pathogenesis. Nat Microbiol 2016; 2:16202. [PMID: 27819654 DOI: 10.1038/nmicrobiol.2016.202] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 03/03/2016] [Accepted: 09/12/2016] [Indexed: 11/08/2022]
Abstract
Here we employ a 'systems structural biology' approach to functionally characterize an unconventional α-glucan metabolic pathway from the food-borne pathogen Listeria monocytogenes (Lm). Crystal structure determination coupled with basic biochemical and biophysical assays allowed for the identification of anabolic, transport, catabolic and regulatory portions of the cycloalternan pathway. These findings provide numerous insights into cycloalternan pathway function and reveal the mechanism of repressor, open reading frame, kinase (ROK) transcription regulators. Moreover, by developing a structural overview we were able to anticipate the cycloalternan pathway's role in the metabolism of partially hydrolysed starch derivatives and demonstrate its involvement in Lm pathogenesis. These findings suggest that the cycloalternan pathway plays a role in interspecies resource competition (potentially within the host gastrointestinal tract) and establish the methodological framework for characterizing bacterial systems of unknown function.
24
Ogrizek M, Konc J, Bren U, Hodošček M, Janežič D. Role of magnesium ions in the reaction mechanism at the interface between Tm1631 protein and its DNA ligand. Chem Cent J 2016; 10:41. [PMID: 27398092 PMCID: PMC4939058 DOI: 10.1186/s13065-016-0188-6] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 03/06/2016] [Accepted: 06/27/2016] [Indexed: 12/24/2022] Open
Abstract
Tm1631, a protein from the hyperthermophilic organism Thermotoga maritima, belongs to a domain of unknown function protein family. It was predicted that Tm1631 binds DNA and that the Tm1631–DNA complex is an endonuclease repair system with a DNA repair function (Konc et al. PLoS Comput Biol 9(11): e1003341, 2013). We observed that the severely bent, strained DNA binds to the protein for the entire 90 ns of classical molecular dynamics (MD) performed; we could observe no significant changes in the most distorted region of the DNA, where the cleavage of the phosphodiester bond occurs. In this article, we modeled the reaction mechanism at the interface between Tm1631 and its proposed ligand, the DNA molecule, focusing on cleavage of the phosphodiester bond. After addition of two Mg2+ ions to the reaction center and extension of classical MD by 50 ns (totaling 140 ns), the DNA ligand stayed bolted to the protein. Results from density functional theory quantum mechanics/molecular mechanics (QM/MM) calculations suggest that the reaction is analogous to known endonuclease mechanisms: an enzyme reaction mechanism with two Mg2+ ions in the reaction center and a pentacovalent intermediate. The minimum energy pathway profile shows that the phosphodiester bond cleavage step of the reaction is kinetically controlled and not thermodynamically because of a lack of any energy barrier above the accuracy of the energy profile calculation. The role of ions is shown by comparing the results with the reaction mechanisms in the absence of the Mg2+ ions where there is a significantly higher reaction barrier than in the presence of the Mg2+ ions.
Affiliation(s)
- Mitja Ogrizek
- National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia
- Janez Konc
- National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia; Laboratory for Physical Chemistry and Thermodynamics, Faculty of Chemistry and Chemical Technology, University of Maribor, Smetanova ulica 17, 2000 Maribor, Slovenia; Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, 6000 Koper, Slovenia
- Urban Bren
- National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia; Laboratory for Physical Chemistry and Thermodynamics, Faculty of Chemistry and Chemical Technology, University of Maribor, Smetanova ulica 17, 2000 Maribor, Slovenia; Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, 6000 Koper, Slovenia
- Milan Hodošček
- National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia
- Dušanka Janežič
- Faculty of Mathematics, Natural Sciences and Information Technologies, University of Primorska, Glagoljaška 8, 6000 Koper, Slovenia
25
Lobb B, Doxey AC. Novel function discovery through sequence and structural data mining. Curr Opin Struct Biol 2016; 38:53-61. [DOI: 10.1016/j.sbi.2016.05.017] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 03/19/2016] [Revised: 05/17/2016] [Accepted: 05/24/2016] [Indexed: 01/30/2023]
26
Ehrt C, Brinkjost T, Koch O. Impact of Binding Site Comparisons on Medicinal Chemistry and Rational Molecular Design. J Med Chem 2016; 59:4121-51. [PMID: 27046190 DOI: 10.1021/acs.jmedchem.6b00078] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Indexed: 12/27/2022]
Abstract
Modern rational drug design not only deals with the search for ligands binding to interesting and promising validated targets but also aims to identify the function and ligands of yet uncharacterized proteins having impact on different diseases. Additionally, it contributes to the design of inhibitors with distinct selectivity patterns and the prediction of possible off-target effects. The identification of similarities between binding sites of various proteins is a useful approach to cope with those challenges. The main scope of this perspective is to describe applications of different protein binding site comparison approaches to outline their applicability and impact on molecular design. The article deals with various substantial application domains and provides some outstanding examples to show how various binding site comparison methods can be applied to promote in silico drug design workflows. In addition, we will also briefly introduce the fundamental principles of different protein binding site comparison methods.
Affiliation(s)
- Christiane Ehrt
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
- Tobias Brinkjost
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany; Department of Computer Science, TU Dortmund University, Otto-Hahn-Straße 14, 44224 Dortmund, Germany
- Oliver Koch
- Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
27
Wang J, Luttrell J, Zhang N, Khan S, Shi N, Wang MX, Kang JQ, Wang Z, Xu D. Exploring Human Diseases and Biological Mechanisms by Protein Structure Prediction and Modeling. Advances in Experimental Medicine and Biology 2016; 939:39-61. [PMID: 27807743 PMCID: PMC6829626 DOI: 10.1007/978-981-10-1503-8_3] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
Abstract
Protein structure prediction and modeling provide a tool for understanding protein functions by computationally constructing protein structures from amino acid sequences and analyzing them. With help from protein prediction tools and web servers, users can obtain the three-dimensional protein structure models and gain knowledge of functions from the proteins. In this chapter, we will provide several examples of such studies. As an example, structure modeling methods were used to investigate the relation between mutation-caused misfolding of protein and human diseases including epilepsy and leukemia. Protein structure prediction and modeling were also applied in nucleotide-gated channels and their interaction interfaces to investigate their roles in brain and heart cells. In molecular mechanism studies of plants, rice salinity tolerance mechanism was studied via structure modeling on crucial proteins identified by systems biology analysis; trait-associated protein-protein interactions were modeled, which sheds some light on the roles of mutations in soybean oil/protein content. In the age of precision medicine, we believe protein structure prediction and modeling will play more and more important roles in investigating biomedical mechanism of diseases and drug design.
Affiliation(s)
- Juexin Wang
- Department of Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
- Joseph Luttrell
- School of Computing, University of Southern Mississippi, 118 College Drive, Hattiesburg, MS, 39406, USA
- Ning Zhang
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
- Informatics Institute, University of Missouri, Columbia, MO, 65211, USA
- Saad Khan
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
- Informatics Institute, University of Missouri, Columbia, MO, 65211, USA
- NianQing Shi
- Department of Medicine, Division of Cardiovascular Medicine, University of Wisconsin, Room 8418, 1111 Highland Ave, Madison, WI, 53706, USA
- Michael X Wang
- Department of Pathology and Anatomical Sciences, University of Missouri, Columbia, MO, 65211, USA
- Jing-Qiong Kang
- Department of Neurology, Vanderbilt University Medical Center, Nashville, TN, 37232, USA
- Zheng Wang
- School of Computing, University of Southern Mississippi, 118 College Drive, Hattiesburg, MS, 39406, USA
- Dong Xu
- Department of Computer Science, University of Missouri, Columbia, MO, 65211, USA
- Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, 65211, USA
- Informatics Institute, University of Missouri, Columbia, MO, 65211, USA
28
Abstract
Enzymes are one of the most important groups of drug targets, and identifying possible ligand-enzyme interactions is of major importance in many drug discovery processes. Novel computational methods have been developed that can apply the information from the increasing number of resolved and available ligand-enzyme complexes to model new unknown interactions and therefore contribute to answer open questions in the field of drug discovery like the identification of unknown protein functions, off-target binding, ligand 3D homology modeling and induced-fit simulations.
29
Rout S, Warhurst DC, Suar M, Mahapatra RK. In silico comparative genomics analysis of Plasmodium falciparum for the identification of putative essential genes and therapeutic candidates. J Microbiol Methods 2015; 109:1-8. [DOI: 10.1016/j.mimet.2014.11.016] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 11/03/2014] [Revised: 11/27/2014] [Accepted: 11/27/2014] [Indexed: 01/17/2023]
30
Li HD, Menon R, Omenn GS, Guan Y. The emerging era of genomic data integration for analyzing splice isoform function. Trends Genet 2014; 30:340-7. [PMID: 24951248 DOI: 10.1016/j.tig.2014.05.005] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 03/18/2014] [Revised: 05/21/2014] [Accepted: 05/23/2014] [Indexed: 01/17/2023]
Abstract
The vast majority of multi-exon genes in humans undergo alternative splicing, which greatly increases the functional diversity of protein species. Predicting functions at the isoform level is essential to further our understanding of developmental abnormalities and cancers, which frequently exhibit aberrant splicing and dysregulation of isoform expression. However, determination of isoform function is very difficult, and efforts to predict isoform function have been limited in the functional genomics field. Deep sequencing of RNA now provides an unprecedented amount of expression data at the transcript level. We describe here emerging computational approaches that integrate such large-scale whole-transcriptome sequencing (RNA-seq) data for predicting the functions of alternatively spliced isoforms, and we discuss their applications in developmental and cancer biology. We outline future directions for isoform function prediction, emphasizing the need for heterogeneous genomic data integration and tissue-specific, dynamic isoform-level network modeling, which will allow the field to realize its full potential.
Affiliation(s)
- Hong-Dong Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Rajasree Menon
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Gilbert S Omenn
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, MI, USA
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA; Department of Internal Medicine, University of Michigan, Ann Arbor, Michigan, MI, USA; Department of Electrical Engineering and Computer Science, Ann Arbor, MI, USA
31
Konc J, Janežič D. ProBiS-ligands: a web server for prediction of ligands by examination of protein binding sites. Nucleic Acids Res 2014; 42:W215-20. [PMID: 24861616 PMCID: PMC4086080 DOI: 10.1093/nar/gku460] [Citation(s) in RCA: 55] [Impact Index Per Article: 5.5] [Indexed: 12/01/2022] Open
Abstract
The ProBiS-ligands web server predicts binding of ligands to a protein structure. Starting with a protein structure or binding site, ProBiS-ligands first identifies template proteins in the Protein Data Bank that share similar binding sites. Based on the superimpositions of the query protein and the similar binding sites found, the server then transposes the ligand structures from those sites to the query protein. Such ligand prediction supports many activities, e.g. drug repurposing. The ProBiS-ligands web server, an extension of the ProBiS web server, is open and free to all users at http://probis.cmm.ki.si/ligands.
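The transposition step described in this abstract amounts to applying the rigid-body transformation obtained from the binding-site superimposition to the template ligand's coordinates. A minimal sketch, assuming the superimposition is already available as a 3x3 rotation matrix and a translation vector; the function name and the toy values are hypothetical, and this is not the server's actual code:

```python
def transpose_ligand(ligand_coords, rotation, translation):
    """Map ligand atoms from the template frame into the query frame: x' = R x + t."""
    moved = []
    for x, y, z in ligand_coords:
        # Each output coordinate is one row of R dotted with (x, y, z), plus the shift.
        moved.append(tuple(
            row[0] * x + row[1] * y + row[2] * z + shift
            for row, shift in zip(rotation, translation)
        ))
    return moved


# Toy example: rotate 90 degrees about the z axis, then shift by (1, 0, 0).
rotation = [
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
translation = [1.0, 0.0, 0.0]
ligand = [(1.0, 0.0, 0.0), (0.0, 2.0, 3.0)]  # made-up atom positions
placed = transpose_ligand(ligand, rotation, translation)
```

In practice the rotation and translation would come from a structural alignment of the matched binding-site residues; here they are supplied by hand to keep the example self-contained.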
Affiliation(s)
- Janez Konc
- National Institute of Chemistry, Hajdrihova 19, 1000 Ljubljana, Slovenia
- Dušanka Janežič
- University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Glagoljaška 8, 6000 Koper, Slovenia
32
Konc J, Janežič D. Binding site comparison for function prediction and pharmaceutical discovery. Curr Opin Struct Biol 2013; 25:34-9. [PMID: 24878342 DOI: 10.1016/j.sbi.2013.11.012] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 10/22/2013] [Revised: 11/26/2013] [Accepted: 11/27/2013] [Indexed: 11/30/2022]
Abstract
While structural genomics resulted in thousands of new protein crystal structures, we still do not know the functions of most of these proteins. One reason for this shortcoming is their unique sequences or folds, which leaves them assigned as proteins of 'unknown function'. Recent advances in and applications of cutting edge binding site comparison algorithms for binding site detection and function prediction have begun to shed light on this problem. Here, we review these algorithms and their use in function prediction and pharmaceutical discovery. Finding common binding sites in weakly related proteins may lead to the discovery of new protein functions and to novel ways of drug discovery.
Affiliation(s)
- Janez Konc
- National Institute of Chemistry, Ljubljana, Slovenia
- Dušanka Janežič
- University of Primorska, Faculty of Mathematics, Natural Sciences and Information Technologies, Koper, Slovenia