1
Basit A, Choudhury D, Bandyopadhyay P. Prediction of Ca2+ Binding Site in Proteins With a Fast and Accurate Method Based on Statistical Mechanics and Analysis of Crystal Structures. Proteins 2024. [PMID: 39258438 DOI: 10.1002/prot.26743] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/21/2024] [Revised: 08/20/2024] [Accepted: 08/26/2024] [Indexed: 09/12/2024]
Abstract
Predicting the precise locations of metal binding sites within metalloproteins is a crucial challenge in biophysics. A fast, accurate, and interpretable computational prediction method can complement the experimental studies. In the current work, we have developed a method to predict the location of Ca2+ ions in calcium-binding proteins using a physics-based method with an all-atom description of the proteins, which is substantially faster than the molecular dynamics simulation-based methods with accuracy as good as data-driven approaches. Our methodology uses the three-dimensional reference interaction site model (3D-RISM), a statistical mechanical theory, to calculate Ca2+ ion density around protein structures, and the locations of the Ca2+ ions are obtained from the density. We have taken previously used datasets to assess the efficacy of our method as compared to previous works. Our accuracy is 88%, comparable with the FEATURE program, one of the well-known data-driven methods. Moreover, our method is physical, and the reasons for failures can be ascertained in most cases. We have thoroughly examined the failed cases using different structural and crystallographic measures, such as B-factor, R-factor, electron density map, and geometry at the binding site. It has been found that x-ray structures have issues in many of the failed cases, such as geometric irregularities and dubious assignment of ion positions. Our algorithm, along with the checks for structural accuracy, is a major step in predicting calcium ion positions in metalloproteins.
Affiliation(s)
- Abdul Basit
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
- Pradipta Bandyopadhyay
  - School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi, India
2
Derry A, Altman RB. Explainable protein function annotation using local structure embeddings. bioRxiv 2023:2023.10.13.562298. [PMID: 37905033 PMCID: PMC10614799 DOI: 10.1101/2023.10.13.562298] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Indexed: 11/02/2023]
Abstract
The rapid expansion of protein sequence and structure databases has resulted in a significant number of proteins with ambiguous or unknown function. While advances in machine learning techniques hold great potential to fill this annotation gap, current methods for function prediction are unable to associate global function reliably to the specific residues responsible for that function. We address this issue by introducing PARSE (Protein Annotation by Residue-Specific Enrichment), a knowledge-based method which combines pre-trained embeddings of local structural environments with traditional statistical techniques to identify enriched functions with residue-level explainability. For the task of predicting the catalytic function of enzymes, PARSE achieves comparable or superior global performance to state-of-the-art machine learning methods (F1 score > 85%) while simultaneously annotating the specific residues involved in each function with much greater precision. Since it does not require supervised training, our method can make one-shot predictions for very rare functions and is not limited to a particular type of functional label (e.g. Enzyme Commission numbers or Gene Ontology codes). Finally, we leverage the AlphaFold Structure Database to perform functional annotation at a proteome scale. By applying PARSE to the dark proteome (predicted structures which cannot be classified into known structural families), we predict several novel bacterial metalloproteases. Each of these proteins shares a strongly conserved catalytic site despite highly divergent sequences and global folds, illustrating the value of local structure representations for new function discovery.
Affiliation(s)
- Alexander Derry
  - Department of Biomedical Data Science, Stanford University, Stanford, CA
- Russ B Altman
  - Department of Biomedical Data Science, Stanford University, Stanford, CA
  - Departments of Bioengineering, Genetics, and Medicine, Stanford University, Stanford, CA
3
Zhang J, Zhou F, Liang X, Yang G. SCAMPER: Accurate Type-Specific Prediction of Calcium-Binding Residues Using Sequence-Derived Features. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2023; 20:1406-1416. [PMID: 35536812 DOI: 10.1109/tcbb.2022.3173437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Understanding the molecular mechanisms involved in calcium-protein interactions and modeling the corresponding docking rely on the accurate identification of calcium-binding residues (CaBRs). The limitations of experimental annotation of protein function motivate the development of computational approaches that correctly identify calcium-binding interactions. Studies have reported that current methods severely cross-predict residues that interact with other types of molecules (e.g., nucleic acids, proteins, and small ligands) as CaBRs. In this study, a novel predictor named SCAMPER (Selective CAlciuM-binding PrEdictoR) is proposed for the accurate and specific prediction of CaBRs. SCAMPER is designed using a newly compiled dataset with complete UniProt sequences and annotations, which include calcium-binding, nucleic acid-binding, protein-binding, and small ligand-binding residues. We use a newly designed two-layer scheme to perform predictions as well as penalize cross-predictions. Empirical tests on an independent test dataset reveal that the proposed method significantly outperforms state-of-the-art predictors. SCAMPER is shown to be capable of distinguishing CaBRs from other types of metal-ion binding residues. We further perform CaBR predictions on the whole human proteome and use the results to hypothesize calcium-binding proteins (CaBPs). The latest experimentally verified CaBPs and GO analysis confirm the accuracy of our predictions. We implement the proposed method and share the data at http://www.inforstation.com/webservers/SCAMPER/.
4
Esperante S, Alvarez-Paggi D, Salgueiro M, Desimone M, de Oliveira G, Arán M, García-Pardo J, Aptekmann A, Ventura S, Alonso L, de Prat-Gay G. A finely tuned interplay between calcium binding, ionic strength and pH modulates conformational and oligomerization equilibria in the Respiratory Syncytial Virus Matrix (M) protein. Arch Biochem Biophys 2022; 731:109424. [DOI: 10.1016/j.abb.2022.109424] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2022] [Revised: 09/14/2022] [Accepted: 09/29/2022] [Indexed: 11/30/2022]
5
A Comprehensive Review of Computation-Based Metal-Binding Prediction Approaches at the Residue Level. BioMed Research International 2022; 2022:8965712. [PMID: 35402609 PMCID: PMC8989566 DOI: 10.1155/2022/8965712] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 01/02/2022] [Accepted: 03/04/2022] [Indexed: 12/29/2022]
Abstract
Clear evidence has shown that metal ions strongly connect and delicately tune the dynamic homeostasis in living bodies. They have been shown to be associated with protein structure, stability, regulation, and function. Even small changes in the concentration of metal ions can shift their effects from naturally beneficial to harmful, leading to degenerative diseases, malignant tumors, and cancers. Accurate characterizations and predictions of metalloproteins at the residue level promise informative clues to the investigation of intrinsic mechanisms of protein-metal ion interactions. Compared to biophysical or biochemical wet-lab technologies, computational methods provide open web interfaces of high-resolution databases and high-throughput predictors for efficient investigation of metal-binding residues. This review surveys and details 18 public databases of metal-protein binding. We collect a comprehensive set of 44 computation-based methods and classify them into four categories, namely, learning-, docking-, template-, and meta-based methods. We analyze the benchmark datasets, assessment criteria, feature construction, and algorithms. We also compare several methods on two benchmark testing datasets and include a discussion about currently publicly available predictive tools. Finally, we summarize the challenges and underlying limitations of the current studies and propose several prospective directions concerning the future development of the related databases and methods.
6
A deep learning framework to predict binding preference of RNA constituents on protein surface. Nat Commun 2019; 10:4941. [PMID: 31666519 PMCID: PMC6821705 DOI: 10.1038/s41467-019-12920-0] [Citation(s) in RCA: 58] [Impact Index Per Article: 11.6] [Received: 05/10/2019] [Accepted: 10/08/2019] [Indexed: 12/21/2022]
Abstract
Protein-RNA interaction plays important roles in post-transcriptional regulation. However, the task of predicting these interactions given a protein structure is difficult. Here we show that, by leveraging a deep learning model NucleicNet, attributes such as binding preference of RNA backbone constituents and different bases can be predicted from local physicochemical characteristics of protein structure surface. On a diverse set of challenging RNA-binding proteins, including Fem-3-binding-factor 2, Argonaute 2 and Ribonuclease III, NucleicNet can accurately recover interaction modes discovered by structural biology experiments. Furthermore, we show that, without seeing any in vitro or in vivo assay data, NucleicNet can still achieve consistency with experiments, including RNAcompete, Immunoprecipitation Assay, and siRNA Knockdown Benchmark. NucleicNet can thus serve to provide quantitative fitness of RNA sequences for given binding pockets or to predict potential binding pockets and binding RNAs for previously unknown RNA binding proteins.
7
Lo YC, Liu T, Morrissey KM, Kakiuchi-Kiyota S, Johnson AR, Broccatelli F, Zhong Y, Joshi A, Altman RB. Computational analysis of kinase inhibitor selectivity using structural knowledge. Bioinformatics 2019; 35:235-242. [PMID: 29985971 DOI: 10.1093/bioinformatics/bty582] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 02/01/2018] [Accepted: 07/05/2018] [Indexed: 12/11/2022]
Abstract
Motivation: Kinases play a significant role in diverse disease signaling pathways, and understanding kinase inhibitor selectivity, the tendency of drugs to bind to off-targets, remains a top priority for kinase inhibitor design and clinical safety assessment. Traditional approaches for kinase selectivity analysis using biochemical activity and binding assays are useful but can be costly and are often limited by the kinases that are available. On the other hand, current computational kinase selectivity prediction methods are computationally intensive and can rarely achieve sufficient accuracy for large-scale kinome-wide inhibitor selectivity profiling.
Results: Here, we present a KinomeFEATURE database for kinase binding site similarity search by comparing protein microenvironments characterized using diverse physicochemical descriptors. Initial selectivity prediction of 15 known kinase inhibitors achieved >90% accuracy and demonstrated improved performance in comparison to commonly used kinase inhibitor selectivity prediction methods. Additional kinase ATP binding site similarity assessment (120 binding sites) identified 55 kinases with significant promiscuity and revealed unexpected inhibitor cross-activities between PKR and FGFR2 kinases. Kinome-wide selectivity profiling of 11 kinase drug candidates predicted novel as well as experimentally validated off-targets and suggested structural mechanisms of kinase cross-activities. Our study demonstrated potential utilities of our approach for large-scale kinase inhibitor selectivity profiling that could contribute to kinase drug development and safety assessment.
Availability and implementation: The KinomeFEATURE database and the associated scripts for performing kinase pocket similarity search can be downloaded from the Stanford SimTK website (https://simtk.org/projects/kdb).
Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yu-Chen Lo
  - Department of Bioengineering, Stanford, CA, USA
- Tianyun Liu
  - Department of Bioengineering, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
- Kari M Morrissey
  - Department of Clinical Pharmacology, South San Francisco, CA, USA
- Adam R Johnson
  - Biochemical and Cellular Pharmacology, South San Francisco, CA, USA
- Fabio Broccatelli
  - Department of Drug Metabolism and Pharmacokinetic, Genentech Inc., South San Francisco, CA, USA
- Yu Zhong
  - Department of Safety Assessment, South San Francisco, CA, USA
- Amita Joshi
  - Department of Clinical Pharmacology, South San Francisco, CA, USA
- Russ B Altman
  - Department of Bioengineering, Stanford, CA, USA
  - Department of Genetics, Stanford University, Stanford, CA, USA
8
Dyrka W, Pyzik M, Coste F, Talibart H. Estimating probabilistic context-free grammars for proteins using contact map constraints. PeerJ 2019; 7:e6559. [PMID: 30918754 PMCID: PMC6428041 DOI: 10.7717/peerj.6559] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/26/2018] [Accepted: 02/03/2019] [Indexed: 02/04/2023]
Abstract
Interactions between amino acids that are close in the spatial structure, but not necessarily in the sequence, play important structural and functional roles in proteins. These non-local interactions ought to be taken into account when modeling collections of proteins. Yet the most popular representations of sets of related protein sequences remain the profile Hidden Markov Models. By modeling independently the distributions of the conserved columns from an underlying multiple sequence alignment of the proteins, these models are unable to capture dependencies between the protein residues. Non-local interactions can be represented by using more expressive grammatical models. However, learning such grammars is difficult. In this work, we propose to use information on protein contacts to facilitate the training of probabilistic context-free grammars representing families of protein sequences. We develop the theory behind the introduction of contact constraints in maximum-likelihood and contrastive estimation schemes and implement it in a machine learning framework for protein grammars. The proposed framework is tested on samples of protein motifs in comparison with learning without contact constraints. The evaluation shows high fidelity of grammatical descriptors to protein structures and improved precision in recognizing sequences. Finally, we present an example of using our method in a practical setting and demonstrate its potential beyond the current state of the art by creating a grammatical model of a meta-family of protein motifs. We conclude that the current piece of research is a significant step towards more flexible and accurate modeling of collections of protein sequences. The software package is made available to the community.
Affiliation(s)
- Witold Dyrka
  - Wydział Podstawowych Problemów Techniki, Katedra Inżynierii Biomedycznej, Politechnika Wrocławska, Wrocław, Poland
- Mateusz Pyzik
  - Wydział Podstawowych Problemów Techniki, Katedra Inżynierii Biomedycznej, Politechnika Wrocławska, Wrocław, Poland
9
Pocket similarity identifies selective estrogen receptor modulators as microtubule modulators at the taxane site. Nat Commun 2019; 10:1033. [PMID: 30833575 PMCID: PMC6399299 DOI: 10.1038/s41467-019-08965-w] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.8] [Received: 04/27/2018] [Accepted: 01/19/2019] [Indexed: 02/01/2023]
Abstract
Taxanes are a family of natural products with a broad spectrum of anticancer activity. This activity is mediated by interaction with the taxane site of beta-tubulin, leading to microtubule stabilization and cell death. Although widely used in the treatment of breast cancer and other malignancies, existing taxane-based therapies including paclitaxel and the second-generation docetaxel are currently limited by severe adverse effects and dose-limiting toxicity. To discover taxane site modulators, we employ a computational binding site similarity screen of > 14,000 drug-like pockets from PDB, revealing an unexpected similarity between the estrogen receptor and the beta-tubulin taxane binding pocket. Evaluation of nine selective estrogen receptor modulators (SERMs) via cellular and biochemical assays confirms taxane site interaction, microtubule stabilization, and cell proliferation inhibition. Our study demonstrates that SERMs can modulate microtubule assembly and raises the possibility of an estrogen receptor-independent mechanism for inhibiting cell proliferation. Taxanes are natural products which bind beta-tubulin, stabilize microtubules and have a broad spectrum of anticancer activity. Here authors employ a computational binding site similarity screen and cell-based assays to reveal a SERM cross-reactivity between the estrogen receptor and the beta-tubulin taxane binding pocket.
10
A density functional theory investigation of the interaction of the tetraaqua calcium cation with bidentate carbonyl ligands. J Mol Model 2017; 23:60. [DOI: 10.1007/s00894-017-3240-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 08/29/2016] [Accepted: 01/13/2017] [Indexed: 10/20/2022]
11
Jallouli R, Parsiegla G, Carrière F, Gargouri Y, Bezzine S. Efficient heterologous expression of Fusarium solani lipase, FSL2, in Pichia pastoris, functional characterization of the recombinant enzyme and molecular modeling. Int J Biol Macromol 2016; 94:61-71. [PMID: 27620466 DOI: 10.1016/j.ijbiomac.2016.09.030] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/04/2016] [Revised: 09/05/2016] [Accepted: 09/08/2016] [Indexed: 11/25/2022]
Abstract
The gene coding for a lipase of Fusarium solani, designated as FSL2, shows an open reading frame of 906 bp encoding a 301-amino acid polypeptide with a molecular mass of 30 kDa. Based on sequence similarity with other fungal lipases, FSL2 contains a catalytic triad, consisting of Ser144, Asp198, and His256. FSL2 cDNA was subcloned into the pGAPZαA vector containing the Saccharomyces cerevisiae α-factor signal sequence and this construct was used to transform Pichia pastoris and achieve a high-level extracellular production of a FSL2 lipase. Maximum lipase activity was observed after 48 h. The optimum activity of the purified recombinant enzyme was measured at pH 8.0-9.0 and 37°C. FSL2 is remarkably stable at alkaline pH values up to 12 and at temperatures below 40°C. It has high catalytic efficiency towards triglycerides with short to long chain fatty acids but with a marked preference for medium and long chain fatty acids. FSL2 activity is decreased at sodium taurodeoxycholate concentrations above the Critical Micelle Concentration (CMC) of this anionic detergent. However, lipase activity is enhanced by Ca2+ and inhibited by EDTA or Cu2+ and partially by Mg2+ or K+. In silico docking of medium chain triglycerides, monogalactolipids (MGDG), digalactolipids (DGDG) and long chain phospholipids in the active site of FSL2 reveals structural solutions.
Affiliation(s)
- Raida Jallouli
  - University of Sfax, Laboratoire de Biochimie et de Génie Enzymatique des Lipases, ENIS route de Soukra, BPW 3038 Sfax, Tunisie
- Goetz Parsiegla
  - CNRS, Aix Marseille Université, Enzymologie Interfaciale et Physiologie de la Lipolyse UMR7282, 31 Chemin Joseph Aiguier, 13402 Marseille Cedex 20, France
- Frédéric Carrière
  - CNRS, Aix Marseille Université, Enzymologie Interfaciale et Physiologie de la Lipolyse UMR7282, 31 Chemin Joseph Aiguier, 13402 Marseille Cedex 20, France
- Youssef Gargouri
  - University of Sfax, Laboratoire de Biochimie et de Génie Enzymatique des Lipases, ENIS route de Soukra, BPW 3038 Sfax, Tunisie
- Sofiane Bezzine
  - University of Sfax, Laboratoire de Biochimie et de Génie Enzymatique des Lipases, ENIS route de Soukra, BPW 3038 Sfax, Tunisie
12
Minimal Functional Sites in Metalloproteins and Their Usage in Structural Bioinformatics. Int J Mol Sci 2016; 17:ijms17050671. [PMID: 27153067 PMCID: PMC4881497 DOI: 10.3390/ijms17050671] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 04/02/2016] [Revised: 04/18/2016] [Accepted: 04/28/2016] [Indexed: 12/12/2022]
Abstract
Metal ions play a functional role in numerous biochemical processes and cellular pathways. Indeed, about 40% of all enzymes of known 3D structure require a metal ion to be able to perform catalysis. The interactions of the metals with the macromolecular framework determine their chemical properties and reactivity. The relevant interactions involve both the coordination sphere of the metal ion and the more distant interactions of the so-called second sphere, i.e., the non-bonded interactions between the macromolecule and the residues coordinating the metal (metal ligands). The metal ligands and the residues in their close spatial proximity define what we call a minimal functional site (MFS). MFSs can be automatically extracted from the 3D structures of metal-binding biological macromolecules deposited in the Protein Data Bank (PDB). They are 3D templates that describe the local environment around a metal ion or metal cofactor and do not depend on the overall macromolecular structure. MFSs provide a different view on metal-binding proteins and nucleic acids, completely focused on the metal. Here we present different protocols and tools based upon the concept of MFS to obtain deeper insight into the structural and functional properties of metal-binding macromolecules. We also show that structure conservation of MFSs in metalloproteins relates to local sequence similarity more strongly than to overall protein similarity.