1
Qiao F, Binkowski TA, Broughan I, Chen W, Natarajan A, Schiltz GE, Scheidt KA, Anderson WF, Bergan R. Protein Structure Inspired Discovery of a Novel Inducer of Anoikis in Human Melanoma. Cancers (Basel) 2024; 16:3177. [PMID: 39335149 PMCID: PMC11429909 DOI: 10.3390/cancers16183177] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/18/2024] [Revised: 09/11/2024] [Accepted: 09/12/2024] [Indexed: 09/30/2024] Open
Abstract
Drug discovery historically starts with an established function, either that of compounds or proteins. This can hamper discovery of novel therapeutics. As structure determines function, we hypothesized that unique 3D protein structures constitute primary data that can inform novel discovery. Using a computationally intensive physics-based analytical platform operating at supercomputing speeds, we probed a high-resolution protein X-ray crystallographic library developed by us. For each of the eight identified novel 3D structures, we analyzed binding of sixty million compounds. Top-ranking compounds were acquired and screened for efficacy against breast, prostate, colon, or lung cancer, and for toxicity on normal human bone marrow stem cells, both using eight-day colony formation assays. Effective and non-toxic compounds segregated to two pockets. One compound, Dxr2-017, exhibited selective anti-melanoma activity in the NCI-60 cell line screen. In eight-day assays, Dxr2-017 had an IC50 of 12 nM against melanoma cells, while concentrations over 2100-fold higher had minimal stem cell toxicity. Dxr2-017 induced anoikis, a unique form of programmed cell death in need of targeted therapeutics. Our findings demonstrate proof-of-concept that protein structures represent high-value primary data to support the discovery of novel acting therapeutics. This approach is widely applicable.
Affiliation(s)
- Fangfang Qiao
  - Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE 68105, USA
- Irene Broughan
  - Department of Medicine, Northwestern University, Chicago, IL 60611, USA
- Weining Chen
  - Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE 68105, USA
- Amarnath Natarajan
  - Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE 68105, USA
- Gary E Schiltz
  - Department of Chemistry, Northwestern University, Evanston, IL 60208, USA
- Karl A Scheidt
  - Department of Chemistry, Northwestern University, Evanston, IL 60208, USA
- Wayne F Anderson
  - Department of Biochemistry and Molecular Genetics, Northwestern University, Chicago, IL 60611, USA
- Raymond Bergan
  - Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE 68105, USA
2
Qiao F, Binkowski TA, Broughan I, Chen W, Natarajan A, Schiltz GE, Scheidt KA, Anderson WF, Bergan R. Protein Structure Inspired Drug Discovery. bioRxiv 2024:2024.05.17.594634. [PMID: 38826221 PMCID: PMC11142055 DOI: 10.1101/2024.05.17.594634] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2024]
Abstract
Drug discovery starts with known function, either of a compound or a protein, in turn prompting investigations to probe the 3D structure of the compound-protein interface. As protein structure determines function, we hypothesized that unique 3D structural motifs represent primary information denoting unique function that can drive discovery of novel agents. Using a physics-based protein structure analysis platform developed by us, designed to conduct computationally intensive analysis at supercomputing speeds, we probed a high-resolution protein X-ray crystallographic library developed by us. We selected 3D structural motifs whose function was not otherwise established, that offered environments supporting binding of drug-like chemicals, and that were present on proteins that were not established therapeutic targets. For each of eight potential binding pockets on six different proteins, we accessed a 60 million compound library and used our analysis platform to evaluate binding. Using eight-day colony formation assays, acquired compounds were screened for efficacy against human breast, prostate, colon and lung cancer cells and for toxicity against human bone marrow stem cells. Compounds selectively inhibiting cancer growth segregated to two pockets on separate proteins. The compound, Dxr2-017, exhibited selective activity against human melanoma cells in the NCI-60 cell line screen and had an IC50 of 19 nM against human melanoma M14 cells in our eight-day assay, while over 2100-fold higher concentrations inhibited stem cells by less than 30%. We show that Dxr2-017 induces anoikis, a unique form of programmed cell death in need of targeted therapeutics. The predicted target protein for Dxr2-017 is expressed in bacteria, not in humans. This supports our strategy of focusing on unique 3D structural motifs. It is known that functionally important 3D structures are evolutionarily conserved. Here we demonstrate proof-of-concept that protein structure represents high value primary data to support discovery of novel therapeutics. This approach is widely applicable.
Affiliation(s)
- Fangfang Qiao
  - Eppley Institute for Research in Cancer, University of Nebraska Medical Center, Omaha, NE 68105, USA
- Irene Broughan
  - Department of Medicine, Northwestern University, Chicago, IL 60611, USA
- Weining Chen
  - Eppley Institute for Research in Cancer, University of Nebraska Medical Center, Omaha, NE 68105, USA
- Amarnath Natarajan
  - Eppley Institute for Research in Cancer, University of Nebraska Medical Center, Omaha, NE 68105, USA
- Gary E. Schiltz
  - Department of Chemistry, Northwestern University, Evanston, IL 60208, USA
- Karl A. Scheidt
  - Department of Chemistry, Northwestern University, Evanston, IL 60208, USA
- Wayne F. Anderson
  - Department of Biochemistry and Molecular Genetics, Northwestern University, Chicago, IL 60611, USA
- Raymond Bergan
  - Eppley Institute for Research in Cancer, University of Nebraska Medical Center, Omaha, NE 68105, USA
3
Cheong CSY, Khan SU, Ahmed N, Narayanan K. Identification of dual active sites in Caenorhabditis elegans GANA-1 protein: an ortholog of the human α-GAL A and α-NAGA enzymes. J Biomol Struct Dyn 2022:1-16. [PMID: 35694994 DOI: 10.1080/07391102.2022.2084162] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/18/2022]
Abstract
Fabry disease (FD) is caused by a defective α-galactosidase A (α-GAL A) enzyme responsible for breaking down globotriaosylceramide (Gb3). To develop affordable therapeutics, more effort is needed to obtain insights into the underlying mechanism of FD and to understand human α-GAL A structure and function in related animal models. We adopted C. elegans as a model to elucidate the sequence and 3D structure of its GANA-1 enzyme and compared it to human α-GAL A. We constructed the GANA-1 3D structure by homology modelling and validated the quality of the predicted structure, followed by computational docking of human ligands. In silico, the GANA-1 protein shared up to 42.1% sequence similarity with human α-GAL A and had dual active sites. GANA-1 homology modelling showed that 11 out of 13 amino acids in the first active site of the GANA-1 protein overlapped with the human α-GAL A active site, indicating the prospect of substrate cross-reaction. Computational molecular docking using human ligands such as Gb3 (first pocket), 4-nitrophenyl-α-D-galactopyranoside (second pocket), α-galactose (second pocket), and N-acetyl-D-galactosamine (second pocket) showed negative binding energies. This revealed that the ligands were able to bind within both GANA-1 active sites, mimicking the human α-GAL A and α-NAGA enzymes. We identified human compounds with adequate docking scores, predicting robust interactions with the GANA-1 active site. Our data suggested that the C. elegans GANA-1 enzyme may possess structural and functional similarities to human α-GAL A, including an intrinsic capability to metabolize Gb3 deposits. Communicated by Ramaswamy H. Sarma.
Affiliation(s)
- Clerance Su Yee Cheong
  - School of Science, Monash University Malaysia, Bandar Sunway, Selangor Darul Ehsan, Malaysia
- Shafi Ullah Khan
  - School of Pharmacy, Monash University Malaysia, Bandar Sunway, Selangor Darul Ehsan, Malaysia
  - Department of Pharmacy, Abasyn University, Peshawar, Khyber Pakhtunkhwa, Pakistan
  - Product & Process Innovation Department, Qarshi Brands (Pvt) Ltd, District Haripur, Khyber Pakhtunkhwa, Pakistan
- Nafees Ahmed
  - School of Pharmacy, Monash University Malaysia, Bandar Sunway, Selangor Darul Ehsan, Malaysia
  - Tropical Medicine and Biology Multidisciplinary Platform, Monash University Malaysia, Bandar Sunway, Selangor Darul Ehsan, Malaysia
- Kumaran Narayanan
  - School of Science, Monash University Malaysia, Bandar Sunway, Selangor Darul Ehsan, Malaysia
4
Manuchehrfar F, Li H, Tian W, Ma A, Liang J. Exact Topology of the Dynamic Probability Surface of an Activated Process by Persistent Homology. J Phys Chem B 2021; 125:4667-4680. [PMID: 33938737 DOI: 10.1021/acs.jpcb.1c00904] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Abstract
To gain insight into the reaction mechanism of activated processes, we introduce an exact approach for quantifying the topology of high-dimensional probability surfaces of the underlying dynamic processes. Instead of Morse indexes, we study the homology groups of a sequence of superlevel sets of the probability surface over high-dimensional configuration spaces using persistent homology. For alanine-dipeptide isomerization, a prototype of activated processes, we identify locations of probability peaks and connecting ridges, along with measures of their global prominence. Instead of a saddle point, the transition state ensemble (TSE) of conformations is at the most prominent probability peak after reactants/products, when proper reaction coordinates are included. Intuition-based models, even those exhibiting a double-well, fail to capture the dynamics of the activated process. Peak occurrence, prominence, and locations can be distorted upon subspace projection. While principal component analysis accounts for conformational variance, it inflates the complexity of the surface topology and destroys the dynamic properties of the topological features. In contrast, TSE emerges naturally as the most prominent peak beyond the reactant/product basins, when projected to a subspace of minimum dimension containing the reaction coordinates. Our approach is general and can be applied to investigate the topology of high-dimensional probability surfaces of other activated processes.
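The idea of ranking probability peaks by prominence through superlevel sets can be illustrated in one dimension. The following is a minimal sketch, not the authors' method: it assumes a function sampled on a 1D grid and computes only 0-dimensional persistence with a union-find structure, whereas the paper works on high-dimensional configuration spaces. The function name `peak_persistence` is hypothetical.

```python
def peak_persistence(values):
    """0-dimensional superlevel-set persistence of a sampled 1D function.

    Returns (peak_index, birth, death) tuples sorted by decreasing
    persistence (birth - death). The global maximum never dies, so its
    death is reported as -inf. Illustrative sketch only.
    """
    n = len(values)
    if n == 0:
        return []
    parent = [None] * n   # None means "not yet in the superlevel set"
    peak = {}             # component root -> index of its highest sample

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    # Lower the threshold: visit samples from highest to lowest value.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    for i in order:
        parent[i] = i
        peak[i] = i
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                # Elder rule: the component with the lower peak dies here.
                if values[peak[ri]] > values[peak[rj]]:
                    survivor, dying = ri, rj
                else:
                    survivor, dying = rj, ri
                if peak[dying] != i:  # skip the component just born at i
                    pairs.append((peak[dying], values[peak[dying]], values[i]))
                parent[dying] = survivor
    root = find(order[0])
    pairs.append((peak[root], values[peak[root]], float("-inf")))
    return sorted(pairs, key=lambda p: p[1] - p[2], reverse=True)


if __name__ == "__main__":
    for idx, birth, death in peak_persistence([0, 3, 1, 5, 2, 4.5, 0]):
        print(idx, birth, death)
```

Each peak is born at its own height and dies at the height of the ridge where it merges into a more prominent peak, which is one way to read the "global prominence" mentioned in the abstract.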
Affiliation(s)
- Farid Manuchehrfar
  - Center for Bioinformatics and Quantitative Biology and Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
- Huiyu Li
  - Center for Bioinformatics and Quantitative Biology and Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
- Wei Tian
  - Center for Bioinformatics and Quantitative Biology and Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
- Ao Ma
  - Center for Bioinformatics and Quantitative Biology and Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
- Jie Liang
  - Center for Bioinformatics and Quantitative Biology and Department of Bioengineering, University of Illinois at Chicago, Chicago, Illinois 60607, United States
5
Georgiev GD, Dodd KF, Chen BY. Precise parallel volumetric comparison of molecular surfaces and electrostatic isopotentials. Algorithms Mol Biol 2020; 15:11. [PMID: 32489400 PMCID: PMC7247173 DOI: 10.1186/s13015-020-00168-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/27/2019] [Accepted: 04/15/2020] [Indexed: 11/10/2022] Open
Abstract
Geometric comparisons of binding sites and their electrostatic properties can identify subtle variations that select different binding partners and subtle similarities that accommodate similar partners. Because subtle features are central for explaining how proteins achieve specificity, algorithmic efficiency and geometric precision are central to algorithmic design. To address these concerns, this paper presents pClay, the first algorithm to perform parallel and arbitrarily precise comparisons of molecular surfaces and electrostatic isopotentials as geometric solids. pClay was presented at the 2019 Workshop on Algorithms in Bioinformatics (WABI 2019) and is described in expanded detail here, especially with regard to the comparison of electrostatic isopotentials. Whereas earlier methods have generally used parallelism to enhance computational throughput, pClay is the first algorithm to use parallelism to make arbitrarily high precision comparisons practical. It is also the first method to demonstrate that high precision comparisons of geometric solids can yield more precise structural inferences than algorithms that use existing standards of precision. One advantage of added precision is that statistical models can be trained with more accurate data. Using structural data from an existing method, a model of steric variations between binding cavities can overlook 53% of authentic steric influences on specificity, whereas a model trained with data from pClay overlooks none. Our results also demonstrate the parallel performance of pClay on both workstation CPUs and a 61-core Xeon Phi. While pClay was slower on one core, additional processor cores rapidly outpaced single core performance and existing methods. Based on these results, it is clear that pClay has applications in the automatic explanation of binding mechanisms and in the rational design of protein binding preferences.
Affiliation(s)
- Georgi D. Georgiev
  - Department of Computer Science and Engineering, Lehigh University, 113 Research Drive, Bethlehem, PA, USA
- Kevin F. Dodd
  - Department of Computer Science and Engineering, Lehigh University, 113 Research Drive, Bethlehem, PA, USA
- Brian Y. Chen
  - Department of Computer Science and Engineering, Lehigh University, 113 Research Drive, Bethlehem, PA, USA
6
Zhang Y, Sui X, Stagg S, Zhang J. FTIP: an accurate and efficient method for global protein surface comparison. Bioinformatics 2020; 36:3056-3063. [PMID: 32022843 DOI: 10.1093/bioinformatics/btaa076] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/20/2019] [Revised: 01/16/2020] [Accepted: 01/28/2020] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Global protein surface comparison (GPSC) studies have been limited compared to other research works on protein structure alignment/comparison due to lack of real applications associated with GPSC. However, the technology advances in cryo-electron tomography (CET) have made methods to identify proteins from their surface shapes extremely useful. RESULTS: In this study, we developed a new method called Farthest point sampling (FPS)-enhanced Triangulation-based Iterative-closest-Point (ICP) (FTIP) for GPSC. We applied it to protein classification using only surface shape information. Our method first extracts a set of feature points from protein surfaces using FPS and then uses a triangulation-based efficient ICP algorithm to align the feature points of the two proteins to be compared. Tested on a benchmark dataset with 2329 proteins using nearest-neighbor classification, FTIP outperformed the state-of-the-art method for GPSC based on 3D Zernike descriptors. Using real and simulated cryo-EM data, we show that FTIP could be applied in the future to address problems in protein identification in CET experiments. AVAILABILITY AND IMPLEMENTATION: Programs/scripts we developed/used in the study are available at http://ani.stat.fsu.edu/~yuan/index.fld/FTIP.tar.bz2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
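The feature-point extraction step named in the abstract, farthest point sampling, can be sketched as a greedy loop. This is a minimal illustration assuming surface points are given as an N x 3 NumPy array; it is not the FTIP implementation, and the function name and the `start` parameter are hypothetical. The triangulation-based ICP alignment step is not shown.

```python
import numpy as np


def farthest_point_sampling(points, k, start=0):
    """Greedy farthest point sampling (illustrative sketch).

    Returns the indices of k points such that each new point is the one
    farthest from all points selected so far.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of points")
    selected = [start]
    # Distance from every point to its nearest already-selected point.
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(min_dist))
        selected.append(nxt)
        dist_to_new = np.linalg.norm(points - points[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
    return selected


if __name__ == "__main__":
    surface = [[0, 0, 0], [1, 0, 0], [10, 0, 0], [5, 0, 0]]
    print(farthest_point_sampling(surface, 3))
```

Each iteration costs O(N), so selecting k feature points costs O(kN), which keeps the sampled set small and well spread before any alignment is attempted.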
Affiliation(s)
- Scott Stagg
  - Department of Chemistry, Florida State University, Tallahassee, FL 32306, USA
7
Zhou J, Wu JH. Binding-Site Match Maker (BSMM): A Computational Method for the Design of Multi-Target Ligands. Molecules 2020; 25:1821. [PMID: 32316104 PMCID: PMC7221819 DOI: 10.3390/molecules25081821] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/29/2020] [Revised: 03/30/2020] [Accepted: 04/01/2020] [Indexed: 11/20/2022] Open
Abstract
Multi-target ligand strategies provide a valuable method of drug design. However, developing a multi-target drug with the desired profile remains a challenge. Herein, we developed a computational method, binding-site match maker (BSMM), for the design of multi-target ligands based on binding site matching. BSMM was built on geometric hashing algorithms and the representation of a binding site with physicochemical (PC) points. The BSMM software was used to detect proteins with similar binding sites or subsites. In particular, BSMM is independent of protein global folds and sequences and is therefore applicable to the matching of any binding sites. Similar sites between protein pairs with low homology and/or different folds are generally not obvious on visual inspection. The detection of such similar binding sites by BSMM could be of great value for the design of multi-target ligands.
Affiliation(s)
- Jinming Zhou
  - Key Laboratory of the Ministry of Education for Advanced Catalysis Materials, Department of Chemistry, Zhejiang Normal University, 688 Yingbin Road, Jinhua 321004, China
  - Drug Discovery and Innovation Center, College of Chemistry and Life Sciences, Zhejiang Normal University, 688 Yingbin Road, Jinhua 321004, China
- Jian Hui Wu
  - Segal Cancer Center, Montreal, QC H3T 1E2, Canada
  - Lady Davis Institute for Medical Research, Sir Mortimer B. Davis-Jewish General Hospital, McGill University, 3755 Cote-Ste-Catherine Rd., Montreal, QC H3T 1E2, Canada
  - Department of Oncology, McGill University, 3755 Cote-Ste-Catherine Rd., Montreal, QC H3T 1E2, Canada
Correspondence: (J.Z.); (J.H.W.); Tel.: (514) 340-8222 (J.H.W.); Fax: (514) 340-8717 (J.H.W.)
8
Xu L, Gordon R, Farmer R, Pattanayak A, Binkowski A, Huang X, Avram M, Krishna S, Voll E, Pavese J, Chavez J, Bruce J, Mazar A, Nibbs A, Anderson W, Li L, Jovanovic B, Pruell S, Valsecchi M, Francia G, Betori R, Scheidt K, Bergan R. Precision therapeutic targeting of human cancer cell motility. Nat Commun 2018; 9:2454. [PMID: 29934502 PMCID: PMC6014988 DOI: 10.1038/s41467-018-04465-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.8] [Received: 08/13/2017] [Accepted: 05/02/2018] [Indexed: 12/12/2022] Open
Abstract
Increased cancer cell motility constitutes a root cause of end organ destruction and mortality, but its complex regulation represents a barrier to precision targeting. We use the unique characteristics of small molecules to probe and selectively modulate cell motility. By coupling efficient chemical synthesis routes to multiple upfront in parallel phenotypic screens, we identify that KBU2046 inhibits cell motility and cell invasion in vitro. Across three different murine models of human prostate and breast cancer, KBU2046 inhibits metastasis, decreases bone destruction, and prolongs survival at nanomolar blood concentrations after oral administration. Comprehensive molecular, cellular and systemic-level assays all support a high level of selectivity. KBU2046 binds chaperone heterocomplexes, selectively alters binding of client proteins that regulate motility, and lacks all the hallmarks of classical chaperone inhibitors, including toxicity. We identify a unique cell motility regulatory mechanism and synthesize a targeted therapeutic, providing a platform to pursue studies in humans. In this study, the authors identify and validate a halogen-substituted isoflavanone able to inhibit prostate cancer cell motility, invasion and metastasis in vitro and in vivo. They demonstrate its ability to selectively inhibit activation of client proteins that stimulate cell motility.
Affiliation(s)
- Li Xu
  - Department of Medicine, Northwestern University, Chicago, IL, 60611, USA
  - Department of Gastroenterology, Xiang'an Hospital of Xiamen University, Fujian, 361101, Xiamen, China
- Ryan Gordon
  - Division of Hematology/Oncology, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, 97239, USA
- Rebecca Farmer
  - Department of Chemistry, Northwestern University, Evanston, IL, 60208, USA
- Abhinandan Pattanayak
  - Division of Hematology/Oncology, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, 97239, USA
- Andrew Binkowski
  - Department of Computer Science, University of Chicago, Chicago, IL, 60637, USA
- Xiaoke Huang
  - Department of Medicine, Northwestern University, Chicago, IL, 60611, USA
- Michael Avram
  - Department of Anesthesiology, Northwestern University, Chicago, IL, 60611, USA
- Sankar Krishna
  - Department of Medicine, Northwestern University, Chicago, IL, 60611, USA
- Eric Voll
  - Department of Medicine, Northwestern University, Chicago, IL, 60611, USA
- Janet Pavese
  - Department of Medicine, Northwestern University, Chicago, IL, 60611, USA
- Juan Chavez
  - Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- James Bruce
  - Department of Genome Sciences, University of Washington, Seattle, WA, 98195, USA
- Andrew Mazar
  - Department of Chemistry, Northwestern University, Evanston, IL, 60208, USA
- Antoinette Nibbs
  - Department of Chemistry, Northwestern University, Evanston, IL, 60208, USA
- Wayne Anderson
  - Department of Molecular Pharmacology and Biological Chemistry, Northwestern University, Chicago, IL, 60611, USA
- Lin Li
  - Department of Pathology, Northwestern University, Chicago, IL, 60611, USA
- Borko Jovanovic
  - Department of Preventive Medicine, Northwestern University, Chicago, IL, 60611, USA
- Sean Pruell
  - Department of Medicine, Northwestern University, Chicago, IL, 60611, USA
- Matias Valsecchi
  - Department of Medicine, Northwestern University, Chicago, IL, 60611, USA
- Giulio Francia
  - Border Biomedical Research Center, University of Texas at El Paso, El Paso, TX, 79968, USA
- Rick Betori
  - Department of Chemistry, Northwestern University, Evanston, IL, 60208, USA
- Karl Scheidt
  - Department of Chemistry, Northwestern University, Evanston, IL, 60208, USA
- Raymond Bergan
  - Division of Hematology/Oncology, Knight Cancer Institute, Oregon Health & Science University, Portland, OR, 97239, USA
9
Axenopoulos A, Rafailidis D, Papadopoulos G, Houstis EN, Daras P. Similarity Search of Flexible 3D Molecules Combining Local and Global Shape Descriptors. IEEE/ACM Trans Comput Biol Bioinform 2016; 13:954-970. [PMID: 26561479 DOI: 10.1109/tcbb.2015.2498553] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]
Abstract
In this paper, a framework for shape-based similarity search of 3D molecular structures is presented. The proposed framework exploits simultaneously the discriminative capabilities of a global, a local, and a hybrid local-global shape feature to produce a geometric descriptor that achieves higher retrieval accuracy than each feature does separately. Global and hybrid features are extracted using pairwise computations of diffusion distances between the points of the molecular surface, while the local feature is based on accumulating pairwise relations among oriented surface points into local histograms. The local features are integrated into a global descriptor vector using the bag-of-features approach. Due to the intrinsic property of its constituting shape features to be invariant to articulations of the 3D objects, the framework is appropriate for similarity search of flexible 3D molecules, while at the same time it is also accurate in retrieving rigid 3D molecules. The proposed framework is evaluated in flexible and rigid shape matching of 3D protein structures as well as in shape-based virtual screening of large ligand databases with quite promising results.
10
Ehrt C, Brinkjost T, Koch O. Impact of Binding Site Comparisons on Medicinal Chemistry and Rational Molecular Design. J Med Chem 2016; 59:4121-51. [PMID: 27046190 DOI: 10.1021/acs.jmedchem.6b00078] [Citation(s) in RCA: 66] [Impact Index Per Article: 8.3] [Indexed: 12/27/2022]
Abstract
Modern rational drug design not only deals with the search for ligands binding to interesting and promising validated targets but also aims to identify the function and ligands of yet uncharacterized proteins having impact on different diseases. Additionally, it contributes to the design of inhibitors with distinct selectivity patterns and the prediction of possible off-target effects. The identification of similarities between binding sites of various proteins is a useful approach to cope with those challenges. The main scope of this perspective is to describe applications of different protein binding site comparison approaches to outline their applicability and impact on molecular design. The article deals with various substantial application domains and provides some outstanding examples to show how various binding site comparison methods can be applied to promote in silico drug design workflows. In addition, we will also briefly introduce the fundamental principles of different protein binding site comparison methods.
Affiliation(s)
- Christiane Ehrt
  - Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
- Tobias Brinkjost
  - Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
  - Department of Computer Science, TU Dortmund University, Otto-Hahn-Straße 14, 44224 Dortmund, Germany
- Oliver Koch
  - Faculty of Chemistry and Chemical Biology, TU Dortmund University, Otto-Hahn-Straße 6, 44227 Dortmund, Germany
11
Woods KN, Pfeffer J. Using THz Spectroscopy, Evolutionary Network Analysis Methods, and MD Simulation to Map the Evolution of Allosteric Communication Pathways in c-Type Lysozymes. Mol Biol Evol 2016; 33:40-61. [PMID: 26337549 PMCID: PMC4693973 DOI: 10.1093/molbev/msv178] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/15/2022] Open
Abstract
It is now widely accepted that protein function is intimately tied with the navigation of energy landscapes. In this framework, a protein sequence is not described by a distinct structure but rather by an ensemble of conformations, and it is through this ensemble that evolution is able to modify a protein's function by altering its landscape. Hence, the evolution of protein functions involves selective pressures that adjust the sampling of the conformational states. In this work, we focus on elucidating the evolutionary pathway that shaped the function of individual proteins that make up the mammalian c-type lysozyme subfamily. Using both experimental and computational methods, we map out specific intermolecular interactions that direct the sampling of conformational states and, accordingly, also underlie shifts in the landscape that are directly connected with the formation of novel protein functions. By contrasting three representative proteins in the family, we identify molecular mechanisms that are associated with the selectivity of enhanced antimicrobial properties and, consequently, divergent protein function. Namely, we link the extent of localized fluctuations involving the loop separating helices A and B with shifts in the equilibrium of the ensemble of conformational states that mediate interdomain coupling and concurrently moderate substrate binding affinity. This work reveals unique insights into the molecular level mechanisms that promote the progression of interactions that connect the immune response to infection with the nutritional properties of lactation, while also providing a deeper understanding of how evolving energy landscapes may define present-day protein function.
12
Korkuć P, Walther D. Physicochemical characteristics of structurally determined metabolite-protein and drug-protein binding events with respect to binding specificity. Front Mol Biosci 2015; 2:51. [PMID: 26442281 PMCID: PMC4569973 DOI: 10.3389/fmolb.2015.00051] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 07/09/2015] [Accepted: 08/25/2015] [Indexed: 12/19/2022] Open
Abstract
To better understand and ultimately predict both the metabolic activities and the signaling functions of metabolites, a detailed understanding of the physical interactions of metabolites with proteins is highly desirable. Focusing in particular on protein binding specificity vs. promiscuity, we performed a comprehensive analysis of the physicochemical properties of compound-protein binding events as reported in the Protein Data Bank (PDB). We compared the molecular and structural characteristics obtained for metabolites to those of the well-studied interactions of drug compounds with proteins. Promiscuously binding metabolites and drugs are characterized by low molecular weight and high structural flexibility. Unlike what has been reported for drug compounds, low rather than high hydrophobicity appears associated, albeit weakly, with promiscuous binding for the metabolite set investigated in this study. Across several physicochemical properties, drug compounds exhibit characteristic binding propensities that are distinguishable from those associated with metabolites. Prediction of target diversity and compound promiscuity using physicochemical properties was possible at modest accuracy levels only, but was consistently better for drugs than for metabolites. Compound properties capturing structural flexibility and hydrogen-bond formation descriptors proved most informative in PLS-based prediction models. With regard to diversity of enzymatic activities of the respective metabolite target enzymes, the metabolites benzylsuccinate, hypoxanthine, trimethylamine N-oxide, oleoylglycerol, and resorcinol showed very narrow process involvement, while glycine, imidazole, tryptophan, succinate, and glutathione were identified to possess broad enzymatic reaction scopes. Promiscuous metabolites were found to mainly serve as general energy currency compounds, but were also found to be involved in signaling processes and to appear in diverse organismal systems (digestive and nervous system), suggesting specific molecular and physiological roles of promiscuous metabolites.
Affiliation(s)
- Paula Korkuć
- Max Planck Institute for Molecular Plant Physiology Potsdam-Golm, Germany
- Dirk Walther
- Max Planck Institute for Molecular Plant Physiology Potsdam-Golm, Germany
|
13
|
Krotzky T, Grunwald C, Egerland U, Klebe G. Large-scale mining for similar protein binding pockets: with RAPMAD retrieval on the fly becomes real. J Chem Inf Model 2014; 55:165-79. [PMID: 25474400 DOI: 10.1021/ci5005898] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 01/25/2023]
Abstract
Determination of structural similarities between protein binding pockets is an important challenge in in silico drug design. It can help to understand selectivity considerations, predict unexpected ligand cross-reactivity, and support the putative annotation of function to orphan proteins. To this end, Cavbase was developed as a tool for the automated detection, storage, and classification of putative protein binding sites. In this context, binding sites are characterized as sets of pseudocenters, which denote surface-exposed physicochemical properties, and can be used to enable mutual binding site comparisons. However, these comparisons tend to be computationally very demanding and often lead to very slow computations of the similarity measures. In this study, we propose RAPMAD (RApid Pocket MAtching using Distances), a new evaluation formalism for Cavbase entries that allows for ultrafast similarity comparisons. Protein binding sites are represented by sets of distance histograms that are both generated and compared with linear complexity. Attaining a speed of more than 20 000 comparisons per second, screenings across large data sets and even entire databases become easily feasible. We demonstrate the discriminative power and the short runtime by performing several classification and retrieval experiments. RAPMAD attains better success rates than the comparison formalism originally implemented into Cavbase or several alternative approaches developed recently, while requiring only a fraction of their runtime. The practical use of our method is finally proven by a successful prospective virtual screening study that aims for the identification of novel inhibitors of the NMDA receptor.
Affiliation(s)
- Timo Krotzky
- Department of Pharmaceutical Chemistry, Philipps-Universität Marburg , Marbacher Weg 6-10, 35032 Marburg, Germany
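The distance-histogram idea in the RAPMAD abstract above can be illustrated with a minimal sketch. It is not the published RAPMAD formalism: the choice of the centroid as the single reference point, the bin width, the L1 comparison, and the coordinates are all assumptions made for illustration. It only shows why building and comparing such histograms takes time linear in the number of pseudocenters and bins.

```python
import math

def distance_histogram(points, bin_width=1.0, n_bins=20):
    """Normalized histogram of point-to-centroid distances (linear in len(points))."""
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    hist = [0.0] * n_bins
    for p in points:
        d = math.dist(p, centroid)
        # Distances beyond the last bin edge are clamped into the last bin.
        idx = min(int(d / bin_width), n_bins - 1)
        hist[idx] += 1.0 / n
    return hist

def histogram_distance(h1, h2):
    """L1 distance between two equal-length histograms (0.0 means identical)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

# Hypothetical pseudocenter coordinates in angstroms (illustrative only).
pocket_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
pocket_b = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in pocket_a]  # translated copy
pocket_c = [(3.0 * x, 3.0 * y, 3.0 * z) for x, y, z in pocket_a]  # enlarged copy

hist_a = distance_histogram(pocket_a)
hist_b = distance_histogram(pocket_b)
hist_c = distance_histogram(pocket_c)
```

Because only internal distances enter the histogram, the translated copy compares as identical to the original, while the enlarged copy does not. No structural superposition is needed, which is what makes this family of descriptors fast.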
|
14
|
Krotzky T, Rickmeyer T, Fober T, Klebe G. Extraction of protein binding pockets in close neighborhood of bound ligands makes comparisons simple due to inherent shape similarity. J Chem Inf Model 2014; 54:3229-37. [PMID: 25345905 DOI: 10.1021/ci500553a] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 11/28/2022]
Abstract
Methods for comparing protein binding sites are frequently validated on data sets of pockets that were obtained simply by extracting the protein area next to the bound ligands. With this strategy, any unoccupied pocket remains unconsidered. Furthermore, a large amount of ligand-biased intrinsic shape information is predefined, rendering the subsequent comparisons rather trivial, even in data sets that contain hardly any redundancy in sequence information. In this study, we present the results of a very simplistic and shape-biased comparison approach, which stress that unrestricted cavity extraction is essential to enable unexpected cross-reactivity predictions among proteins and function annotations of orphan proteins.
Affiliation(s)
- Timo Krotzky
- Institute of Pharmaceutical Chemistry, University of Marburg , Marbacher Weg 6-10, 35032 Marburg, Germany
|
15
|
Binkowski TA, Jiang W, Roux B, Anderson WF, Joachimiak A. Virtual high-throughput ligand screening. Methods Mol Biol 2014; 1140:251-61. [PMID: 24590723 DOI: 10.1007/978-1-4939-0354-2_19] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 02/21/2023]
Abstract
In Structural Genomics projects, virtual high-throughput ligand screening can be utilized to provide important functional details for newly determined protein structures. Using a variety of publicly available software tools, it is possible to computationally model, predict, and evaluate how different ligands interact with a given protein. At the Center for Structural Genomics of Infectious Diseases (CSGID) a series of protein analysis, docking and molecular dynamics software is scripted into a single hierarchical pipeline allowing for an exhaustive investigation of protein-ligand interactions. The ability to conduct accurate computational predictions of protein-ligand binding is a vital component in improving both the efficiency and economics of drug discovery. Computational simulations can minimize experimental efforts, the slowest and most cost prohibitive aspect of identifying new therapeutics.
Affiliation(s)
- T Andrew Binkowski
- Center for Structural Genomics of Infectious Diseases, Computation Institute, University of Chicago, Chicago, IL, USA.
|
16
|
Krotzky T, Fober T, Hüllermeier E, Klebe G. Extended Graph-Based Models for Enhanced Similarity Search in Cavbase. IEEE/ACM Trans Comput Biol Bioinform 2014; 11:878-890. [PMID: 26356860 DOI: 10.1109/tcbb.2014.2325020] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 06/05/2023]
Abstract
To calculate similarities between molecular structures, measures based on the maximum common subgraph are frequently applied. For the comparison of protein binding sites, these measures are not fully appropriate since graphs representing binding sites on a detailed atomic level tend to get very large. In combination with an NP-hard problem, a large graph leads to a computationally demanding task. Therefore, for the comparison of binding sites, a less detailed coarse graph model is used building upon so-called pseudocenters. Consistently, a loss of structural data is caused since many atoms are discarded and no information about the shape of the binding site is considered. This is usually resolved by performing subsequent calculations based on additional information. These steps are usually quite expensive, making the whole approach very slow. The main drawback of a graph-based model solely based on pseudocenters, however, is the loss of information about the shape of the protein surface. In this study, we propose a novel and efficient modeling formalism that does not increase the size of the graph model compared to the original approach, but leads to graphs containing considerably more information assigned to the nodes. More specifically, additional descriptors considering surface characteristics are extracted from the local surface and attributed to the pseudocenters stored in Cavbase. These properties are evaluated as additional node labels, which lead to a gain of information and allow for much faster but still very accurate comparisons between different structures.
|
17
|
Chen BY. VASP-E: specificity annotation with a volumetric analysis of electrostatic isopotentials. PLoS Comput Biol 2014; 10:e1003792. [PMID: 25166865 PMCID: PMC4148194 DOI: 10.1371/journal.pcbi.1003792] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 02/21/2014] [Accepted: 06/17/2014] [Indexed: 12/01/2022] Open
Abstract
Algorithms for comparing protein structure are frequently used for function annotation. By searching for subtle similarities among very different proteins, these algorithms can identify remote homologs with similar biological functions. In contrast, few comparison algorithms focus on specificity annotation, where the identification of subtle differences among very similar proteins can assist in finding small structural variations that create differences in binding specificity. Few specificity annotation methods consider electrostatic fields, which play a critical role in molecular recognition. To fill this gap, this paper describes VASP-E (Volumetric Analysis of Surface Properties with Electrostatics), a novel volumetric comparison tool based on the electrostatic comparison of protein-ligand and protein-protein binding sites. VASP-E exploits the central observation that three dimensional solids can be used to fully represent and compare both electrostatic isopotentials and molecular surfaces. With this integrated representation, VASP-E is able to dissect the electrostatic environments of protein-ligand and protein-protein binding interfaces, identifying individual amino acids that have an electrostatic influence on binding specificity. VASP-E was used to examine a nonredundant subset of the serine and cysteine proteases as well as the barnase-barstar and Rap1a-raf complexes. Based on amino acids established by various experimental studies to have an electrostatic influence on binding specificity, VASP-E identified electrostatically influential amino acids with 100% precision and 83.3% recall. We also show that VASP-E can accurately classify closely related ligand binding cavities into groups with different binding preferences. These results suggest that VASP-E should prove a useful tool for the characterization of specific binding and the engineering of binding preferences in proteins. 
Proteins, the ubiquitous worker molecules of the cell, are a diverse class of molecules that perform very specific tasks. Understanding how proteins achieve specificity is a critical step towards understanding biological systems and a key prerequisite for rationally engineering new proteins. To examine electrostatic influences on specificity in proteins, this paper presents VASP-E, a software tool that generates solid representations of the electrostatic potential fields that surround proteins. VASP-E compares solids with constructive solid geometry, a class of techniques developed first for modeling complex machine parts. We observed that solid representations could quantify the degree of charge complementarity in protein-protein interactions and identify key residues that strengthen or weaken them. VASP-E correctly identified amino acids with established experimental influences on protein-protein binding specificity. We also observed that solid representations of electrostatic fields could identify electrostatic conservations and variations that relate to similarities and differences in binding specificity between proteins and small molecules.
Affiliation(s)
- Brian Y. Chen
- Department of Computer Science and Engineering, P.C. Rossin College of Engineering and Applied Sciences, Lehigh University, Bethlehem, Pennsylvania, United States of America
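The constructive solid geometry comparison mentioned in the VASP-E summary above can be sketched with voxel sets. This is an illustrative stand-in only: representing solids as sets of grid indices, the 0.5 grid spacing, and the two spheres are assumptions, not the representation or data used by VASP-E. The point is that once two regions are solids, their shared and unshared volumes follow from intersection and difference.

```python
import math

def sphere_voxels(center, radius, spacing=0.5):
    """Set of integer grid indices whose grid points lie inside a sphere."""
    cx, cy, cz = center
    lo = [math.floor((c - radius) / spacing) for c in center]
    hi = [math.ceil((c + radius) / spacing) for c in center]
    voxels = set()
    for i in range(lo[0], hi[0] + 1):
        for j in range(lo[1], hi[1] + 1):
            for k in range(lo[2], hi[2] + 1):
                x, y, z = i * spacing, j * spacing, k * spacing
                if (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 <= radius ** 2:
                    voxels.add((i, j, k))
    return voxels

def volume(voxels, spacing=0.5):
    """Approximate volume of a voxel set: voxel count times voxel volume."""
    return len(voxels) * spacing ** 3

# Two hypothetical solids standing in for regions of two binding sites.
solid_a = sphere_voxels((0.0, 0.0, 0.0), 2.0)
solid_b = sphere_voxels((1.0, 0.0, 0.0), 2.0)

shared = solid_a & solid_b   # CSG intersection: region common to both
only_a = solid_a - solid_b   # CSG difference: region present only in A
```

A large `shared` volume relative to `only_a` suggests conserved regions, while a large difference points to regions that may distinguish the two sites. A finer grid spacing improves the volume estimate at the cost of more voxels.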
|
18
|
Woods KN. Using THz time-scale infrared spectroscopy to examine the role of collective, thermal fluctuations in the formation of myoglobin allosteric communication pathways and ligand specificity. Soft Matter 2014; 10:4387-4402. [PMID: 24801988 DOI: 10.1039/c3sm53229a] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Indexed: 06/03/2023]
Abstract
In this investigation we use THz time-scale spectroscopy to conduct an initial set of studies on myoglobin with the aim of providing further insight into the global, collective thermal fluctuations in the protein that have been hypothesized to play a prominent role in the dynamic formation of transient ligand channels as well as in shaping the molecular level basis for ligand discrimination. Using the two ligands O2 and CO, we have determined that the perturbation from the heme-ligand complex has a strong influence on the characteristics of the myoglobin collective dynamics that are excited upon binding. Further, the differences detected in the collective protein motions in Mb-O2 compared with those in Mb-CO appear to be intimately tied with the pathways of long-range allosteric communication in the protein, which ultimately determine the trajectories selected by the respective ligands on the path to and from the heme-binding cavity.
Affiliation(s)
- K N Woods
- Physics Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
|
19
|
APSLAP: an adaptive boosting technique for predicting subcellular localization of apoptosis protein. Acta Biotheor 2013; 61:481-97. [PMID: 23982307 DOI: 10.1007/s10441-013-9197-1] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 01/30/2013] [Accepted: 08/16/2013] [Indexed: 01/09/2023]
Abstract
Apoptotic proteins play key roles in understanding the mechanism of programmed cell death. Knowledge about the subcellular localization of apoptotic proteins is useful for understanding the mechanism of programmed cell death, determining the functional characterization of the protein, screening candidates in drug design, and selecting proteins for relevant studies. It has also been proposed that the information required for determining the subcellular localization of a protein resides in its amino acid sequence. In this work, a new biological feature, class pattern frequency of physicochemical descriptors, was used together with the amino acid composition, a protein similarity measure, CTD (composition, transition, and distribution) of physicochemical descriptors, and sequence similarity to predict the subcellular localization of apoptosis proteins. AdaBoost with Random Forest as the weak learner was designed for the five modules, and prediction is made by a weighted voting system. A benchmark dataset of 317 apoptosis proteins was subjected to prediction by our system, and the accuracy was found to be 100.0 %, 92.4 %, and 90.1 % for the self-consistency test, jack-knife test, and tenfold cross-validation test, respectively, which is 0.9 % higher than that of other existing methods. Besides this, prediction on the independent datasets (N151 and ZW98) resulted in accuracies of 90.7 % and 87.7 %, respectively. These results show that the protein feature represented by a combined feature vector, along with the AdaBoost algorithm, performs well in predicting the subcellular localization of apoptosis proteins. The user-friendly web interface "APSLAP" has been constructed, which is freely available at http://apslap.bicpu.edu.in, and it is anticipated that this tool will play a significant role in determining the specific role of apoptosis proteins with reliability.
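The weighted voting step described above can be sketched as follows. This is a minimal illustration, not the APSLAP implementation: the module outputs, the weights, and the location labels are hypothetical, and the abstract does not state how the weights are derived.

```python
from collections import defaultdict

def weighted_vote(predictions):
    """Return the label with the largest summed weight.

    predictions: list of (label, weight) pairs, one per prediction module.
    Ties are resolved in favor of the label that was seen first.
    """
    totals = defaultdict(float)
    for label, weight in predictions:
        totals[label] += weight
    return max(totals, key=totals.get)

# Hypothetical outputs of five feature modules for one protein.
module_outputs = [
    ("nucleus", 0.9),
    ("cytoplasm", 0.6),
    ("cytoplasm", 0.5),
    ("nucleus", 0.1),
    ("mitochondrion", 0.3),
]
predicted_location = weighted_vote(module_outputs)
```

Here two moderately confident modules outvote a single highly confident one, which is the behavior that distinguishes weighted voting from simply trusting the strongest module.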
|
20
|
Jalencas X, Mestres J. Identification of Similar Binding Sites to Detect Distant Polypharmacology. Mol Inform 2013; 32:976-90. [PMID: 27481143 DOI: 10.1002/minf.201300082] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 05/01/2013] [Accepted: 07/29/2013] [Indexed: 01/19/2023]
Abstract
The ability of small molecules to interact with multiple proteins is referred to as polypharmacology. This property is often linked to the therapeutic action of drugs but it is known also to be responsible for many of their side effects. Because of its importance, the development of computational methods that can predict drug polypharmacology has become an important line of research that led recently to the identification of many novel targets for known drugs. Nowadays, the majority of these methods are based on measuring the similarity of a query molecule against the hundreds of thousands of molecules for which pharmacological data on thousands of proteins are available in public sources. However, similarity-based methods are inherently biased by the chemical coverage offered by the active molecules present in those public repositories, which limits significantly their capacity to predict interactions with proteins structurally and functionally unrelated to any of the already known targets for drugs. It is in this respect that structure-based methods aiming at identifying similar binding sites may offer an alternative complementary means to ligand-based methods for detecting distant polypharmacology. The different existing approaches to binding site detection, representation, comparison, and fragmentation are reviewed and recent successful applications presented.
Affiliation(s)
- Xavier Jalencas
- Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Research Institute & University Pompeu Fabra, Parc de Recerca Biomèdica, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain fax: +34 93 3160550
- Jordi Mestres
- Systems Pharmacology, Research Program on Biomedical Informatics (GRIB), IMIM Hospital del Mar Research Institute & University Pompeu Fabra, Parc de Recerca Biomèdica, Doctor Aiguader 88, 08003 Barcelona, Catalonia, Spain fax: +34 93 3160550.
|
21
|
Murakami Y, Kinoshita K, Kinjo AR, Nakamura H. Exhaustive comparison and classification of ligand-binding surfaces in proteins. Protein Sci 2013; 22:1379-91. [PMID: 23934772 PMCID: PMC3795496 DOI: 10.1002/pro.2329] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 04/15/2013] [Revised: 07/29/2013] [Accepted: 08/05/2013] [Indexed: 12/03/2022]
Abstract
Many proteins function by interacting with other small molecules (ligands). Identification of ligand-binding sites (LBS) in proteins can therefore help to infer their molecular functions. A comprehensive comparison among local structures of LBSs was previously performed, in order to understand their relationships and to classify their structural motifs. However, similar exhaustive comparison among local surfaces of LBSs (patches) has never been performed, due to computational complexity. To enhance our understanding of LBSs, it is worth performing such comparisons among patches and classifying them based on similarities of their surface configurations and electrostatic potentials. In this study, we first developed a rapid method to compare two patches. We then clustered patches corresponding to the same PDB chemical component identifier for a ligand, and selected a representative patch from each cluster. We subsequently exhaustively compared the representative patches and clustered them using a similarity score, PatSim. Finally, the resultant PatSim scores were compared with similarities of atomic structures of the LBSs and those of the ligand-binding protein sequences and functions. Consequently, we classified the patches into ∼2000 well-characterized clusters. We found that about 63% of these clusters are used in identical protein folds, although about 25% of the clusters are conserved in distantly related proteins and even in proteins with cross-fold similarity. Furthermore, we showed that patches with higher PatSim scores have the potential to be involved in similar biological processes.
Affiliation(s)
- Yoichi Murakami
- Graduate School of Information Sciences, Tohoku University, 6-3-09 Aramaki-aza-aoba, Aoba-ku, Sendai, Miyagi, 982-0036, Japan
|
22
|
Chemogenomics in drug discovery: computational methods based on the comparison of binding sites. Future Med Chem 2013; 4:1971-9. [PMID: 23088277 DOI: 10.4155/fmc.12.147] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Indexed: 12/12/2022] Open
Abstract
Novel computational methods for understanding relationships between ligands and all possible biological targets have emerged in recent years. Proteins are connected to each other based on the similarity of their ligands or based on the similarity of their binding sites. The assumption is that compounds sharing chemical similarity should share targets and that targets with a similar binding site should also share ligands. A large number of computational techniques have been developed to assess ligand and binding site similarity, which can be used to make quantitative predictions of the most probable biological target of a given compound. This review covers the recent advances in new computational methods for relating biological targets based on the similarity of their binding sites. Binding site comparisons are used for the prediction of their most likely ligands, their possible cross reactivity and selectivity. These comparisons can also be used to infer the function of novel uncharacterized proteins.
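The assumption that chemically similar compounds share targets is commonly made operational with a fingerprint similarity coefficient. The sketch below uses the Tanimoto coefficient on sets of "on" fingerprint bits. The abstract does not name a specific coefficient, so this choice is an assumption, and the bit positions are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as iterables of set bit positions."""
    a, b = set(bits_a), set(bits_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)

# Hypothetical "on" bit positions for two compounds.
compound_1 = {1, 4, 7, 9}
compound_2 = {1, 4, 8, 9, 12}
similarity = tanimoto(compound_1, compound_2)  # 3 shared bits out of 6 distinct bits
```

The same coefficient applies unchanged to binding sites once each site is encoded as a set of discrete features, which is one way the ligand-based and site-based views described above can be put on a common footing.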
|
23
|
Chen BY, Bandyopadhyay S. A regionalizable statistical model of intersecting regions in protein-ligand binding cavities. J Bioinform Comput Biol 2012; 10:1242004. [PMID: 22809380 DOI: 10.1142/s0219720012420048] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 11/18/2022]
Abstract
Finding elements of proteins that influence ligand binding specificity is an essential aspect of research in many fields. To assist in this effort, this paper presents two statistical models, based on the same theoretical foundation, for evaluating structural similarity among binding cavities. The first model specializes in the "unified" comparison of whole cavities, enabling the selection of cavities that are too dissimilar to have similar binding specificity. The second model enables a "regionalized" comparison of cavities within a user-defined region, enabling the selection of cavities that are too dissimilar to bind the same molecular fragments in the given region. We applied these models to analyze the ligand binding cavities of the serine protease and enolase superfamilies. Next, we observed that our unified model correctly separated sets of cavities with identical binding preferences from other sets with varying binding preferences, and that our regionalized model correctly distinguished cavity regions that are too dissimilar to bind similar molecular fragments in the user-defined region. These observations point to applications of statistical modeling that can be used to examine and, more importantly, identify influential structural similarities within binding site structure in order to better detect influences on protein-ligand binding specificity.
Affiliation(s)
- Brian Y Chen
- Department of Computer Science and Engineering, Lehigh University, 19 Memorial Drive West, Bethlehem, PA 18015, USA.
|
24
|
Ellingson L, Zhang J. Protein surface matching by combining local and global geometric information. PLoS One 2012; 7:e40540. [PMID: 22815760 PMCID: PMC3398928 DOI: 10.1371/journal.pone.0040540] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Received: 11/30/2011] [Accepted: 06/12/2012] [Indexed: 01/01/2023] Open
Abstract
Comparison of the binding sites of proteins is an effective means for predicting protein functions based on their structure information. Despite the importance of this problem and much research in the past, it is still very challenging to predict the binding ligands from the atomic structures of protein binding sites. Here, we designed a new algorithm, TIPSA (Triangulation-based Iterative-closest-point for Protein Surface Alignment), based on the iterative closest point (ICP) algorithm. TIPSA aims to find the maximum number of atoms that can be superposed between two protein binding sites, where any pair of superposed atoms has a distance smaller than a given threshold. The search starts from similar tetrahedra between two binding sites obtained from 3D Delaunay triangulation and uses the Hungarian algorithm to find additional matched atoms. We found that, due to the plasticity of protein binding sites, matching the rigid body of point clouds of protein binding sites is not adequate for satisfactory binding ligand prediction. We further incorporated global geometric information, the radius of gyration of binding site atoms, and used nearest neighbor classification for binding site prediction. Tested on benchmark data, our method achieved a performance comparable to the best methods in the literature, while simultaneously providing the common atom set and atom correspondences.
Affiliation(s)
- Leif Ellingson
- Department of Mathematics and Statistics, Texas Tech University, Lubbock, Texas, United States of America
- Jinfeng Zhang
- Department of Statistics, Florida State University, Tallahassee, Florida, United States of America
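The global geometric feature named in the TIPSA abstract above, the radius of gyration of the binding site atoms, is simple to compute. The sketch below assumes equal weighting of all atoms (no atomic masses) and uses hypothetical coordinates; the abstract does not say which weighting the authors used.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of points from their centroid (equal weights)."""
    n = len(coords)
    centroid = tuple(sum(c[i] for c in coords) / n for i in range(3))
    mean_sq = sum(math.dist(c, centroid) ** 2 for c in coords) / n
    return math.sqrt(mean_sq)

# Hypothetical binding-site atom coordinates in angstroms.
site_atoms = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
rg = radius_of_gyration(site_atoms)
```

Unlike the count of superposed atoms, this value does not depend on any alignment. That makes it a natural complement to rigid-body matching when binding sites are flexible.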
|
25
|
Fober T, Mernberger M, Klebe G, Hüllermeier E. Fingerprint Kernels for Protein Structure Comparison. Mol Inform 2012; 31:443-52. [PMID: 27477463 DOI: 10.1002/minf.201100149] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 11/03/2011] [Accepted: 04/03/2012] [Indexed: 11/06/2022]
Abstract
A key task in structural biology is to define a meaningful similarity measure for the comparison of protein structures. Recently, the use of graphs as modeling tools for molecular data has gained increasing importance. In this context, kernel functions have attracted a lot of attention, especially since they allow for the application of a rich repertoire of methods from the field of kernel-based machine learning. However, most of the existing graph kernels have been designed for unlabeled and/or unweighted graphs, although proteins are often more naturally and more exactly represented in terms of node-labeled and edge-weighted graphs. Here we analyze kernel-based protein comparison methods and propose extensions to existing graph kernels to exploit node-labeled and edge-weighted graphs. Moreover, we propose an instance of the substructure fingerprint kernel suitable for the analysis of protein binding sites. By using fuzzy fingerprints, we solve the problem of discontinuity on bin-boundaries arising in the case of labeled graphs.
Affiliation(s)
- Thomas Fober
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, 35032 Marburg, Germany. The first two authors should be regarded as joint first authors.
- Marco Mernberger
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, 35032 Marburg, Germany; Department of Pharmaceutical Chemistry, Philipps-Universität Marburg, 35032 Marburg, Germany. The first two authors should be regarded as joint first authors.
- Gerhard Klebe
- Department of Pharmaceutical Chemistry, Philipps-Universität Marburg, 35032 Marburg, Germany
- Eyke Hüllermeier
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, 35032 Marburg, Germany.
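The bin-boundary discontinuity mentioned in the fingerprint kernel abstract above can be illustrated by comparing crisp and fuzzy binning of a single edge weight. This is a sketch under assumptions: the triangular membership scheme, the bin width of 2.0, and the sample distances are illustrative and are not taken from the paper.

```python
def crisp_bin(value, width, n_bins):
    """One-hot vector: all mass goes to the single bin that contains the value."""
    vec = [0.0] * n_bins
    vec[min(int(value / width), n_bins - 1)] = 1.0
    return vec

def fuzzy_bin(value, width, n_bins):
    """Triangular membership: unit mass is shared by the two nearest bin centers."""
    vec = [0.0] * n_bins
    pos = value / width - 0.5                # position measured in bin centers
    pos = min(max(pos, 0.0), n_bins - 1.0)   # clamp to the first/last center
    lo = int(pos)
    frac = pos - lo
    vec[lo] += 1.0 - frac
    if frac > 0.0:
        vec[lo + 1] += frac
    return vec

def l1(u, v):
    """L1 distance between two equal-length vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))

# Two nearly identical edge weights that straddle the bin boundary at 2.0.
d1, d2 = 1.99, 2.01
crisp_gap = l1(crisp_bin(d1, 2.0, 4), crisp_bin(d2, 2.0, 4))
fuzzy_gap = l1(fuzzy_bin(d1, 2.0, 4), fuzzy_bin(d2, 2.0, 4))
```

With crisp bins the two distances land in different bins and look maximally different, whereas the fuzzy encoding changes only slightly. A kernel built on the fuzzy vectors therefore varies smoothly as edge weights cross a boundary.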
|
26
|
Chen BY, Bandyopadhyay S. Modeling regionalized volumetric differences in protein-ligand binding cavities. Proteome Sci 2012; 10 Suppl 1:S6. [PMID: 22759583 PMCID: PMC3390949 DOI: 10.1186/1477-5956-10-s1-s6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 01/30/2023] Open
Abstract
Identifying elements of protein structures that create differences in protein-ligand binding specificity is an essential method for explaining the molecular mechanisms underlying preferential binding. In some cases, influential mechanisms can be visually identified by experts in structural biology, but subtler mechanisms, whose significance may only be apparent from the analysis of many structures, are harder to find. To assist this process, we present a geometric algorithm and two statistical models for identifying significant structural differences in protein-ligand binding cavities. We demonstrate these methods in an analysis of sequentially nonredundant structural representatives of the canonical serine proteases and the enolase superfamily. Here, we observed that statistically significant structural variations identified experimentally established determinants of specificity. We also observed that an analysis of individual regions inside cavities can reveal areas where small differences in shape can correspond to differences in specificity.
Affiliation(s)
- Brian Y Chen
- Department of Computer Science and Engineering, Lehigh University, Bethlehem, PA, USA.
|
27
|
Abstract
We recently proposed to classify proteins by their functional surfaces. Using the structural attributes of functional surfaces, we inferred the pairwise relationships of proteins and constructed an expandable database of protein surface classification (PSC). As the functional surface(s) of a protein is the local region where the protein performs its function, our classification may reflect the functional relationships among proteins. Currently, PSC contains a library of 1974 surface types that include 25 857 functional surfaces identified from 24 170 bound structures. The search tool in PSC empowers users to explore related surfaces that share similar local structures and core functions. Each functional surface is characterized by structural attributes, which are geometric, physicochemical or evolutionary features. The attributes have been normalized as descriptors and integrated to produce a profile for each functional surface in PSC. In addition, binding ligands are recorded for comparisons among homologs. PSC allows users to exploit related binding surfaces to reveal the changes in functionally important residues on homologs that have led to functional divergence during evolution. The substitutions at the key residues of a spatial pattern may determine the functional evolution of a protein. In PSC (http://pocket.uchicago.edu/psc/), a pool of changes in residues on similar functional surfaces is provided.
Affiliation(s)
- Yan Yuan Tseng
- Department of Ecology and Evolution, University of Chicago 1101 East 57th Street, Chicago, IL 60637, USA.
|
28
|
Structure-based computational analysis of protein binding sites for function and druggability prediction. J Biotechnol 2012; 159:123-34. [DOI: 10.1016/j.jbiotec.2011.12.005] [Citation(s) in RCA: 81] [Impact Index Per Article: 6.8] [Received: 10/02/2011] [Revised: 12/02/2011] [Accepted: 12/06/2011] [Indexed: 11/19/2022]
|
29
|
Sael L, Chitale M, Kihara D. Structure- and sequence-based function prediction for non-homologous proteins. J Struct Funct Genomics 2012; 13:111-23. [PMID: 22270458 DOI: 10.1007/s10969-012-9126-6] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Received: 09/12/2011] [Accepted: 01/10/2012] [Indexed: 01/14/2023]
Abstract
The structural genomics projects have been accumulating an increasing number of protein structures, many of which remain functionally unknown. In parallel with experimental methods, computational methods are expected to make a significant contribution to the functional elucidation of such proteins. However, conventional computational methods that transfer functions from homologous proteins do not help much for these uncharacterized protein structures because they do not have apparent structural or sequence similarity with the known proteins. Here, we briefly review two avenues of computational function prediction methods, i.e. structure-based methods and sequence-based methods. The focus is on our recent developments of local structure-based and sequence-based methods, which can effectively extract function information from distantly related proteins. Two structure-based methods, Pocket-Surfer and Patch-Surfer, identify similar known ligand binding sites for pocket regions in a query protein without using global protein fold similarity information. Two sequence-based methods, protein function prediction and extended similarity group, make use of weakly similar sequences that are conventionally discarded in homology-based function annotation. We hope that computational methods, combined with experimental methods, will make a leading contribution to the functional elucidation of these protein structures.
Affiliation(s)
- Lee Sael
- Department of Computer Science, Purdue University, West Lafayette, IN 47907, USA
|
30
|
Classification of protein functional surfaces using structural characteristics. Proc Natl Acad Sci U S A 2012; 109:1170-5. [PMID: 22238424 DOI: 10.1073/pnas.1119684109] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022] Open
Abstract
Protein structure and function are closely related, especially in functional surfaces, which are local spatial regions that perform the biological functions. Also, protein structures tend to evolve more slowly than amino acid sequences. We have therefore developed a method to classify proteins using the structures of functional surfaces; we call it protein surface classification (PSC). PSC may reflect functional relationships among proteins and may detect evolutionary relationships among highly divergent sequences. We focused on the surfaces of ligand-bound regions because they represent well-defined structures. Specifically, we used structural attributes to measure similarities between binding surfaces and constructed a PSC library of ~2,000 binding surface types from the bound forms. Using flavin mononucleotide-binding proteins and glycosidases as examples, we show how the evolutionary position of an uncharacterized protein can be defined and its function inferred from the characterized members of the same surface subtype. We found that proteins with the same enzyme nomenclature may be divided into subtypes and that two proteins in the same CATH (Class, Architecture, Topology, Homologous superfamily) fold may belong to two different surface types. In conclusion, our approach complements the sequence-based and fold-domain classifications and has the advantage of associating the shape of a protein with its biological function. As an expandable library, PSC provides a resource of spatial patterns for studying the evolution of protein structure and function.
|
31
|
Singh R. Learning and Prediction of Complex Molecular Structure-Property Relationships. Mach Learn 2012. [DOI: 10.4018/978-1-60960-818-7.ch518] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
The problem of modeling and predicting complex structure-property relationships, such as the absorption, distribution, metabolism, and excretion of putative drug molecules is a fundamental one in contemporary drug discovery. An accurate model can not only be used to predict the behavior of a molecule and understand how structural variations may influence molecular property, but also to identify regions of molecular space that hold promise in context of a specific investigation. However, a variety of factors contribute to the difficulty of constructing robust structure activity models for such complex properties. These include conceptual issues related to how well the true bio-chemical property is accounted for by formulation of the specific learning strategy, algorithmic issues associated with determining the proper molecular descriptors, access to small quantities of data, possibly on tens of molecules only, due to the high cost and complexity of the experimental process, and the complex nature of bio-chemical phenomena underlying the data. This chapter attempts to address this problem from the rudiments: the authors first identify and discuss the salient computational issues that span (and complicate) structure-property modeling formulations and present a brief review of the state-of-the-art. The authors then consider a specific problem: that of modeling intestinal drug absorption, where many of the aforementioned factors play a role. In addressing them, their solution uses a novel characterization of molecular space based on the notion of surface-based molecular similarity. This is followed by identifying a statistically relevant set of molecular descriptors, which along with an appropriate machine learning technique, is used to build the structure-property model. The authors propose simultaneous use of both ratio and ordinal error-measures for model construction and validation. The applicability of the approach is demonstrated in a real world case study.
|
32
|
Stegemann B, Klebe G. Cofactor-binding sites in proteins of deviating sequence: comparative analysis and clustering in torsion angle, cavity, and fold space. Proteins 2011; 80:626-48. [PMID: 22095739 DOI: 10.1002/prot.23226] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 07/13/2011] [Revised: 09/29/2011] [Accepted: 10/10/2011] [Indexed: 12/13/2022]
Abstract
Small molecules are recognized in protein-binding pockets through surface-exposed physicochemical properties. To optimize binding, they have to adopt a conformation corresponding to a local energy minimum within the formed protein-ligand complex. However, their conformational flexibility makes them competent to bind not only to homologous proteins of the same family but also to proteins of remote similarity with respect to the shape of the binding pockets and folding pattern. Considering drug action, such observations can give rise to unexpected and undesired cross reactivity. In this study, datasets of six different cofactors (ADP, ATP, NAD(P)(H), FAD, and acetyl CoA, sharing an adenosine diphosphate moiety as common substructure), observed in multiple crystal structures of protein-cofactor complexes exhibiting sequence identity below 25%, have been analyzed for the conformational properties of the bound ligands, the distribution of physicochemical properties in the accommodating protein-binding pockets, and the local folding patterns next to the cofactor-binding site. State-of-the-art clustering techniques have been applied to group the different protein-cofactor complexes in the different spaces. Interestingly, clustering in cavity (Cavbase) and fold space (DALI) reveals virtually the same data structuring. Remarkable relationships can be found among the different spaces. They provide information on how conformations are conserved across the host proteins and which distinct local cavity and fold motifs recognize the different portions of the cofactors. In those cases, where different cofactors are found to be accommodated in a similar fashion to the same fold motifs, only a commonly shared substructure of the cofactors is used for the recognition process.
Affiliation(s)
- Björn Stegemann
- Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, Marbacher Weg 6, D-35032 Marburg, Germany
|
33
|
Mernberger M, Klebe G, Hüllermeier E. SEGA: semiglobal graph alignment for structure-based protein comparison. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:1330-1343. [PMID: 21339532 DOI: 10.1109/tcbb.2011.35] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 05/30/2023]
Abstract
Comparative analysis is a topic of utmost importance in structural bioinformatics. Recently, a structural counterpart to sequence alignment, called multiple graph alignment, was introduced as a tool for the comparison of protein structures in general and protein binding sites in particular. Using approximate graph matching techniques, this method enables the identification of approximately conserved patterns in functionally related structures. In this paper, we introduce a new method for computing graph alignments motivated by two problems of the original approach, a conceptual and a computational one. First, the existing approach is of limited usefulness for structures that only share common substructures. Second, the goal to find a globally optimal alignment leads to an optimization problem that is computationally intractable. To overcome these disadvantages, we propose a semiglobal approach to graph alignment in analogy to semiglobal sequence alignment that combines the advantages of local and global graph matching.
Affiliation(s)
- Marco Mernberger
- Department of Mathematics and Computer Science, Philipps-Universität Marburg, Hans-Meerwein-Straße 6, Marburg D-35032, Germany.
|
34
|
Zhao J, Dundas J, Kachalo S, Ouyang Z, Liang J. Accuracy of functional surfaces on comparatively modeled protein structures. J Struct Funct Genomics 2011; 12:97-107. [PMID: 21541664 PMCID: PMC3415962 DOI: 10.1007/s10969-011-9109-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 11/24/2010] [Accepted: 04/20/2011] [Indexed: 12/18/2022]
Abstract
Identification and characterization of protein functional surfaces are important for predicting protein function, understanding enzyme mechanism, and docking small compounds to proteins. As the accumulation of protein sequence information far outpaces that of structures, constructing accurate models of protein functional surfaces and identifying their key elements become increasingly important. A promising approach is to build comparative models from sequences using known structural templates such as those obtained from structural genome projects. Here we assess how well this approach works in modeling binding surfaces. By systematically building three-dimensional comparative models of proteins using MODELLER, we determine how well functional surfaces can be accurately reproduced. We use an alpha shape based pocket algorithm to compute all pockets on the modeled structures, and conduct a large-scale computation of similarity measurements (pocket RMSD and fraction of functional atoms captured) for 26,590 modeled enzyme protein structures. Overall, we find that when the sequence fragment of the binding surfaces has more than 45% identity to that of the template protein, the modeled surfaces have on average an RMSD of 0.5 Å, and contain 48% or more of the binding surface atoms, with nearly all of the important atoms in the signatures of binding pockets captured.
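The two similarity measurements named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the modeled and reference pockets have already been superimposed and their atoms paired, and the function names and atom-identifier strings are hypothetical.

```python
import math


def pocket_rmsd(model_coords, reference_coords):
    """Root-mean-square deviation between paired (x, y, z) atom coordinates.

    Assumes the two pockets are already superimposed and listed in the
    same atom order; no fitting is performed here.
    """
    if len(model_coords) != len(reference_coords) or not model_coords:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model_coords, reference_coords)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model_coords))


def fraction_captured(model_atoms, reference_atoms):
    """Fraction of reference binding-surface atoms also found in the modeled pocket."""
    reference = set(reference_atoms)
    if not reference:
        raise ValueError("reference pocket has no atoms")
    return len(reference & set(model_atoms)) / len(reference)
```

A modeled pocket that recovers two of four reference atoms gives `fraction_captured(...) == 0.5`, which is on the scale of the "48% or more" figure reported above.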
Affiliation(s)
- Jieling Zhao
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Joe Dundas
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Sema Kachalo
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Zheng Ouyang
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
- Jie Liang
- Department of Bioengineering, University of Illinois at Chicago, 851 S. Morgan Street, Room 218, SEO, MC-063, Chicago, Illinois, 60607
|
35
|
Xie L, Xie L, Bourne PE. Structure-based systems biology for analyzing off-target binding. Curr Opin Struct Biol 2011; 21:189-99. [PMID: 21292475 PMCID: PMC3070778 DOI: 10.1016/j.sbi.2011.01.004] [Citation(s) in RCA: 110] [Impact Index Per Article: 8.5] [Received: 11/10/2010] [Revised: 01/11/2011] [Accepted: 01/13/2011] [Indexed: 12/24/2022]
Abstract
Here off-target binding implies the binding of a small molecule of therapeutic interest to a protein target other than the primary target for which it was intended. Increasingly such off-targeting appears to be the norm rather than the exception, rational drug design notwithstanding, and can lead to detrimental side-effects, or opportunities to reposition a therapeutic agent to treat a different condition. Not surprisingly, there is significant interest in determining a priori what off-targets exist on a proteome-wide scale. Beyond determining putative off-targets is the need to understand the impact of such binding on the complete biological system, with the ultimate goal of being able to predict the phenotypic outcome. While a very ambitious goal, some progress is being made.
Affiliation(s)
- Lei Xie
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego MC9743, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Department of Computer Science, Hunter College, the City University of New York, 695 Park Avenue, New York City, NY 10065, USA
- Li Xie
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego MC9743, 9500 Gilman Drive, La Jolla, CA 92093, USA
- Philip E. Bourne
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego MC9743, 9500 Gilman Drive, La Jolla, CA 92093, USA
|
36
|
Tseng YY, Li WH. Evolutionary approach to predicting the binding site residues of a protein from its primary sequence. Proc Natl Acad Sci U S A 2011; 108:5313-8. [PMID: 21402946 PMCID: PMC3069214 DOI: 10.1073/pnas.1102210108] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 11/18/2022] Open
Abstract
Protein binding site residues, especially catalytic residues, play a central role in protein function. Because more than 99% of the ∼ 12 million protein sequences in the nonredundant protein database have no structural information, it is desirable to develop methods to predict the binding site residues of a protein from its primary sequence. This task is highly challenging, because the binding site residues constitute only a small portion of a protein. However, the binding site residues of a protein are clustered in its functional pocket(s), and their spatial patterns tend to be conserved in evolution. To take advantage of these evolutionary and structural principles, we constructed a database of ∼ 50,000 templates (called the pocket-containing segment database), each of which includes not only a sequence segment that contains a functional pocket but also the structural attributes of the pocket. To use this database, we designed a template-matching technique, termed residue-matching profiling, and established a criterion for selecting templates for a query sequence. Finally, we developed a probabilistic model for assigning spatial scores to matched residues between the template and query sequence in local alignments using a set of selected scoring matrices and for computing the binding likelihood of each matched residue in the query sequence. From the likelihoods, one can predict the binding site residues in the query sequence. An automated computational pipeline was developed for our method. A performance evaluation shows that our method achieves a 70% precision in predicting binding site residues at 60% sensitivity.
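For readers unfamiliar with the two performance figures quoted at the end of this abstract, the following Python sketch shows how precision and sensitivity are computed from a set of predicted and a set of true binding-site residues. The function name and the residue numbers in the usage note are illustrative assumptions, not data from the paper.

```python
def precision_and_sensitivity(predicted_residues, true_residues):
    """Return (precision, sensitivity) for a binding-site residue prediction.

    precision   = true positives / number of predicted residues
    sensitivity = true positives / number of true binding-site residues
    """
    predicted = set(predicted_residues)
    actual = set(true_residues)
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    sensitivity = true_positives / len(actual) if actual else 0.0
    return precision, sensitivity
```

For example, predicting residues `{1, 2, 3, 4}` when the true site is `{3, 4, 5, 6, 7, 8}` yields two true positives, so the precision is 2/4 and the sensitivity is 2/6.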
Affiliation(s)
- Yan Yuan Tseng
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637
- Wen-Hsiung Li
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637
- Biodiversity Research Center, Academia Sinica, Taipei 115, Taiwan
|
37
|
Dundas J, Adamian L, Liang J. Structural signatures of enzyme binding pockets from order-independent surface alignment: a study of metalloendopeptidase and NAD binding proteins. J Mol Biol 2010; 406:713-29. [PMID: 21145898 DOI: 10.1016/j.jmb.2010.12.005] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 07/02/2010] [Revised: 10/14/2010] [Accepted: 12/03/2010] [Indexed: 10/18/2022]
Abstract
Detecting similarities between local binding surfaces can facilitate identification of enzyme binding sites and prediction of enzyme functions, and aid in our understanding of enzyme mechanisms. Constructing a template of local surface characteristics for a specific enzyme function or binding activity is a challenging task, as the size and shape of the binding surfaces of a biochemical function often vary. Here we introduce the concept of signature binding pockets, which captures information on preserved and varied atomic positions at multiresolution levels. For proteins with complex enzyme binding and activity, multiple signatures arise naturally in our model, forming a signature basis set that characterizes this class of proteins. Both signatures and signature basis sets can be automatically constructed by a method called SOLAR (Signature Of Local Active Regions). This method is based on a sequence-order-independent alignment of computed binding surface pockets. SOLAR also provides a structure-based multiple sequence fragment alignment to facilitate the interpretation of computed signatures. By studying a family of evolutionarily related proteins, we show that for metzincin metalloendopeptidase, which has a broad spectrum of substrate binding, signature and basis set pockets can be used to discriminate metzincins from other enzymes, to predict the subclass of metzincins functions, and to identify specific binding surfaces. Studying unrelated proteins that have evolved to bind to the same NAD cofactor, we constructed signatures of NAD binding pockets and used them to predict NAD binding proteins and to locate NAD binding pockets. By measuring preservation ratio and location variation, our method can identify residues and atoms that are important for binding affinity and specificity. In both cases, we show that signatures and signature basis set reveal significant biological insight.
Affiliation(s)
- Joe Dundas
- Department of Bioengineering, University of Illinois at Chicago, 835 South Wolcott, Chicago, IL 60612, USA
|
38
|
Schmidtke P, Souaille C, Estienne F, Baurin N, Kroemer RT. Large-scale comparison of four binding site detection algorithms. J Chem Inf Model 2010; 50:2191-200. [PMID: 20828173 DOI: 10.1021/ci1000289] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.9] [Indexed: 11/28/2022]
Abstract
A large-scale evaluation and comparison of four cavity detection algorithms was carried out. The algorithms SiteFinder, fpocket, PocketFinder, and SiteMap were evaluated on a protein test set containing 5416 protein-ligand complexes and 9900 apo forms, corresponding to a subset of the set used earlier for benchmarking the PocketFinder algorithm. For the holo structures, all four algorithms correctly identified a similar amount of pockets (around 95%). SiteFinder, using optimized parameters, SiteMap, and fpocket showed similar pocket ranking performance, which was defined by ranking the correct binding site on rank 1 of the predictions or within the first 5 ranks of the predictions. On the apo structures, PocketFinder especially and also SiteFinder (optimized parameters) performed best, identifying 96% and 84% of all binding sites, respectively. The fpocket program predicts binding sites most accurately among the algorithms evaluated here. SiteFinder needed an average calculation time of 1.6 s compared with 2 min for SiteMap and around 2 s for fpocket.
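The ranking criterion described in this abstract (correct binding site at rank 1, or within the first 5 ranks) can be sketched in Python as follows. This is only an illustration: the function name is hypothetical and the input format, one rank per protein with `None` for a missed site, is an assumption.

```python
def top_k_success_rate(true_site_ranks, k):
    """Fraction of proteins whose correct binding site is ranked within the top k.

    true_site_ranks holds, for each protein, the 1-based rank of the
    correct site among the predicted pockets, or None if it was not found.
    """
    if not true_site_ranks:
        raise ValueError("at least one protein is required")
    hits = sum(1 for rank in true_site_ranks if rank is not None and rank <= k)
    return hits / len(true_site_ranks)
```

With ranks `[1, 3, None, 2, 7]`, one of five proteins has the correct site at rank 1 and three of five have it within the first 5 ranks.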
Affiliation(s)
- Peter Schmidtke
- Sanofi-Aventis VA Research Centre, Structure Design & Informatics, 13 quai Jules Guesde, BP14, 94403 Vitry-sur-Seine, France
|
39
|
Mehio W, Kemp GJ, Taylor P, Walkinshaw MD. Identification of protein binding surfaces using surface triplet propensities. Bioinformatics 2010; 26:2549-55. [DOI: 10.1093/bioinformatics/btq490] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/13/2022] Open
|
40
|
Comin M, Guerra C, Dellaert F. Binding balls: fast detection of binding sites using a property of spherical Fourier transform. J Comput Biol 2010; 16:1577-91. [PMID: 19958084 DOI: 10.1089/cmb.2009.0045] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022] Open
Abstract
The functional prediction of proteins is one of the most challenging problems in modern biology. An established computational technique involves the identification of three-dimensional local similarities in proteins. In this article, we present a novel method to quickly identify promising binding sites. Our aim is to efficiently detect putative binding sites without explicitly aligning them. Using the theory of Spherical Harmonics, a candidate binding site is modeled as a Binding Ball. The Binding Ball signature, offered by the Spherical Fourier coefficients, can be efficiently used for a fast detection of putative regions. Our contribution includes the Binding Ball modeling and the definition of a scoring function that does not require aligning candidate regions. Our scoring function can be computed efficiently using a property of Spherical Fourier transform (SFT) that avoids the evaluation of all alignments. Experiments on different ligands show good discrimination power when searching for known binding sites. Moreover, we prove that this method can save up to 40% in time compared with traditional approaches.
Affiliation(s)
- Matteo Comin
- Department of Information Engineering, University of Padova, Padova, Italy.
|
41
|
Xiong B, Wu J, Burk DL, Xue M, Jiang H, Shen J. BSSF: a fingerprint based ultrafast binding site similarity search and function analysis server. BMC Bioinformatics 2010; 11:47. [PMID: 20100327 PMCID: PMC3098077 DOI: 10.1186/1471-2105-11-47] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Received: 07/16/2009] [Accepted: 01/25/2010] [Indexed: 11/17/2022] Open
Abstract
Background: Genome sequencing and post-genomics projects such as structural genomics are extending the frontier of the study of sequence-structure-function relationship of genes and their products. Although many sequence/structure-based methods have been devised with the aim of deciphering this delicate relationship, there still remain large gaps in this fundamental problem, which continuously drives researchers to develop novel methods to extract relevant information from sequences and structures and to infer the functions of newly identified genes by genomics technology. Results: Here we present an ultrafast method, named BSSF (Binding Site Similarity & Function), which enables researchers to conduct similarity searches in a comprehensive three-dimensional binding site database extracted from PDB structures. This method utilizes a fingerprint representation of the binding site and a validated statistical Z-score function scheme to judge the similarity between the query and database items, even if their similarities are only constrained in a sub-pocket. This fingerprint based similarity measurement was also validated on a known binding site dataset by comparing with geometric hashing, which is a standard 3D similarity method. The comparison clearly demonstrated the utility of this ultrafast method. After conducting the database searching, the hit list is further analyzed to provide basic statistical information about the occurrences of Gene Ontology terms and Enzyme Commission numbers, which may benefit researchers by helping them to design further experiments to study the query proteins. Conclusions: This ultrafast web-based system will not only help researchers interested in drug design and structural genomics to identify similar binding sites, but also assist them by providing further analysis of the hit list from database searching.
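The general idea of scoring a fingerprint match against a background distribution can be sketched in Python as below. The abstract does not specify the fingerprint encoding or the similarity measure, so the Tanimoto coefficient over sets of "on" bits is an assumption made purely for illustration, as are the function names; this is not the BSSF scoring scheme itself.

```python
import statistics


def tanimoto(fingerprint_a, fingerprint_b):
    """Tanimoto coefficient between two fingerprints given as sets of on-bit indices.

    Assumed similarity measure; the abstract does not name the one BSSF uses.
    """
    a, b = set(fingerprint_a), set(fingerprint_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def z_score(score, background_scores):
    """Standardize a similarity score against a background score distribution."""
    mean = statistics.mean(background_scores)
    stdev = statistics.pstdev(background_scores)
    if stdev == 0:
        raise ValueError("background scores have zero variance")
    return (score - mean) / stdev
```

A query-to-hit similarity is then judged by how many standard deviations it lies above the similarities obtained for unrelated database entries, rather than by its raw value.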
Affiliation(s)
- Bing Xiong
- State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Zhangjiang Hi-Tech Park, Pudong, Shanghai, 201203, PR China.
|
42
|
Binkowski TA, Cuff M, Nocek B, Chang C, Joachimiak A. Assisted assignment of ligands corresponding to unknown electron density. J Struct Funct Genomics 2010; 11:21-30. [PMID: 20091237 DOI: 10.1007/s10969-010-9078-7] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/27/2009] [Accepted: 01/03/2010] [Indexed: 11/28/2022]
Abstract
A semi-automated computational procedure to assist in the identification of bound ligands from unknown electron density has been developed. The atomic surface surrounding the density blob is compared to a library of three-dimensional ligand binding surfaces extracted from the Protein Data Bank (PDB). Ligands corresponding to surfaces which share physicochemical texture and geometric shape similarities are considered for assignment. The method is benchmarked against a set of well represented ligands from the PDB, in which we show that we can identify the correct ligand based on the corresponding binding surface. Finally, we apply the method during model building and refinement stages from structural genomics targets in which unknown density blobs were discovered. A semi-automated computational method is described which aims to assist crystallographers with assigning the identity of a ligand corresponding to unknown electron density. Using shape and physicochemical similarity assessments between the protein surface surrounding the density and a database of known ligand binding surfaces, a plausible list of candidate ligands is identified for consideration. The method is validated against highly observed ligands from the Protein Data Bank and results are shown from its use in a high-throughput structural genomics pipeline.
Affiliation(s)
- T Andrew Binkowski
- Midwest Center for Structural Genomics (MCSG), Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439, USA.
|
43
|
Abstract
The function of a protein is often fulfilled via molecular interactions on its surfaces, so identifying the functional surface(s) of a protein is helpful for understanding its function. Here, we introduce the concept of a split pocket, which is a pocket that is split by a cognate ligand. We use a geometric approach that is site-specific. Specifically, we first compute a set of all pockets in the protein with its ligand(s) and a set of all pockets with the ligand(s) removed and then compare the two sets of pockets to identify the split pocket(s) of the protein. To reduce the search space and expedite the process of surface partitioning, we design probe radii according to the physicochemical textures of molecules. Our method achieves a success rate of 96% on a benchmark test set. We conduct a large-scale computation to identify approximately 19,000 split pockets from 11,328 structures (1.16 million potential pockets); for each pocket, we obtain residue composition, solvent-accessible area, and molecular volume. With this database of split pockets, our method can be used to predict the functional surfaces of unbound structures. Indeed, the functional surface of an unbound protein may often be found from its similarity to remotely related bound forms that belong to distinct folds. Finally, we apply our method to identify glucose-binding proteins, including unbound structures. Our study demonstrates the power of geometric and evolutionary matching for studying protein functional evolution and provides a framework for classifying protein functions by local spatial patterns of functional surfaces.
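The comparison of the two pocket sets described in this abstract can be illustrated with a small Python sketch. It rests on strong assumptions that are not stated in the abstract: each pocket is represented as a frozenset of protein atom identifiers, and a ligand-free pocket counts as split when at least two pockets computed with the ligand present are proper subsets of it. The function name and the `min_fragments` parameter are hypothetical.

```python
def find_split_pockets(pockets_with_ligand, pockets_without_ligand, min_fragments=2):
    """Return ligand-free pockets that appear divided into fragments when the ligand is present.

    Each pocket is a frozenset of protein atom identifiers (assumed representation).
    The subset-based criterion is an illustrative assumption, not the published one.
    """
    split = []
    for pocket in pockets_without_ligand:
        # Fragments are smaller pockets, computed with the ligand in place,
        # whose atoms all belong to this ligand-free pocket.
        fragments = [p for p in pockets_with_ligand if p and p < pocket]
        if len(fragments) >= min_fragments:
            split.append(pocket)
    return split
```

In this toy representation, a six-atom pocket that shows up as two separate two-atom pockets once the ligand is included is reported as split, whereas a pocket that is unchanged by the ligand is not.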
Affiliation(s)
- Yan Yuan Tseng
- Department of Ecology and Evolution, University of Chicago, Chicago, Illinois 60637, USA
|
44
|
Tseng YY, Chen ZJ, Li WH. fPOP: footprinting functional pockets of proteins by comparative spatial patterns. Nucleic Acids Res 2009; 38:D288-95. [PMID: 19880384 PMCID: PMC2808891 DOI: 10.1093/nar/gkp900] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/23/2022] Open
Abstract
fPOP (footprinting Pockets Of Proteins, http://pocket.uchicago.edu/fpop/) is a relational database of the protein functional surfaces identified by analyzing the shapes of binding sites in ∼42 700 structures, including both holo and apo forms. We previously used a purely geometric method to extract the spatial patterns of functional surfaces (split pockets) in ∼19 000 bound structures and constructed a database, SplitPocket (http://pocket.uchicago.edu/). These functional surfaces are now used as spatial templates to predict the binding surfaces of unbound structures. To conduct a shape comparison, we use the Smith–Waterman algorithm to footprint an unbound pocket fragment with those of the functional surfaces in SplitPocket. The pairwise alignment of the unbound and bound pocket fragments is used to evaluate the local structural similarity via geometric matching. The final results of our large-scale computation, including ∼90 000 identified or predicted functional surfaces, are stored in fPOP. This database provides an easily accessible resource for studying functional surfaces, assessing conformational changes between bound and unbound forms and analyzing functional divergence. Moreover, it may facilitate the exploration of the physicochemical textures of molecules and the inference of protein function. Finally, our approach provides a framework for classification of proteins into families on the basis of their functional surfaces.
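The Smith–Waterman algorithm mentioned in this abstract is a standard local alignment method. A minimal Python sketch of its scoring recurrence is given below, applied to pocket fragments written as residue strings. The scoring values (match +2, mismatch -1, linear gap -1) and the function name are assumptions chosen for illustration; the abstract does not state the parameters used in fPOP.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score between sequences a and b (Smith-Waterman).

    Uses a linear gap penalty and keeps only two rows of the score matrix.
    Scoring parameters are illustrative defaults, not those of fPOP.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best
```

Because scores are floored at zero, a conserved stretch such as `GHDS` embedded in otherwise unrelated fragments still scores as a full match, which is what makes local alignment suitable for comparing pocket fragments of different lengths.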
Affiliation(s)
- Yan Yuan Tseng
- Department of Ecology and Evolution, University of Chicago, Chicago, IL 60637, USA.
|
45
|
Wang Y, Wu LY, Zhang JH, Zhan ZW, Zhang XS, Chen L. Evaluating protein similarity from coarse structures. IEEE/ACM Trans Comput Biol Bioinform 2009; 6:583-593. [PMID: 19875857 DOI: 10.1109/tcbb.2007.70250] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 05/28/2023]
Abstract
To unscramble the relationship between protein function and protein structure, it is essential to assess the protein similarity from different aspects. Although many methods have been proposed for protein structure alignment or comparison, alternative similarity measures are still strongly demanded due to the requirement of fast screening and query in large-scale structure databases. In this paper, we first formulate a novel representation of a protein structure, i.e., Feature Sequence of Surface (FSS). Then, a new score scheme is developed to measure the similarity between two representations. To verify the proposed method, numerical experiments are conducted in four different protein data sets. We also classify SARS coronavirus to verify the effectiveness of the new method. Furthermore, preliminary results of fast classification of the whole CATH v2.5.1 database based on the new macrostructure similarity are given as a pilot study. We demonstrate that the proposed approach to measure the similarities between protein structures is simple to implement, computationally efficient, and surprisingly fast. In addition, the method itself provides a new and quantitative tool to view a protein structure.
Affiliation(s)
- Yong Wang
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, No. 55 Zhongguancun East Road, Beijing, 100080, China.
|
46
|
Structural relationships among proteins with different global topologies and their implications for function annotation strategies. Proc Natl Acad Sci U S A 2009; 106:17377-82. [PMID: 19805138 DOI: 10.1073/pnas.0907971106]
Abstract
It has become increasingly apparent that geometric relationships often exist between regions of two proteins that have quite different global topologies or folds. In this article, we examine whether such relationships can be used to infer a functional connection between the two proteins in question. We find, by considering a number of examples involving metal and cation binding, sugar binding, and aromatic group binding, that geometrically similar protein fragments can share related functions, even if they have been classified as belonging to different folds and topologies. Thus, the use of classifications inevitably limits the number of functional inferences that can be obtained from the comparative analysis of protein structures. In contrast, the development of interactive computational tools that recognize the "continuous" nature of protein structure/function space, by increasing the number of potentially meaningful relationships that are considered, may offer a dramatic enhancement in the ability to extract information from protein structure databases. We introduce the MarkUs server, which embodies this strategy and is designed for a user interested in developing and validating specific functional hypotheses.
|
47
|
Fast screening of protein surfaces using geometric invariant fingerprints. Proc Natl Acad Sci U S A 2009; 106:16622-6. [PMID: 19805347 DOI: 10.1073/pnas.0906146106]
Abstract
We develop a rapid and efficient method for the comparison of protein local surface similarities using geometric invariants (fingerprints). By combining fast fingerprint comparison with explicit alignment, we successfully screen the entire Protein Data Bank for proteins that possess local surface similarities. Our method is independent of sequence and fold similarities, and has potential application to protein structure annotation and protein-protein interface design.
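
The abstract does not specify which geometric invariants make up the fingerprint. As a rough illustration of the general idea only, the sketch below assumes a normalized histogram of pairwise point distances, which is unchanged by rotation and translation; the function names, bin settings, and coordinates are hypothetical and are not the authors' method.

```python
import math
from itertools import combinations


def distance_fingerprint(points, bin_width=2.0, n_bins=10):
    """Normalized histogram of pairwise distances between 3D points.

    Pairwise distances do not change under rigid motion, so the
    histogram can be compared without aligning the two surfaces first.
    """
    hist = [0] * n_bins
    for p, q in combinations(points, 2):
        d = math.dist(p, q)
        # Distances beyond the last bin edge are counted in the last bin.
        idx = min(int(d / bin_width), n_bins - 1)
        hist[idx] += 1
    total = sum(hist) or 1  # avoid division by zero for fewer than 2 points
    return [h / total for h in hist]


def fingerprint_distance(f, g):
    """L1 distance between two fingerprints (0 means identical histograms)."""
    return sum(abs(x - y) for x, y in zip(f, g))


# Hypothetical surface patch and a translated copy of it.
patch = [(0, 0, 0), (1, 0, 0), (0, 3, 0), (0, 0, 5)]
moved = [(x + 5, y - 3, z + 2) for x, y, z in patch]
print(fingerprint_distance(distance_fingerprint(patch),
                           distance_fingerprint(moved)))
```

A cheap comparison of this kind would serve only as a prefilter; candidates that pass would still need the explicit alignment step that the abstract mentions.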
|
48
|
Xie L, Xie L, Bourne PE. A unified statistical model to support local sequence order independent similarity searching for ligand-binding sites and its application to genome-based drug discovery. Bioinformatics 2009; 25:i305-12. [PMID: 19478004 PMCID: PMC2687974 DOI: 10.1093/bioinformatics/btp220]
Abstract
Functional relationships between proteins that do not share global structure similarity can be established by detecting their ligand-binding-site similarity. For a large-scale comparison, it is critical to accurately and efficiently assess the statistical significance of this similarity. Here, we report an efficient statistical model that supports local sequence order independent ligand-binding-site similarity searching. Most existing statistical models only take into account the matching vertices between two sites that are defined by a fixed number of points. In reality, the boundary of the binding site is not known or is dependent on the bound ligand, making these approaches limited. To address these shortcomings and to perform binding-site mapping on a genome-wide scale, we developed a sequence-order independent profile-profile alignment (SOIPPA) algorithm that is able to detect local similarity between unknown binding sites a priori. The SOIPPA scoring integrates geometric, evolutionary and physical information into a unified framework. However, this imposes a significant challenge in assessing the statistical significance of the similarity because the conventional probability model that is based on fixed-point matching cannot be applied. Here we find that scores for binding-site matching by SOIPPA follow an extreme value distribution (EVD). Benchmark studies show that the EVD model is at least two orders of magnitude faster and more accurate than the non-parametric statistical method in the previous SOIPPA version. Efficient statistical analysis makes it possible to apply SOIPPA to genome-based drug discovery. Consequently, we have applied the approach to the structural genome of Mycobacterium tuberculosis to construct a protein-ligand interaction network. The network reveals highly connected proteins, which represent suitable targets for promiscuous drugs.
Affiliation(s)
- Lei Xie
- San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA.
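
The abstract above reports that SOIPPA binding-site scores follow an extreme value distribution. To illustrate how such a model yields a p-value, the sketch below assumes a Gumbel (type I) form fitted by the method of moments to a set of background scores. The function names and sample scores are hypothetical, and the fitting procedure is a swapped-in textbook approach, not the one used in SOIPPA.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Estimate Gumbel location (mu) and scale (beta) from background scores.

    Method of moments: variance = (pi * beta)^2 / 6, mean = mu + gamma * beta.
    """
    mean = statistics.fmean(scores)
    std = statistics.stdev(scores)
    beta = std * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def gumbel_p_value(score, mu, beta):
    """P(S >= score) under a Gumbel distribution with parameters mu, beta."""
    z = (score - mu) / beta
    # 1 - exp(-exp(-z)), written with expm1 for accuracy at small p-values
    return -math.expm1(-math.exp(-z))


# Hypothetical background scores from unrelated binding-site pairs.
background = [1.0, 2.0, 3.0, 4.0, 5.0]
mu, beta = fit_gumbel_moments(background)
print(mu, beta, gumbel_p_value(8.0, mu, beta))
```

Once the two parameters are fitted, each p-value is a closed-form evaluation, which is consistent with the speed advantage over a non-parametric method that the abstract describes.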
|
49
|
Wallach I, Lilien RH. Prediction of sub-cavity binding preferences using an adaptive physicochemical structure representation. Bioinformatics 2009; 25:i296-304. [PMID: 19478002 PMCID: PMC2687958 DOI: 10.1093/bioinformatics/btp204]
Abstract
MOTIVATION: The ability to predict binding profiles for an arbitrary protein can significantly improve the areas of drug discovery, lead optimization and protein function prediction. At present, there are no successful algorithms capable of predicting binding profiles for novel proteins. Existing methods typically rely on manually curated templates or entire active site comparison. Consequently, they perform best when analyzing proteins sharing significant structural similarity with known proteins (i.e. proteins resulting from divergent evolution). These methods fall short when used to characterize the binding profile of a novel active site or one for which a template is not available. In contrast to previous approaches, our method characterizes the binding preferences of sub-cavities within the active site by exploiting a large set of known protein-ligand complexes. The uniqueness of our approach lies not only in the consideration of sub-cavities, but also in the more complete structural representation of these sub-cavities, their parametrization and the method by which they are compared. By only requiring local structural similarity, we are able to leverage previously unused structural information and perform binding inference for proteins that do not share significant structural similarity with known systems.
RESULTS: Our algorithm demonstrates the ability to accurately cluster similar sub-cavities and to predict binding patterns across a diverse set of protein-ligand complexes. When applied to two high-profile drug targets, our algorithm successfully generates a binding profile that is consistent with known inhibitors. The results suggest that our algorithm should be useful in structure-based drug discovery and lead optimization.
Affiliation(s)
- Izhar Wallach
- Department of Computer Science, Donnelly Centre for Cellular and Biomolecular Research and Banting and Best, University of Toronto, Toronto, Ontario, Canada.
|
50
|
Drug discovery using chemical systems biology: identification of the protein-ligand binding network to explain the side effects of CETP inhibitors. PLoS Comput Biol 2009; 5:e1000387. [PMID: 19436720 PMCID: PMC2676506 DOI: 10.1371/journal.pcbi.1000387] [Received: 01/22/2009] [Accepted: 04/13/2009]
Abstract
Systematic identification of protein-drug interaction networks is crucial to correlate complex modes of drug action to clinical indications. We introduce a novel computational strategy to identify protein-ligand binding profiles on a genome-wide scale and apply it to elucidating the molecular mechanisms associated with the adverse drug effects of Cholesteryl Ester Transfer Protein (CETP) inhibitors. CETP inhibitors are a new class of preventive therapies for the treatment of cardiovascular disease. However, clinical studies indicated that one CETP inhibitor, Torcetrapib, has deadly off-target effects as a result of hypertension, and hence it has been withdrawn from phase III clinical trials. We have identified a panel of off-targets for Torcetrapib and other CETP inhibitors from the human structural genome and map those targets to biological pathways via the literature. The predicted protein-ligand network is consistent with experimental results from multiple sources and reveals that the side-effect of CETP inhibitors is modulated through the combinatorial control of multiple interconnected pathways. Given that combinatorial control is a common phenomenon observed in many biological processes, our findings suggest that adverse drug effects might be minimized by fine-tuning multiple off-target interactions using single or multiple therapies. This work extends the scope of chemogenomics approaches and exemplifies the role that systems biology has in the future of drug discovery.
Both the cost to launch a new drug and the attrition rate during the late stage of the drug discovery and development process are increasing. Torcetrapib is a case in point, having been withdrawn from phase III clinical trials after 15 years of development and an estimated cost of US $800 M. Torcetrapib represents a new class of therapies for the treatment of cardiovascular disease; however, clinical studies indicated that Torcetrapib has deadly side-effects as a result of hypertension. To understand the origins of these adverse drug reactions from Torcetrapib and other related drugs undergoing clinical trials, we introduce a systematic strategy to identify off-targets in the human structural proteome and investigate the roles of these off-targets in impacting human physiology and pathology using biochemical pathway analysis. Our findings suggest that potential side-effects of a new drug can be identified at an early stage of the development cycle and be minimized by fine-tuning multiple off-target interactions. The hope is that this can reduce both the cost of drug development and the mortality rates during clinical trials.
|