1
Zaki MEA, Al-Hussain SA, Al-Mutairi AA, Samad A, Ghosh A, Chaudhari S, Khatale PN, Ajmire P, Jawarkar RD. In-silico studies to recognize repurposing therapeutics toward arginase-I inhibitors as a potential onco-immunomodulators. Front Pharmacol 2023; 14:1129997. [PMID: 37144217 PMCID: PMC10151555 DOI: 10.3389/fphar.2023.1129997] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/22/2022] [Accepted: 02/27/2023] [Indexed: 05/06/2023]
Abstract
Rudolf Virchow was the first to point out the link between immune function and cancer, having noticed that leukocytes were often found in tumours. Overexpression of arginase 1 (ARG1) and inducible nitric oxide synthase (iNOS) in myeloid-derived suppressor cells (MDSCs) and tumour-associated macrophages (TAMs) depletes both intracellular and extracellular arginine. TCR signalling is slowed as a result, and the same cell types produce reactive oxygen and nitrogen species (ROS and RNS), which aggravates the situation. Human arginase I is a binuclear manganese metalloenzyme that catalyses the hydrolysis of L-arginine to L-ornithine and urea. A quantitative structure-activity relationship (QSAR) analysis was therefore performed to identify previously unrecognised structural features crucial for arginase-I inhibition. In this work, a balanced QSAR model with good predictive performance and clear mechanistic interpretation was developed using a dataset of 149 molecules encompassing a broad range of structural scaffolds and compositions. The model was built to meet OECD standards, and all of its validation parameters exceed the minimum requirements (R²tr = 0.89, Q²LMO = 0.86, and R²ex = 0.85). The present QSAR study linked structural factors to arginase-I inhibitory action, including the proximity of lipophilic atoms to the molecule's centre of mass (within 3 Å), the position of the donor relative to the ring nitrogen (exactly 3 bonds away), and the surface area ratio. As OAT-1746 and two others are the only arginase-I inhibitors in development at the time of writing, we performed a QSAR-based virtual screening of 1650 FDA compounds taken from the ZINC database. In this screening, 112 potential hit compounds were found to have a PIC50 value of less than 10 nM against the arginase-I receptor.
The applicability domain of the QSAR model was evaluated in relation to the most active hit molecules identified by QSAR-based virtual screening, using a training set of 149 compounds and a prediction set of 112 hit molecules. As shown in the Williams plot, the top hit molecule, ZINC000252286875, has a low leverage value of HAT i/i h* = 0.140, placing it towards the boundary of the usable range. Furthermore, one of the 112 hit molecules, with a docking score of -10.891 kcal/mol (PIC50 = 10.023 M), was singled out by molecular docking against arginase-I. Arginase-1 bound to protonated ZINC000252286875 showed an RMSD of 2.9, whereas the non-protonated complex showed 1.8. RMSD plots illustrate protein stability in the protonated and non-protonated ZINC000252286875-bound states. The protonated ZINC000252286875-bound protein has a radius of gyration (Rg) of 25. The non-protonated protein-ligand complex exhibits an Rg of 25.2, indicating compactness. Protonated and non-protonated ZINC000252286875 stabilised protein targets in binding cavities posthumously. Significant root mean square fluctuations (RMSF) were seen in the arginase-1 protein at a small number of residues over the 500 ns simulation in both the protonated and non-protonated states. Protonated and non-protonated ligands interacted with the protein throughout the simulation. ZINC000252286875 bound Lys64, Asp124, Ala171, Arg222, Asp232, and Gly250. Aspartic acid residue 232 exhibited 200% ionic contact. Ionic contacts were maintained over the 500-ns simulations. Salt bridges for ZINC000252286875 aided docking. ZINC000252286875 created six ionic bonds with the residues Lys68, Asp117, His126, Ala171, Lys224, and Asp232. Asp117, His126, and Lys224 showed 200% ionic interactions. In the protonated and deprotonated states, the GbindvdW, GbindLipo, and GbindCoulomb energies played a crucial role. Moreover, ZINC000252286875 meets all of the ADMET standards to serve as a drug.
As a result, the current analyses were successful in locating a novel and potent hit molecule that inhibits arginase-I effectively at nanomolar concentrations. The results of this investigation can be used to develop brand-new arginase I inhibitors as an alternative immune-modulating cancer therapy.
Affiliation(s)
- Magdi E. A. Zaki
- Department of Chemistry, Faculty of Science, Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia
- *Correspondence: Magdi E. A. Zaki; Rahul D. Jawarkar
- Sami A. Al-Hussain
- Department of Chemistry, Faculty of Science, Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia
- Aamal A. Al-Mutairi
- Department of Chemistry, Faculty of Science, Imam Mohammad Ibn Saud Islamic University, Riyadh, Saudi Arabia
- Abdul Samad
- Department of Pharmaceutical Chemistry, Faculty of Pharmacy, Tishk International University, Erbil, Kurdistan Region, Iraq
- Arabinda Ghosh
- Microbiology Division, Department of Botany, Gauhati University, Guwahati, India
- Somdatta Chaudhari
- Department of Pharmaceutical Chemistry, Progressive Education Society’s Modern College of Pharmacy, Pune, India
- Pravin N. Khatale
- Department of Medicinal Chemistry, Dr Rajendra Gode Institute of Pharmacy, Amravati, Maharashtra, India
- Prashant Ajmire
- Department of Medicinal Chemistry, Dr Rajendra Gode Institute of Pharmacy, Amravati, Maharashtra, India
- Rahul D. Jawarkar
- Department of Medicinal Chemistry, Dr Rajendra Gode Institute of Pharmacy, Amravati, Maharashtra, India
2
Taylor WR, Stoye JP, Taylor IA. A comparative analysis of the foamy and ortho virus capsid structures reveals an ancient domain duplication. BMC Struct Biol 2017; 17:3. [PMID: 28372592 PMCID: PMC5379526 DOI: 10.1186/s12900-017-0073-0] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 11/11/2016] [Accepted: 03/10/2017] [Indexed: 01/28/2023]
Abstract
BACKGROUND The Spumaretrovirinae (foamy viruses) and the Orthoretrovirinae (e.g. HIV) share many similarities both in genome structure and the sequences of the core viral encoded proteins, such as the aspartyl protease and reverse transcriptase. Similarity in the gag region of the genome is less obvious at the sequence level but has been illuminated by the recent solution of the foamy virus capsid (CA) structure. This revealed a clear structural similarity to the orthoretrovirus capsids but with marked differences that left uncertainty in the relationship between the two domains that comprise the structure. METHODS We have applied protein structure comparison methods in order to try and resolve this ambiguous relationship. These included both the DALI method and the SAP method, with rigorous statistical tests applied to the results of both methods. For this, we employed collections of artificial fold 'decoys' (generated from the pair of native structures being compared) to provide a customised background distribution for each comparison, thus allowing significance levels to be estimated. RESULTS We have shown that the relationship of the two domains conforms to a simple linear correspondence rather than a domain transposition. These similarities suggest that the origin of both viral capsids was a common ancestor with a double domain structure. In addition, we show that there is also a significant structural similarity between the amino and carboxy domains in both the foamy and ortho viruses. CONCLUSIONS These results indicate that, as well as the duplication of the double domain capsid, there may have been an even more ancient gene-duplication that preceded the double domain structure. In addition, our structure comparison methodology demonstrates a general approach to problems where the components have a high intrinsic level of similarity.
Affiliation(s)
- William R. Taylor
- Computational Cell and Molecular Biology Laboratory, Francis Crick Institute, Midland Road, London, NW1 1AT UK
- Jonathan P. Stoye
- Retrovirus-Host Interactions Laboratory, Francis Crick Institute, Midland Road, London, NW1 1AT UK
- Ian A. Taylor
- Macromolecular Structure Laboratory, Francis Crick Institute, Midland Road, London, NW1 1AT UK
3
Taylor WR, Matthews-Palmer TRS, Beeby M. Molecular Models for the Core Components of the Flagellar Type-III Secretion Complex. PLoS One 2016; 11:e0164047. [PMID: 27855178 PMCID: PMC5113899 DOI: 10.1371/journal.pone.0164047] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 03/29/2016] [Accepted: 09/19/2016] [Indexed: 01/10/2023]
Abstract
We show that by using a combination of computational methods, consistent three-dimensional molecular models can be proposed for the core proteins of the type-III secretion system. We employed a variety of approaches to reconcile disparate, and sometimes inconsistent, data sources into a coherent picture that for most of the proteins indicated a unique solution to the constraints. The range of difficulty spanned from the trivial (FliQ) to the difficult (FlhA and FliP). The uncertainties encountered with FlhA were largely the result of the greater number of helix packing possibilities allowed in a large protein. For FliP, however, there remains an uncertainty in how to reconcile the large displacement predicted between its two main helical hairpins with their ability to sit together happily across the bacterial membrane. As there is still no high resolution structural information on any of these proteins, we hope our predicted models may be of some use in aiding the interpretation of electron microscope images and in rationalising mutation data and experiments.
Affiliation(s)
- William R. Taylor
- Laboratory of Computational Cell and Molecular Biology, Francis Crick Institute, 1 Midland Rd., London NW1 1AT, United Kingdom
- Teige R. S. Matthews-Palmer
- Laboratory of Computational Cell and Molecular Biology, Francis Crick Institute, 1 Midland Rd., London NW1 1AT, United Kingdom
- Department of Life Sciences, Imperial College, London, United Kingdom
- Morgan Beeby
- Department of Life Sciences, Imperial College, London, United Kingdom
4
Abstract
The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures; however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.
Affiliation(s)
- Abel Rodriguez
- University of California, Santa Cruz and Duke University
5
Mardia KV. Statistical approaches to three key challenges in protein structural bioinformatics. J R Stat Soc Ser C Appl Stat 2013. [DOI: 10.1111/rssc.12003] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Indexed: 11/30/2022]
6
Magis C, Stricher F, van der Sloot AM, Serrano L, Notredame C. T-RMSD: A Fine-grained, Structure-based Classification Method and its Application to the Functional Characterization of TNF Receptors. J Mol Biol 2010; 400:605-17. [DOI: 10.1016/j.jmb.2010.05.012] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 03/18/2010] [Revised: 04/21/2010] [Accepted: 05/07/2010] [Indexed: 01/06/2023]
7
Pisanti N, Soldano H, Carpentier M, Pothier J. A Relational Extension of the Notion of Motifs: Application to the Common 3D Protein Substructures Searching Problem. J Comput Biol 2009; 16:1635-60. [PMID: 20047489 DOI: 10.1089/cmb.2008.0019] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022]
Affiliation(s)
- Nadia Pisanti
- Dipartimento di Informatica, Università di Pisa, Largo B. Pontecorvo, Pisa, Italy
- LIPN–UMR 7030 CNRS, Université Paris 13, Villetaneuse, France
- Henry Soldano
- LIPN–UMR 7030 CNRS, Université Paris 13, Villetaneuse, France
- Université Pierre et Marie Curie-Paris6, Atelier de BioInformatique, Paris, France
- Mathilde Carpentier
- Université Pierre et Marie Curie-Paris6, Equipe de Génomique Analytique, INSERM511 Paris, France
- Joel Pothier
- Université Pierre et Marie Curie-Paris6, Atelier de BioInformatique, Paris, France
8
Gherardini PF, Helmer-Citterich M. Structure-based function prediction: approaches and applications. Brief Funct Genomic Proteomic 2008; 7:291-302. [PMID: 18599513 DOI: 10.1093/bfgp/eln030] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Indexed: 11/12/2022]
Abstract
The ever increasing number of protein structures determined by structural genomic projects has spurred much interest in the development of methods for structure-based function prediction. Existing methods can be roughly classified in two groups: some use a comparative approach looking for the presence of structural motifs possibly associated with a known biochemical function. Other methods try to identify functional patches on the surface of a protein using only its physicochemical characteristics. This review will cover both kinds of approaches to structure-based function prediction as well as their use in real-world cases. The main issues and limitations in using protein structure to predict function will also be discussed. These are mainly: the assessment of the statistical significance of structural similarities and the extent to which these methods depend on the accuracy and availability of structural data.
Affiliation(s)
- Pier Federico Gherardini
- Department of Biology, Centre for Molecular Bioinformatics, University of Tor Vergata, Rome, Italy.
9
Taylor WR. Decoy models for protein structure comparison score normalisation. J Mol Biol 2006; 357:676-99. [PMID: 16457842 DOI: 10.1016/j.jmb.2005.12.084] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Received: 08/31/2005] [Revised: 12/16/2005] [Accepted: 12/29/2005] [Indexed: 11/26/2022]
Abstract
A method is described to construct sets of decoy models that can be used to generate a background score distribution for protein structure comparison. The models are derived directly from the two proteins being compared and retain all the essential properties of the structures, including length, density, shape and secondary structure composition but have different folds. As each comparison involves a pair of proteins of the same length, no explicit normalisation is required to adjust for the length of the proteins being compared. This allows substructure (or domain) matches to score almost equally to the comparison of isolated domains. A normalised probability measure was derived that allows joint family/family comparison. The method was applied to some of the CASP6 models for targets with new folds.
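As an illustration of how a decoy-derived background distribution can turn a raw comparison score into a significance estimate, here is a minimal Python sketch. It is not the normalised probability measure of the paper, which the abstract does not specify. The function name `decoy_significance`, the z-score, and the add-one empirical p-value are all assumptions made for this example, and higher scores are assumed to mean greater similarity.

```python
from statistics import mean, stdev


def decoy_significance(native_score, decoy_scores):
    """Compare one native score with scores from decoy comparisons.

    Returns (z_score, empirical_p). Both quantities are generic
    statistics chosen for illustration, not the paper's own measure.
    Assumes that a higher score means a better structural match.
    """
    if len(decoy_scores) < 2:
        raise ValueError("need at least two decoy scores")

    mu = mean(decoy_scores)
    sigma = stdev(decoy_scores)  # sample standard deviation

    # z-score of the native comparison against the decoy background
    z_score = (native_score - mu) / sigma if sigma > 0 else float("nan")

    # Empirical p-value with an add-one correction so it is never zero
    n_as_good = sum(1 for s in decoy_scores if s >= native_score)
    empirical_p = (n_as_good + 1) / (len(decoy_scores) + 1)
    return z_score, empirical_p


if __name__ == "__main__":
    z, p = decoy_significance(6.0, [1.0, 2.0, 3.0, 4.0, 5.0])
    print(f"z = {z:.2f}, p = {p:.3f}")
```

Because the decoys in the described method are built from the same pair of proteins, the background already has the right length and composition, which is why no separate length normalisation appears in the sketch.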
Affiliation(s)
- William R Taylor
- Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK.
10
Park SH, Ryu KH, Gilbert D. Fast similarity search for protein 3D structures using topological pattern matching based on spatial relations. Int J Neural Syst 2005; 15:287-96. [PMID: 16187404 DOI: 10.1142/s0129065705000244] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Similarity search for protein 3D structures has become complex and computationally expensive because protein structure databases continue to grow rapidly. Fast structural similarity search systems are now needed for practical use in protein structure classification, since existing comparison systems do not return results in a timely manner. Our approach uses multi-step processing composed of a preprocessing step to represent the geometry of protein structures with spatial objects, a filter step to generate a small candidate set using approximate topological string matching, and a refinement step to compute a structural alignment. This paper describes the preprocessing and filtering for fast similarity search using the discovery of topological patterns of secondary structure elements based on spatial relations. Our system is fully implemented using Oracle 8i Spatial. We have previously shown that our approach is faster than other approaches such as DALI. This work shows that discovering topological relations of secondary structure elements in protein structures by using the spatial relations of spatial databases is practical for fast structural similarity search for proteins.
Affiliation(s)
- Sung-Hee Park
- Database Bioinformatics Laboratory, School of Electrical & Computer Engineering, Chungbuk National University, Cheongju, 361-763, Korea.
11
Comin M, Guerra C, Zanotti G. PROuST: a comparison method of three-dimensional structures of proteins using indexing techniques. J Comput Biol 2005; 11:1061-72. [PMID: 15662198 DOI: 10.1089/cmb.2004.11.1061] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/13/2022]
Abstract
We present a new method for protein structure comparison that combines indexing and dynamic programming (DP). The method is based on simple geometric features of triplets of secondary structures of proteins. These features provide indexes to a hash table that allows fast retrieval of similarity information for a query protein. After the query protein is matched with all proteins in the hash table producing a list of putative similarities, the dynamic programming algorithm is used to align the query protein with each protein of this list. Since the pairwise comparison with DP is applied only to a small subset of proteins and, furthermore, DP re-uses information that is already computed and stored in the hash table, the approach is very fast even when searching the entire PDB. We have done extensive experimentation showing that our approach achieves results of quality comparable to that of other existing approaches but is generally faster.
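The indexing step described in this abstract can be sketched roughly as follows. This is a hypothetical illustration, not the PROuST implementation: the abstract does not say which geometric features are used, so the sketch assumes each secondary structure element is reduced to an axis vector and each triplet is keyed by its three pairwise axis angles, binned at 30 degrees. The names `triplet_keys`, `build_index`, and `query` are invented for the example, and the dynamic programming refinement is omitted.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


def triplet_keys(axes, bin_deg=30):
    """Yield one hash key per triplet of secondary structure axes.

    The key (three binned pairwise angles) is an assumed feature set.
    """
    for i, j, k in combinations(range(len(axes)), 3):
        pairs = ((i, j), (i, k), (j, k))
        yield tuple(int(angle_deg(axes[a], axes[b]) // bin_deg) for a, b in pairs)


def build_index(database):
    """Map each triplet key to the set of proteins that contain it."""
    index = defaultdict(set)
    for name, axes in database.items():
        for key in triplet_keys(axes):
            index[key].add(name)
    return index


def query(index, axes):
    """Rank database proteins by the number of shared triplet keys."""
    votes = Counter()
    for key in triplet_keys(axes):
        for name in index.get(key, ()):
            votes[name] += 1
    return votes.most_common()


if __name__ == "__main__":
    db = {
        "protA": [(1, 0, 0), (0, 1, 0), (0, 0, 1)],  # mutually perpendicular
        "protB": [(1, 0, 0), (1, 0, 0), (1, 0, 0)],  # all parallel
    }
    print(query(build_index(db), [(2, 0, 0), (0, 3, 0), (0, 0, 1)]))
```

Only the top-ranked candidates from `query` would then be passed to a slower pairwise alignment, which is the source of the speed-up the abstract describes.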
Affiliation(s)
- Matteo Comin
- Department of Information Engineering, University of Padova, 35131 Padova, Italy
12
Abstract
When a new protein structure has been determined, comparison with the database of known structures enables classification of its fold as new or belonging to a known class of proteins. This in turn may provide clues about the function of the protein. A large number of fold comparison programs have been developed, but they have never been subjected to a comprehensive and critical comparative analysis. Here we describe an evaluation of 11 publicly available, Web-based servers for automatic fold comparison. Both their functionality (e.g., user interface, presentation, and annotation of results) and their performance (i.e., how well established structural similarities are recognized) were assessed. The servers were subjected to a battery of performance tests covering a broad spectrum of folds as well as special cases, such as multidomain proteins, Calpha-only models, new folds, and NMR-based models. The CATH structural classification system was used as a reference. These tests revealed the strong and weak sides of each server. On the whole, CE, DALI, MATRAS, and VAST showed the best performance, but none of the servers achieved a 100% success rate. Where no structurally similar proteins are found by any individual server, it is recommended to try one or two other servers before any conclusions concerning the novelty of a fold are put on paper.
Affiliation(s)
- Marian Novotny
- Department of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Uppsala, Sweden
13
Dror O, Benyamini H, Nussinov R, Wolfson HJ. Multiple structural alignment by secondary structures: algorithm and applications. Protein Sci 2003; 12:2492-507. [PMID: 14573862 PMCID: PMC2366961 DOI: 10.1110/ps.03200603] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.0] [Received: 05/15/2003] [Revised: 07/22/2003] [Accepted: 08/02/2003] [Indexed: 10/26/2022]
Abstract
We present MASS (Multiple Alignment by Secondary Structures), a novel highly efficient method for structural alignment of multiple protein molecules and detection of common structural motifs. MASS is based on a two-level alignment, using both secondary structure and atomic representation. Utilizing secondary structure information aids in filtering out noisy solutions and achieves efficiency and robustness. Currently, only a few methods are available for addressing the multiple structural alignment task. In addition to using secondary structure information, the advantage of MASS as compared to these methods is that it is a combination of several important characteristics: (1) While most existing methods are based on series of pairwise comparisons, and thus might miss optimal global solutions, MASS is truly multiple, considering all the molecules simultaneously; (2) MASS is sequence order-independent and thus capable of detecting nontopological structural motifs; (3) MASS is able to detect not only structural motifs, shared by all input molecules, but also motifs shared only by subsets of the molecules. Here, we show the application of MASS to various protein ensembles. We demonstrate its ability to handle a large number (order of tens) of molecules, to detect nontopological motifs and to find biologically meaningful alignments within nonpredefined subsets of the input. In particular, we show how by using conserved structural motifs, one can guide protein-protein docking, which is a notoriously difficult problem. MASS is freely available at http://bioinfo3d.cs.tau.ac.il/MASS/.
Affiliation(s)
- Oranit Dror
- School of Computer Science, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
14
Allen BCP, Grant GH, Richards WG. Calculation of protein domain structural similarity using two-dimensional representations. J Chem Inf Comput Sci 2003; 43:134-43. [PMID: 12546546 DOI: 10.1021/ci020275t] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/29/2022]
Abstract
By reducing protein structures to two-dimensional representations, it is possible to speed up the alignment of the structures and hence calculate similarity indices faster than using three-dimensional representations. Using amino acid based representations gives much better discrimination between proteins and faster calculations. Taking into account the relative similarity of the amino acids involved allowed improved accuracy at very little time cost.
Affiliation(s)
- Benjamin C P Allen
- Central Chemistry Laboratory, University of Oxford, South Parks Road, Oxford OX1 3QH, United Kingdom
15
Taylor WR. Comparing secondary structure 'stick' models of proteins using graph matching with double dynamic programming. Ernst Schering Res Found Workshop 2002:133-48. [PMID: 12060999 DOI: 10.1007/978-3-662-04747-7_7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Affiliation(s)
- W R Taylor
- Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, UK.
16
Taylor WR. Protein structure comparison using bipartite graph matching and its application to protein structure classification. Mol Cell Proteomics 2002; 1:334-9. [PMID: 12096115 DOI: 10.1074/mcp.t200001-mcp200] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.2] [Indexed: 11/06/2022]
Abstract
A measure of protein structure similarity is calculated from the matching of pairs of secondary structure elements between two proteins. The interaction of each pair was estimated from their axial line segments and combined with other geometric features to produce an optimal discrimination between intrafamily and interfamily relationships. The matching used a fast bipartite graph-matching algorithm that avoids the computational complexity of searching for the full subgraph isomorphism between the two sets of interactions. The main algorithm used was the "stable marriage" algorithm, which works on the ranked "preferences" of one interaction for another. The method takes 1/10 of a second for a typical comparison making it suitable as a fast pre-filter for slower, more exhaustive approaches. An application to protein structure classification is described.
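The "stable marriage" matching named in this abstract is the classic Gale-Shapley procedure. The Python sketch below shows that generic algorithm on ranked preference lists. How the paper derives preferences from the geometry of secondary structure interactions is not given in the abstract, so the item labels here are placeholders, the function name `stable_marriage` is invented for the example, and every proposer is assumed to appear in every acceptor's ranking.

```python
def stable_marriage(proposer_prefs, acceptor_prefs):
    """Gale-Shapley stable matching.

    proposer_prefs: dict mapping each proposer to a list of acceptors,
        most preferred first.
    acceptor_prefs: dict mapping each acceptor to a list of proposers,
        most preferred first (assumed to rank every proposer).
    Returns a dict mapping matched proposers to their acceptors.
    """
    # rank[a][p] is the position of proposer p in acceptor a's list
    rank = {
        a: {p: i for i, p in enumerate(prefs)}
        for a, prefs in acceptor_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # next list index to try
    engaged = {}  # acceptor -> proposer currently held
    free = list(proposer_prefs)

    while free:
        p = free.pop()
        prefs = proposer_prefs[p]
        if next_choice[p] >= len(prefs):
            continue  # list exhausted, p stays unmatched
        a = prefs[next_choice[p]]
        next_choice[p] += 1

        current = engaged.get(a)
        if current is None:
            engaged[a] = p  # a was free, accept
        elif rank[a][p] < rank[a][current]:
            engaged[a] = p  # a prefers p, release the previous partner
            free.append(current)
        else:
            free.append(p)  # a rejects p, who will try the next choice

    return {p: a for a, p in engaged.items()}


if __name__ == "__main__":
    # Placeholder labels standing in for interactions in two proteins
    proposers = {"A": ["X", "Y"], "B": ["X", "Y"]}
    acceptors = {"X": ["B", "A"], "Y": ["A", "B"]}
    print(stable_marriage(proposers, acceptors))
```

Each proposer makes at most one proposal per entry in its list, so the matching finishes in time proportional to the total length of the preference lists, which is consistent with the abstract's use of it as a fast pre-filter rather than a full subgraph isomorphism search.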
Affiliation(s)
- William R Taylor
- Division of Mathematical Biology, National Institute for Medical Research, The Ridgeway, Mill Hill, London NW7 1AA, United Kingdom.
17
Maggiora GM, Rohrer DC, Mestres J. Comparing protein structures: a Gaussian-based approach to the three-dimensional structural similarity of proteins. J Mol Graph Model 2002; 19:168-78. [PMID: 11381528 DOI: 10.1016/s1093-3263(00)00129-7] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 10/17/2022]
Abstract
This study describes a new method for comparing three-dimensional protein structures based on an optimal alignment of their steric fields. The method is based upon the use of spherical Gaussian functions located on individual atoms. This representation generates a flexible description of the underlying fold geometry of proteins that can be adjusted by changing the 'width' of the Gaussians. Reducing the width sharpens the representation and leads to a more 'atomlike' description; increasing the width creates a fuzzier representation that preserves the general shape features of the chain fold but with a consequent loss in atomic resolution. The width used in this study is based upon the features of individual atoms and provides a representation that is quite robust with respect to the variety of geometric features typically encountered in the alignment process. In addition, a post-alignment analysis is performed that generates sequence alignments from the corresponding structure alignments. An example, based on five mammalian and fungal matrix metalloproteinase crystal structures (human fibroblast collagenase, neutrophil collagenase, stromelysin, astacin, and adamalysin), illustrates a number of features of the Gaussian-based approach.
Affiliation(s)
- G M Maggiora
- Computer-Aided Drug Discovery, Pharmacia Corporation, Kalamazoo, MI 49007-4940, USA.
18
Abstract
We introduce a new variant of the root mean square distance (RMSD) for comparing protein structures whose range of values is independent of protein size. This new dimensionless measure (relative RMSD, or RRMSD) is zero between identical structures and one between structures that are as globally dissimilar as an average pair of random polypeptides of respective sizes. The RRMSD probability distribution between random polypeptides converges to a universal curve as the chain length increases. The correlation coefficients between aligned random structures are computed as a function of polypeptide size showing two characteristic lengths of 4.7 and 37 residues. These lengths mark the separation between phases of different structural order between native protein fragments. The implications for threading are discussed.
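The normalisation idea in this abstract (zero for identical structures, one for a pair as dissimilar as an average pair of random polypeptides) can be sketched as a ratio. The Python below is an assumption-laden illustration, not the paper's formula: it takes coordinates that are already optimally superposed, and it requires the caller to supply `expected_random_rmsd`, since the abstract does not state how that reference value is computed. The function names are invented for the example.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square distance between two equal-length lists of
    (x, y, z) points. Assumes the structures are already superposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def relative_rmsd(coords_a, coords_b, expected_random_rmsd):
    """RMSD divided by a reference RMSD for random polypeptides of the
    same sizes. The reference value is a caller-supplied assumption."""
    if expected_random_rmsd <= 0:
        raise ValueError("expected_random_rmsd must be positive")
    return rmsd(coords_a, coords_b) / expected_random_rmsd


if __name__ == "__main__":
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
    print(relative_rmsd(a, a, 1.5))  # identical structures give 0.0
    print(relative_rmsd(a, b, 1.5))
```

Dividing by a size-matched reference is what makes the measure dimensionless and comparable across proteins of different lengths.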
Affiliation(s)
- M R Betancourt
- Laboratory of Computational Genomics, The Donald Danforth Plant Science Center, 893 N. Warson Rd., Creve Coeur, MO 63141, USA
19
Abstract
This article investigates aspects of pairwise and multiple structure comparison, and the problem of automatically discovering common patterns in a set of structures. Descriptions and representations of structures and patterns are presented, as well as scoring and algorithms for comparison and discovery. A framework and nomenclature is developed for classifying different methods, and many of these are reviewed and placed into this framework.
Affiliation(s)
- I Eidhammer
- Department of Informatics, University of Bergen, Høyteknologisentret, N-5020 Bergen, Norway.
20
Orengo CA, Sillitoe I, Reeves G, Pearl FM. Review: what can structural classifications reveal about protein evolution? J Struct Biol 2001; 134:145-65. [PMID: 11551176 DOI: 10.1006/jsbi.2001.4398] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.8] [Indexed: 11/22/2022]
Abstract
In this article we present a review of the methods used for comparing and classifying protein structures. We discuss the hierarchies and populations of fold groups and evolutionary families in some of the major classifications and we consider some of the problems confronting any general analyses of structural evolution in protein families. We also review some more recent analyses that have expanded these classifications by identifying sequence relatives in the genomes and thereby reveal interesting trends in fold usage and recurrence.
Affiliation(s)
- C A Orengo
- Department of Biochemistry and Molecular Biology, University College, Gower Street, London, WC1E 6BT, United Kingdom
|
21
|
Bray JE, Todd AE, Pearl FM, Thornton JM, Orengo CA. The CATH Dictionary of Homologous Superfamilies (DHS): a consensus approach for identifying distant structural homologues. Protein Eng 2000; 13:153-65. [PMID: 10775657 DOI: 10.1093/protein/13.3.153] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
A consensus approach has been developed for identifying distant structural homologues. This is based on the CATH Dictionary of Homologous Superfamilies (DHS), a database of validated multiple structural alignments annotated with consensus functional information for evolutionary protein superfamilies (URL: http://www.biochem.ucl.ac.uk/bsm/dhs). Multiple structural alignments have been generated for 362 well-populated superfamilies in the CATH structural domain database and annotated with secondary structure, physicochemical properties, functional sequence patterns and protein-ligand interaction data. Consensus functional information for each superfamily includes descriptions and keywords extracted from SWISS-PROT and the ENZYME database. The Dictionary provides a powerful resource to validate, examine and visualize key structural and functional features of each homologous superfamily. The value of the DHS, for assessing functional variability and identifying distant evolutionary relationships, is illustrated using the pyridoxal-5'-phosphate (PLP) binding aspartate aminotransferase superfamily. The DHS also provides a tool for examining sequence-structure relationships for proteins within each fold group.
Affiliation(s)
- J E Bray
- Biomolecular Structure and Modelling Unit, Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK.
|
22
|
Chew LP, Huttenlocher D, Kedem K, Kleinberg J. Fast detection of common geometric substructure in proteins. J Comput Biol 1999; 6:313-25. [PMID: 10582569 DOI: 10.1089/106652799318292] [Citation(s) in RCA: 41] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
We consider the problem of identifying common three-dimensional substructures between proteins. Our method is based on comparing the shape of the alpha-carbon backbone structures of the proteins in order to find three-dimensional (3D) rigid motions that bring portions of the geometric structures into correspondence. We propose a geometric representation of protein backbone chains that is compact yet allows for similarity measures that are robust against noise and outliers. This representation encodes the structure of the backbone as a sequence of unit vectors, defined by each adjacent pair of alpha-carbons. We then define a measure of the similarity of two protein structures based on the root mean squared (RMS) distance between corresponding orientation vectors of the two proteins. Our measure has several advantages over measures that are commonly used for comparing protein shapes, such as the minimum RMS distance between the 3D positions of corresponding atoms in two proteins. A key advantage is that this new measure behaves well for identifying common substructures, in contrast with position-based measures where the nonmatching portions of the structure dominate the measure. At the same time, it avoids the quadratic space and computational difficulties associated with methods based on distance matrices and contact maps. We show applications of our approach to detecting common contiguous substructures in pairs of proteins, as well as the more difficult problem of identifying common protein domains (i.e., larger substructures that are not necessarily contiguous along the protein chain).
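The representation described in this abstract can be sketched in a few lines. The code below builds unit vectors between adjacent alpha-carbons and takes the RMS distance between corresponding vectors for one fixed relative orientation. The function names are hypothetical, and the search over rigid motions that the authors describe is not implemented here.

```python
import math


def unit_vectors(ca_coords):
    """Unit vectors defined by each adjacent pair of alpha-carbon positions."""
    vectors = []
    for (x1, y1, z1), (x2, y2, z2) in zip(ca_coords, ca_coords[1:]):
        dx, dy, dz = x2 - x1, y2 - y1, z2 - z1
        length = math.sqrt(dx * dx + dy * dy + dz * dz)
        if length == 0.0:
            raise ValueError("consecutive alpha-carbons must not coincide")
        vectors.append((dx / length, dy / length, dz / length))
    return vectors


def orientation_rms(ca_a, ca_b):
    """RMS distance between corresponding orientation vectors of two backbones.

    Assumption: both chains have the same length and are compared in their
    given orientation; no optimization over rotations is attempted.
    """
    ua, ub = unit_vectors(ca_a), unit_vectors(ca_b)
    if len(ua) != len(ub) or not ua:
        raise ValueError("need two chains of equal length with at least two atoms")
    total = sum(math.dist(u, v) ** 2 for u, v in zip(ua, ub))
    return math.sqrt(total / len(ua))
```

Since only directions between neighbouring atoms enter the measure, translating either chain leaves the result unchanged, and each vector pair contributes a bounded amount (at most 2 per pair), which is one way to see why non-matching regions cannot dominate as they can in position-based RMS.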
Affiliation(s)
- L P Chew
- Department of Computer Science, Cornell University, Ithaca, New York 14853, USA
|
23
|
Abstract
CORA is a suite of programs for multiply aligning and analyzing protein structural families to identify the consensus positions and capture their most conserved structural characteristics (e.g., residue accessibility, torsional angles, and global geometry as described by inter-residue vectors/contacts). Knowledge of these structurally conserved positions, which are mostly in the core of the fold, and of their properties significantly improves the identification and classification of newly determined relatives. Information is encoded in a consensus three-dimensional (3D) template, and relatives are found by a sensitive alignment method, which employs a new scoring scheme based on conserved residue contacts. By encapsulating these critical "core" features, templates perform more reliably in recognizing distant structural relatives than searches with representative structures. Parameters for 3D-template generation and alignment were optimized for each structural class (mainly-alpha, mainly-beta, alpha-beta), using representative superfold families. For all families selected, the templates gave significant improvements in sensitivity and selectivity in recognizing distant structural relatives. Furthermore, since templates contain less than 70% of fold positions and compare fewer positions when aligning structures, scans are at least an order of magnitude faster than scans using selected structures. CORA was subsequently tested on eight other broad structural families from the CATH database. Diagnostic plots are generated automatically and provide qualitative assistance for classifying newly determined relatives. They are demonstrated here by application to the large globin-like fold family. CORA templates for both homologous superfamilies and fold families will be stored in CATH and used to improve the classification and analysis of newly determined structures.
Affiliation(s)
- C A Orengo
- Department of Biochemistry and Molecular Biology, University College, London, United Kingdom.
|
24
|
|
25
|
Orengo CA, Michie AD, Jones S, Jones DT, Swindells MB, Thornton JM. CATH--a hierarchic classification of protein domain structures. Structure 1997; 5:1093-108. [PMID: 9309224 DOI: 10.1016/s0969-2126(97)00260-8] [Citation(s) in RCA: 1652] [Impact Index Per Article: 61.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/05/2023]
Abstract
BACKGROUND: Protein evolution gives rise to families of structurally related proteins, within which sequence identities can be extremely low. As a result, structure-based classifications can be effective at identifying unanticipated relationships in known structures, and in optimal cases function can also be assigned. The ever-increasing number of known protein structures is too large to classify all proteins manually; therefore, automatic methods are needed for fast evaluation of protein structures. RESULTS: We present a semi-automatic procedure for deriving a novel hierarchical classification of protein domain structures (CATH). The four main levels of our classification are protein class (C), architecture (A), topology (T) and homologous superfamily (H). Class is the simplest level, and it essentially describes the secondary structure composition of each domain. In contrast, architecture summarises the shape revealed by the orientations of the secondary structure units, such as barrels and sandwiches. At the topology level, sequential connectivity is considered, such that members of the same architecture might have quite different topologies. When structures belonging to the same T-level have suitably high similarities combined with similar functions, the proteins are assumed to be evolutionarily related and put into the same homologous superfamily. CONCLUSIONS: Analysis of the structural families generated by CATH reveals the prominent features of protein structure space. We find that nearly a third of the homologous superfamilies (H-levels) belong to ten major T-levels, which we call superfolds, and furthermore that nearly two-thirds of these H-levels cluster into nine simple architectures. A database of well-characterised protein structure families, such as CATH, will facilitate the assignment of structure-function/evolution relationships to both known and newly determined protein structures.
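The four-level hierarchy described above maps naturally onto a small data structure. The sketch below assumes a domain is labelled with a dotted four-number code in C.A.T.H order; the abstract does not spell out that string format, and the names `CathCode`, `parse_cath_code`, and `deepest_shared_level` are hypothetical. The numeric codes in the usage note are arbitrary placeholders, not annotated database entries.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    """One domain's position in the hierarchy, from broadest to narrowest level."""
    cls: int           # C: class (secondary structure composition)
    architecture: int  # A: shape from orientations of secondary structure units
    topology: int      # T: sequential connectivity
    superfamily: int   # H: homologous superfamily


def parse_cath_code(code: str) -> CathCode:
    """Parse a dotted 'C.A.T.H' string (assumed format) into its four levels."""
    parts = code.strip().split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def deepest_shared_level(a: CathCode, b: CathCode) -> str:
    """Name of the narrowest level at which two domains are still grouped together."""
    levels = ("class", "architecture", "topology", "homologous superfamily")
    shared = "none"
    for name, x, y in zip(levels, a, b):
        if x != y:
            break
        shared = name
    return shared
```

For example, `deepest_shared_level(parse_cath_code("3.40.50.300"), parse_cath_code("3.40.50.720"))` walks the levels from class downward and stops at the first mismatch, so two domains that differ only in the last number are reported as sharing a topology but not a homologous superfamily.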
Affiliation(s)
- C A Orengo
- Department of Biochemistry and Molecular Biology, University College London, UK
|