1
Pal D, Dey S, Ghosh P, Bhattacharya DK, Das S, Maji B. A unique approach for protein secondary structure comparison under TOPS representation. J Biomol Struct Dyn 2024:1-13. [PMID: 38698728 DOI: 10.1080/07391102.2024.2333449] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 03/15/2024] [Indexed: 05/05/2024]
Abstract
To unravel the intricate connection between protein function and protein structure, it is imperative to comprehensively evaluate protein secondary structure similarity from various perspectives. While numerous techniques have been suggested for comparing protein secondary structure elements (SSE), there remains a substantial need for alternative ways of comparing them. In this paper, Topology of Protein Structure (TOPS) representations of protein secondary structures are used to offer a new alignment-free method for evaluating similarities and dissimilarities of protein secondary structures. Initially, a two-dimensional numerical representation of the SSE is created, associating each point with a mass reflecting its frequency of occurrence. Then the means of the coordinate values are determined as mass-weighted averages, and these mean values are subsequently used to calculate moments of inertia. Next, a four-component descriptor is generated from the eigenvalues of the moment-of-inertia matrix and the mean values of the represented coordinates. Thereafter, the Manhattan distance is used to obtain the distance matrix, which is finally used to build phylogenetic trees with the neighbor-joining (NJ) method. The SSE considered in the proposed method comprise 36 elements from the Chew-Kedem database, covering five different taxa: globin, alpha-beta, tim-barrel, beta, and alpha. For comparative analysis, phylogenetic trees were also created for these SSE with several other methods: Clustal-Omega, LZ-Complexity, SED, TOPS+ and TOC. The phylogenetic tree of the proposed method outperformed the results of the previous methods when applied to the same SSE. Therefore, the method effectively constructs phylogenetic trees for protein secondary structure comparison.
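The descriptor pipeline in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not say how SSE symbols are mapped to 2D points, so the function below takes ready-made points as input, and the planar inertia-matrix form and the names `descriptor`, `manhattan` and `distance_matrix` are hypothetical choices, not the authors' implementation.

```python
from collections import Counter
from math import sqrt


def descriptor(points):
    """Return (lambda1, lambda2, mean_x, mean_y) for a list of 2D points.

    Assumption: the mass of a point is simply how often it occurs in `points`.
    """
    masses = Counter(points)                      # mass = frequency of occurrence
    total = sum(masses.values())
    mean_x = sum(m * x for (x, y), m in masses.items()) / total
    mean_y = sum(m * y for (x, y), m in masses.items()) / total
    # Planar inertia matrix about the weighted mean (assumed form).
    ixx = sum(m * (y - mean_y) ** 2 for (x, y), m in masses.items())
    iyy = sum(m * (x - mean_x) ** 2 for (x, y), m in masses.items())
    ixy = -sum(m * (x - mean_x) * (y - mean_y) for (x, y), m in masses.items())
    # Closed-form eigenvalues of the symmetric matrix [[ixx, ixy], [ixy, iyy]].
    half_trace = (ixx + iyy) / 2
    radius = sqrt(((ixx - iyy) / 2) ** 2 + ixy ** 2)
    return (half_trace + radius, half_trace - radius, mean_x, mean_y)


def manhattan(a, b):
    """Manhattan (L1) distance between two descriptors."""
    return sum(abs(p - q) for p, q in zip(a, b))


def distance_matrix(descriptors):
    """Pairwise Manhattan distances; this matrix would feed a tree builder."""
    return [[manhattan(a, b) for b in descriptors] for a in descriptors]
```

The resulting matrix would then be passed to a neighbor-joining implementation, which is not shown here.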
Affiliation(s)
- Debrupa Pal
  - Computer Application, Narula Institute of Technology, Kolkata, India
  - Electronics and Communication Engineering, National Institute of Technology, Durgapur, India
- Sudeshna Dey
  - Computer Science and Engineering, Narula Institute of Technology, Kolkata, India
- Papri Ghosh
  - Computer Science and Engineering, Narula Institute of Technology, Kolkata, India
- Subhram Das
  - Computer Science and Engineering, Narula Institute of Technology, Kolkata, India
- Bansibadan Maji
  - Electronics and Communication Engineering, National Institute of Technology, Durgapur, India
2
Minami S, Kobayashi N, Sugiki T, Nagashima T, Fujiwara T, Tatsumi-Koga R, Chikenji G, Koga N. Exploration of novel αβ-protein folds through de novo design. Nat Struct Mol Biol 2023; 30:1132-1140. [PMID: 37400653 PMCID: PMC10442233 DOI: 10.1038/s41594-023-01029-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 11/29/2021] [Accepted: 05/30/2023] [Indexed: 07/05/2023]
Abstract
A fundamental question in protein evolution is whether nature has exhaustively sampled nearly all possible protein folds throughout evolution, or whether a large fraction of the possible folds remains unexplored. To address this question, we defined a set of rules for β-sheet topology to predict novel αβ-folds and carried out a systematic de novo protein design exploration of the novel αβ-folds predicted by the rules. The designs for all eight of the predicted novel αβ-folds with a four-stranded β-sheet, including a knot-forming one, folded into structures close to the design models. Further, the rules predicted more than 10,000 novel αβ-folds with five- to eight-stranded β-sheets; this number far exceeds the number of αβ-folds observed in nature so far. This result suggests that a vast number of αβ-folds are possible, but have not emerged or have become extinct due to evolutionary bias.
Affiliation(s)
- Shintaro Minami
  - Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences (NINS), Okazaki, Japan
- Naohiro Kobayashi
  - Institute for Protein Research (IPR), Osaka University, Osaka, Japan
  - RIKEN Center for Biosystems Dynamics Research, RIKEN, Yokohama, Japan
- Toshihiko Sugiki
  - Institute for Protein Research (IPR), Osaka University, Osaka, Japan
- Toshio Nagashima
  - RIKEN Center for Biosystems Dynamics Research, RIKEN, Yokohama, Japan
- Rie Tatsumi-Koga
  - Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences (NINS), Okazaki, Japan
- George Chikenji
  - Department of Applied Physics, Graduate School of Engineering, Nagoya University, Nagoya, Japan
- Nobuyasu Koga
  - Protein Design Group, Exploratory Research Center on Life and Living Systems (ExCELLS), National Institutes of Natural Sciences (NINS), Okazaki, Japan
  - SOKENDAI, The Graduate University for Advanced Studies, Hayama, Japan
  - Research Center of Integrative Molecular Systems, Institute for Molecular Science (IMS), National Institutes of Natural Sciences (NINS), Okazaki, Japan
  - Laboratory for Protein Design, Institute for Protein Research (IPR), Osaka University, Osaka, Japan
3
Purutçuoğlu V, Ağraz M, Wit E. Bernstein approximations in glasso-based estimation of biological networks. Can J Stat 2017. [DOI: 10.1002/cjs.11309] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
Affiliation(s)
- Melih Ağraz
  - Middle East Technical University, Ankara, Turkey
- Ernst Wit
  - University of Groningen, Groningen, The Netherlands
4
Maghawry HA, Mostafa MGM, Gharib TF. A new protein structure representation for efficient protein function prediction. J Comput Biol 2015; 21:936-46. [PMID: 25343279 DOI: 10.1089/cmb.2014.0137] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/12/2023]
Abstract
One of the challenging problems in bioinformatics is the prediction of protein function. Protein function is the main key that can be used to classify different proteins. Protein function can be inferred experimentally with very small throughput or computationally with very high throughput. Computational methods are sequence based or structure based. Structure-based methods produce more accurate protein function prediction. In this article, we propose a new protein structure representation for efficient protein function prediction. The representation is based on three-dimensional patterns of protein residues. In the analysis, we used protein function based on enzyme activity through six mechanistically diverse enzyme superfamilies: amidohydrolase, crotonase, haloacid dehalogenase, isoprenoid synthase type I, and vicinal oxygen chelate. We applied three different classification methods, naïve Bayes, k-nearest neighbors, and random forest, to predict the enzyme superfamily of a given protein. The prediction accuracy using the proposed representation outperforms a recently introduced representation method that is based only on the distance patterns. The results show that the proposed representation achieved prediction accuracy up to 98%, with improvement of about 10% on average.
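As a rough illustration of the k-nearest-neighbors step named in this abstract, the sketch below classifies a query feature vector by majority vote among its closest training vectors. The feature vectors, the function name `knn_predict` and the use of Euclidean distance are assumptions made for the example; the paper's representation based on three-dimensional residue patterns is not reproduced here.

```python
from collections import Counter
from math import dist


def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among the k nearest vectors.

    `train` is a list of (feature_vector, superfamily_label) pairs; the
    feature vectors here are hypothetical placeholders.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy training set with two of the superfamily names from the abstract.
train = [
    ((0.0, 0.0), "crotonase"),
    ((0.0, 1.0), "crotonase"),
    ((5.0, 5.0), "amidohydrolase"),
    ((5.0, 6.0), "amidohydrolase"),
    ((6.0, 5.0), "amidohydrolase"),
]
```

A call such as `knn_predict(train, (0.2, 0.1))` returns the label held by most of the three closest training vectors.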
Affiliation(s)
- Huda A Maghawry
  - Department of Information Systems, Faculty of Computer and Information Sciences, Ain Shams University, Cairo, Egypt
5
Abstract
Measuring protein structural similarity attempts to establish a relationship of equivalence between polymer structures based on their conformations. In several recent studies, researchers have explored protein-graph remodeling instead of looking for a minimum superimposition of protein pairs. When graphs are used to represent structured objects, the problem of measuring object similarity becomes one of computing the similarity between graphs. Graph theory provides an alternative perspective as well as efficiency. Once a protein graph has been created, its structural stability must be verified. Therefore, a criterion is needed to determine whether a protein graph can be used for structural comparison. In this paper, we propose a measurement for protein graph remodeling based on graph entropy. We extend the concept of graph entropy to determine whether a graph is suitable for representing a protein. The experimental results suggest that graph entropy helps assess conformation in protein graph modeling. Furthermore, it indirectly contributes to protein structural comparison if the protein graph is solid.
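The abstract does not define which graph entropy it extends. As one common variant, and purely as an assumption for illustration, the sketch below computes the Shannon entropy of the normalized degree sequence of an undirected graph; the function name `degree_entropy` is hypothetical.

```python
from collections import Counter
from math import log2


def degree_entropy(edges):
    """Shannon entropy (in bits) of the normalized degree sequence.

    `edges` is a list of (u, v) pairs of an undirected graph. This is one
    of several graph-entropy definitions, not necessarily the paper's.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    total = sum(degree.values())
    if total == 0:
        return 0.0
    return -sum((d / total) * log2(d / total) for d in degree.values())


# A 4-cycle has uniform degrees, so its entropy is maximal for 4 nodes.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
# A star concentrates degree on the hub, so its entropy is lower.
star = [(0, 1), (0, 2), (0, 3)]
```

Under this definition a graph with evenly spread contacts scores higher than one dominated by a few hub nodes.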
6
Nemoto W, Saito A, Oikawa H. Recent advances in functional region prediction by using structural and evolutionary information - Remaining problems and future extensions. Comput Struct Biotechnol J 2013; 8:e201308007. [PMID: 24688747 PMCID: PMC3962155 DOI: 10.5936/csbj.201308007] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 08/27/2013] [Revised: 11/12/2013] [Accepted: 11/13/2013] [Indexed: 11/22/2022]
Abstract
Structural genomics projects have solved many new structures with unknown functions. One strategy to investigate the function of a structure is to computationally find the functionally important residues or regions on it. Therefore, the development of functional region prediction methods has become an important research subject. An effective approach is to use a method employing structural and evolutionary information, such as the evolutionary trace (ET) method. ET ranks the residues of a protein structure by calculating the scores for relative evolutionary importance, and locates functionally important sites by identifying spatial clusters of highly ranked residues. After ET was developed, numerous ET-like methods were subsequently reported, and many of them are in practical use, although they require certain conditions. In this mini review, we first introduce the remaining problems and the recent improvements in the methods using structural and evolutionary information. We then summarize the recent developments of the methods. Finally, we conclude by describing possible extensions of the evolution- and structure-based methods.
Affiliation(s)
- Wataru Nemoto
  - Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
- Akira Saito
  - Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
- Hayato Oikawa
  - Division of Life Science and Engineering, School of Science and Engineering, Tokyo Denki University (TDU), Ishizaka, Hatoyama-cho, Hiki-gun, Saitama, 350-0394, Japan
7
Liu HL, Lin JC, Ho Y, Hsieh WC, Chen CW, Su YC. Homology Models and Molecular Dynamics Simulations of Main Proteinase from Coronavirus Associated with Severe Acute Respiratory Syndrome (SARS). J Chin Chem Soc-Taip 2013; 51:889-900. [PMID: 32336761 PMCID: PMC7167048 DOI: 10.1002/jccs.200400134] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/01/2004] [Indexed: 11/29/2022]
Abstract
In this study, two structural models (denoted as MproST and MproSH) of the main proteinase (Mpro) from the novel coronavirus associated with severe acute respiratory syndrome (SARS‐CoV) were constructed based on the crystallographic structures of Mpro from transmissible gastroenteritis coronavirus (TGEV) (MproT) and human coronavirus HCoV‐229E (MproH), respectively. Several 200 ps molecular dynamics simulations were subsequently performed to investigate the dynamic behavior of several structural features. Both MproST and MproSH exhibit folds similar to those of their respective template proteins. These structural models reveal three distinct functional domains as well as an intervening loop connecting domains II and III, as found in both template proteins. In addition, domain III of these structures exhibits the least secondary structural conservation. A catalytic cleft containing the substrate binding subsites S1 and S2 between domains I and II is also observed in these structural models. Although these structures share many common features, the most significant difference occurs at the S2 subsite, where the amino acid residues lining this subsite are least conserved. Designing anti‐SARS drugs by simply screening the known database of proteinase inhibitors may therefore be a critical challenge.
Affiliation(s)
- Hsuan-Liang Liu
  - Department of Chemical Engineering and Graduate Institute of Biotechnology, National Taipei University of Technology, Taipei 10608, Taiwan, R.O.C
- Jin-Chung Lin
  - Department of Chemical Engineering and Graduate Institute of Biotechnology, National Taipei University of Technology, Taipei 10608, Taiwan, R.O.C
- Yih Ho
  - School of Pharmacy, Taipei Medical University, Taipei 110, Taiwan, R.O.C
- Wei-Chan Hsieh
  - Department of Chemical Engineering and Graduate Institute of Biotechnology, National Taipei University of Technology, Taipei 10608, Taiwan, R.O.C
- Chin-Wen Chen
  - Department of Chemical Engineering and Graduate Institute of Biotechnology, National Taipei University of Technology, Taipei 10608, Taiwan, R.O.C
- Yuan-Chen Su
  - Department of Chemical Engineering and Graduate Institute of Biotechnology, National Taipei University of Technology, Taipei 10608, Taiwan, R.O.C
8
Searls DB. A primer in macromolecular linguistics. Biopolymers 2012; 99:203-17. [PMID: 23034580 DOI: 10.1002/bip.22101] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 04/26/2012] [Accepted: 05/25/2012] [Indexed: 01/01/2023]
Abstract
Polymeric macromolecules, when viewed abstractly as strings of symbols, can be treated in terms of formal language theory, providing a mathematical foundation for characterizing such strings both as collections and in terms of their individual structures. In addition this approach offers a framework for analysis of macromolecules by tools and conventions widely used in computational linguistics. This article introduces the ways that linguistics can be and has been applied to molecular biology, covering the relevant formal language theory at a relatively nontechnical level. Analogies between macromolecules and human natural language are used to provide intuitive insights into the relevance of grammars, parsing, and analysis of language complexity to biology.
9
Wang J, Gao X, Wang Q, Li Y. ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval. BMC Bioinformatics 2012; 13 Suppl 7:S2. [PMID: 22594999 PMCID: PMC3348016 DOI: 10.1186/1471-2105-13-s7-s2] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Indexed: 12/29/2022]
Abstract
BACKGROUND The need to retrieve or classify protein molecules using structure or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. Pairwise measures of this kind suffer from the limitation of neglecting the distribution of other proteins and thus cannot satisfy the need for high accuracy of the retrieval systems. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning the contextual dissimilarity/similarity measures can improve the retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms work in an unsupervised manner, which does not utilize the information of the known class labels of proteins in the database. RESULTS In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure dij by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is that, for a pair of proteins (i, j), if their contexts N(i) and N(j) are similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing dij by a factor learned from the contexts N(i) and N(j). Moreover, we divide the context into hierarchical sub-contexts and get the contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select the relevant (a pair of proteins that has the same class labels) and irrelevant (with different labels) protein pairs, and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is further used to learn a supervised regularizing factor.
Finally, with the new Supervised learned Dissimilarity measure, we update the Protein Hierarchical Context Coherently in an iterative algorithm, ProDis-ContSHC. We test the performance of ProDis-ContSHC on two benchmark sets, i.e., the ASTRAL 1.73 database and the FSSP/DALI database. Experimental results demonstrate that plugging our supervised contextual dissimilarity measures into the retrieval systems significantly outperforms the context-free dissimilarity/similarity measures and other unsupervised contextual dissimilarity measures that do not use the class label information. CONCLUSIONS Using the contextual proteins with their class labels in the database, we can improve the accuracy of the pairwise dissimilarity/similarity measures dramatically for the protein retrieval tasks. In this work, for the first time, we propose the idea of supervised contextual dissimilarity learning, resulting in the ProDis-ContSHC algorithm. Among different contextual dissimilarity learning approaches that can be used to compare a pair of proteins, ProDis-ContSHC provides the highest accuracy. Finally, ProDis-ContSHC compares favorably with other methods reported in the recent literature.
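The basic idea of regularizing a pairwise dissimilarity by how similar two proteins' neighborhoods are can be illustrated with a deliberately simplified, unsupervised sketch. Everything specific here is an assumption for illustration: the context is taken as the k nearest neighbors, the factor is derived from the Jaccard overlap of the two contexts, and the names `neighbors` and `contextual_dissimilarity` are hypothetical. The SVM-learned supervised factor and the hierarchical sub-contexts described in the abstract are not reproduced.

```python
def neighbors(d, i, k):
    """Indices of the k items closest to item i under dissimilarity matrix d."""
    order = sorted((j for j in range(len(d)) if j != i), key=lambda j: d[i][j])
    return set(order[:k])


def contextual_dissimilarity(d, k=2, strength=0.5):
    """Shrink d[i][j] when the contexts of i and j overlap.

    Assumed factor: 1 - strength * Jaccard(N(i), N(j)), where each context
    also contains the item itself. This is not the learned factor of the paper.
    """
    n = len(d)
    context = [neighbors(d, i, k) | {i} for i in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            overlap = len(context[i] & context[j]) / len(context[i] | context[j])
            out[i][j] = d[i][j] * (1.0 - strength * overlap)
    return out


# Toy data: four items on a line at positions 0, 1, 10 and 11.
positions = [0, 1, 10, 11]
base = [[abs(p - q) for q in positions] for p in positions]
```

With `k=1`, items 0 and 1 share the same context and their dissimilarity is halved, while pairs from different clusters are left unchanged.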
Affiliation(s)
- Jingyan Wang
  - King Abdullah University of Science and Technology (KAUST), Mathematical and Computer Sciences and Engineering Division, Thuwal, 23955-6900, Saudi Arabia
10
Stegemann B, Klebe G. Cofactor-binding sites in proteins of deviating sequence: comparative analysis and clustering in torsion angle, cavity, and fold space. Proteins 2011; 80:626-48. [PMID: 22095739 DOI: 10.1002/prot.23226] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 07/13/2011] [Revised: 09/29/2011] [Accepted: 10/10/2011] [Indexed: 12/13/2022]
Abstract
Small molecules are recognized in protein-binding pockets through surface-exposed physicochemical properties. To optimize binding, they have to adopt a conformation corresponding to a local energy minimum within the formed protein-ligand complex. However, their conformational flexibility makes them competent to bind not only to homologous proteins of the same family but also to proteins of remote similarity with respect to the shape of the binding pockets and folding pattern. Considering drug action, such observations can give rise to unexpected and undesired cross reactivity. In this study, datasets of six different cofactors (ADP, ATP, NAD(P)(H), FAD, and acetyl CoA, sharing an adenosine diphosphate moiety as common substructure), observed in multiple crystal structures of protein-cofactor complexes exhibiting sequence identity below 25%, have been analyzed for the conformational properties of the bound ligands, the distribution of physicochemical properties in the accommodating protein-binding pockets, and the local folding patterns next to the cofactor-binding site. State-of-the-art clustering techniques have been applied to group the different protein-cofactor complexes in the different spaces. Interestingly, clustering in cavity (Cavbase) and fold space (DALI) reveals virtually the same data structuring. Remarkable relationships can be found among the different spaces. They provide information on how conformations are conserved across the host proteins and which distinct local cavity and fold motifs recognize the different portions of the cofactors. In those cases, where different cofactors are found to be accommodated in a similar fashion to the same fold motifs, only a commonly shared substructure of the cofactors is used for the recognition process.
Affiliation(s)
- Björn Stegemann
  - Institut für Pharmazeutische Chemie, Philipps-Universität Marburg, Marbacher Weg 6, D-35032 Marburg, Germany
11
Hollup SM, Sadowski MI, Jonassen I, Taylor WR. Exploring the limits of fold discrimination by structural alignment: a large scale benchmark using decoys of known fold. Comput Biol Chem 2011; 35:174-88. [PMID: 21704264 PMCID: PMC3145973 DOI: 10.1016/j.compbiolchem.2011.04.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 01/21/2011] [Accepted: 04/23/2011] [Indexed: 11/10/2022]
Abstract
Protein structure comparison by pairwise alignment is commonly used to identify highly similar substructures in pairs of proteins and provide a measure of structural similarity based on the size and geometric similarity of the match. These scores are routinely applied in analyses of protein fold space under the assumption that high statistical significance is equivalent to a meaningful relationship; however, the truth of this assumption has previously been difficult to test, since there is a lack of automated methods which do not rely on the same underlying principles. As a resolution to this, we present a method based on the use of topological descriptions of global protein structure, providing an independent means to assess the ability of structural alignment to maintain meaningful structural correspondences on a large scale. Using a large set of decoys of specified global fold, we benchmark three widely used methods for structure comparison, SAP, TM-align and DALI, and test the degree to which this assumption is justified for these methods. Application of a topological edit distance measure to provide a scale of the degree of fold change shows that, while there is a broad correlation between high structural alignment scores and low edit distances, there remain many pairs with highly significant scores which differ by core strand swaps and therefore are structurally different on a global level. Possible causes of this problem and its meaning for present assessments of protein fold space are discussed.
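To give a feel for how an edit distance can expose strand swaps, the sketch below applies plain Levenshtein distance to strings that encode strand order. This is an assumption made for illustration: the abstract does not specify the topological edit distance used in the paper, and the string encoding and the name `edit_distance` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = cur
    return prev[-1]


# Hypothetical encoding: the spatial order of four strands in a sheet.
native_order = "1234"
swapped_order = "1324"   # the two core strands have exchanged places
```

Under this encoding, two folds that differ only by a swap of neighboring core strands are two edits apart even if most of their residues superimpose well.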
12
Rocha J. Graph comparison by log-odds score matrices with application to protein topology analysis. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:564-569. [PMID: 21233531 DOI: 10.1109/tcbb.2010.59] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
A TOPS diagram is a simplified description of the topology of a protein using a graph where nodes are alpha-helices and beta-strands, and edges correspond to chirality relations and parallel or antiparallel bonds between strands. We present a matching algorithm between two TOPS diagrams where the likelihood of a match is measured according to previously known matches between complete 3D structures. This totally new 3D training is recorded in transition matrices that count the likelihood that a given TOPS feature, or combination thereof, is replaced by another feature in homologs. The new algorithm outperforms existing ones on a benchmark database. Some biologically significant examples are discussed as well. The method can be used whenever frequencies of edge relationship matches are known, as is the case for several biopolymer structures.
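The graph described in the first sentence of this abstract can be written down as a small data structure. The sketch below is a minimal illustration, not the paper's matching algorithm: the class name `TopsDiagram`, the one-letter SSE symbols and the `feature_counts` helper are assumptions. Feature counts of this kind are the sort of input from which transition frequencies could be tallied.

```python
from collections import Counter
from dataclasses import dataclass, field

STRAND_KINDS = {"parallel", "antiparallel"}   # bonds that only join two strands
ALL_KINDS = STRAND_KINDS | {"chirality"}


@dataclass
class TopsDiagram:
    sses: str                                   # one symbol per SSE: "E" strand, "H" helix
    edges: list = field(default_factory=list)   # (i, j, kind) triples with i <= j

    def add_edge(self, i, j, kind):
        if kind not in ALL_KINDS:
            raise ValueError(f"unknown edge kind: {kind}")
        if not (0 <= i < len(self.sses) and 0 <= j < len(self.sses)):
            raise IndexError("edge endpoint outside the SSE list")
        if kind in STRAND_KINDS and (self.sses[i] != "E" or self.sses[j] != "E"):
            raise ValueError("parallel/antiparallel edges must join two strands")
        self.edges.append((min(i, j), max(i, j), kind))

    def feature_counts(self):
        """Count (SSE type, edge kind, SSE type) features in the diagram."""
        return Counter((self.sses[i], kind, self.sses[j]) for i, j, kind in self.edges)
```

For a strand-helix-strand unit, `TopsDiagram("EHE")` with a parallel edge between SSEs 0 and 2 records one `("E", "parallel", "E")` feature.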
Affiliation(s)
- J Rocha
  - Department of Mathematics and Computer Science, University of the Balearic Islands, Palma de Mallorca, 07122 Spain
13
Stivala AD, Stuckey PJ, Wirth AI. Fast and accurate protein substructure searching with simulated annealing and GPUs. BMC Bioinformatics 2010; 11:446. [PMID: 20813068 PMCID: PMC2944279 DOI: 10.1186/1471-2105-11-446] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Received: 05/27/2010] [Accepted: 09/03/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching. RESULTS We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing that is as fast as or faster than, and comparable in accuracy to, some widely used existing methods. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU). CONCLUSIONS The GPU implementation achieves up to 34 times speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem.
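The general shape of a simulated-annealing search over SSE correspondences can be sketched as follows. This is a generic illustration under stated assumptions, not the heuristic of the paper: each tableau is modeled as a symmetric matrix of hypothetical orientation codes, a candidate solution is a permutation mapping SSEs of one structure onto the other, and the names `score` and `anneal`, the swap move and the geometric cooling schedule are all choices made for the example.

```python
import math
import random


def score(perm, a, b):
    """Number of SSE pairs whose orientation code agrees under the mapping."""
    n = len(perm)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if a[i][j] == b[perm[i]][perm[j]]
    )


def anneal(a, b, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a permutation maximizing `score` by simulated annealing."""
    rng = random.Random(seed)
    n = len(a)
    perm = list(range(n))
    current = score(perm, a, b)
    best, best_perm = current, perm[:]
    temperature = t0
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]          # propose a swap
        candidate = score(perm, a, b)
        accept = candidate >= current or rng.random() < math.exp(
            (candidate - current) / temperature
        )
        if accept:
            current = candidate
            if current > best:
                best, best_perm = current, perm[:]
        else:
            perm[i], perm[j] = perm[j], perm[i]      # undo the swap
        temperature *= cooling                        # geometric cooling
    return best_perm, best
```

Worse moves are accepted with a probability that shrinks as the temperature falls, which lets the search escape poor local optima early on; the best mapping seen so far is always retained.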
Affiliation(s)
- Alex D Stivala
  - Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia
- Peter J Stuckey
  - Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia
  - National ICT Australia Victoria Laboratory at The University of Melbourne, Victoria 3010, Australia
- Anthony I Wirth
  - Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia
14
Zhang ZH, Lee HK, Mihalek I. Reduced representation of protein structure: implications on efficiency and scope of detection of structural similarity. BMC Bioinformatics 2010; 11:155. [PMID: 20338066 PMCID: PMC3098053 DOI: 10.1186/1471-2105-11-155] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 10/27/2009] [Accepted: 03/26/2010] [Indexed: 11/10/2022]
Abstract
Background Computational comparison of two protein structures is the starting point of many methods that build on existing knowledge, such as structure modeling (including modeling of protein complexes and conformational changes), molecular replacement, or annotation by structural similarity. In a commonly used strategy, significant effort is invested in matching two sets of atoms. In a complementary approach, a global descriptor is assigned to the overall structure, thus losing track of the substructures within. Results Using a small set of geometric features, we define a reduced representation of protein structure, together with an optimizing function for matching two representations, to provide a pre-filtering stage in a database search. We show that, in a straightforward implementation, the representation performs well in terms of resolution in the space of protein structures, and its ability to make new predictions. Conclusions Perhaps unexpectedly, a substantial discriminating power already exists at the level of main features of protein structure, such as directions of secondary structural elements, possibly constrained by their sequential order. This can be used toward efficient comparison of protein (sub)structures, allowing for various degrees of conformational flexibility within the compared pair, which in turn can be used for modeling by homology of protein structure and dynamics.
Affiliation(s)
- Zong Hong Zhang
  - Bioinformatics Institute, A*STAR, 30 Biopolis Street, #07-01 Matrix, Singapore 138671
15
Stivala A, Wirth A, Stuckey PJ. Tableau-based protein substructure search using quadratic programming. BMC Bioinformatics 2009; 10:153. [PMID: 19450287 PMCID: PMC2705363 DOI: 10.1186/1471-2105-10-153] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Received: 04/07/2009] [Accepted: 05/19/2009] [Indexed: 12/13/2022]
Abstract
Background Searching for proteins that contain similar substructures is an important task in structural biology. The exact solution of most formulations of this problem, including a recently published method based on tableaux, is too slow for practical use in scanning a large database. Results We developed an improved method for detecting substructural similarities in proteins using tableaux. Tableaux are compared efficiently by solving the quadratic program (QP) corresponding to the quadratic integer program (QIP) formulation of the extraction of maximally-similar tableaux. We compare the accuracy of the method in classifying protein folds with some existing techniques. Conclusion We find that including constraints based on the separation of secondary structure elements increases the accuracy of protein structure search using maximally-similar subtableau extraction, to a level where it has comparable or superior accuracy to existing techniques. We demonstrate that our implementation is able to search a structural database in a matter of hours on a standard PC.
Affiliation(s)
- Alex Stivala
  - Department of Computer Science and Software Engineering, The University of Melbourne, Victoria, Australia
16
Ivanciuc O, Schein CH, Garcia T, Oezguen N, Negi SS, Braun W. Structural analysis of linear and conformational epitopes of allergens. Regul Toxicol Pharmacol 2008; 54:S11-9. [PMID: 19121639 DOI: 10.1016/j.yrtph.2008.11.007] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 08/22/2008] [Revised: 11/06/2008] [Accepted: 11/06/2008] [Indexed: 11/17/2022]
Abstract
In many countries, regulatory agencies have adopted safety guidelines, based on bioinformatics rules from the WHO/FAO and EFSA recommendations, to prevent potentially allergenic novel foods or agricultural products from reaching consumers. We created the Structural Database of Allergenic Proteins (SDAP, http://fermi.utmb.edu/SDAP/) to combine data that had previously been available only as flat files on Web pages or in the literature. SDAP was designed to be user friendly and of maximum use to regulatory agencies, clinicians, and scientists interested in assessing the potential allergenic risk of a protein. We developed methods, unique to SDAP, to compare the physicochemical properties of discrete areas of allergenic proteins to known IgE epitopes. We developed a new similarity measure, the property distance (PD) value, that can be used to detect related segments in allergens with clinically observed cross-reactivity. We have now expanded this work to obtain experimental validation of the PD index as a quantitative predictor of IgE cross-reactivity, by designing peptide variants with predetermined PD scores relative to known IgE epitopes. In complementary work, we show how sequence motifs characteristic of allergenic proteins in protein families can be used as fingerprints for allergenicity.
Affiliation(s)
- Ovidiu Ivanciuc
- Sealy Center for Structural Biology and Molecular Biophysics, Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, Galveston, TX 77555-0857, USA
|
17
|
Zhou P, Shang Z. 2D molecular graphics: a flattened world of chemistry and biology. Brief Bioinform 2008; 10:247-58. [DOI: 10.1093/bib/bbp013] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022] Open
|
18
|
Veeramalai M, Gilbert D. A novel method for comparing topological models of protein structures enhanced with ligand information. Bioinformatics 2008; 24:2698-705. [DOI: 10.1093/bioinformatics/btn518] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/14/2022] Open
|
19
|
Veeramalai M, Ye Y, Godzik A. TOPS++FATCAT: fast flexible structural alignment using constraints derived from TOPS+ Strings Model. BMC Bioinformatics 2008; 9:358. [PMID: 18759993 PMCID: PMC2553092 DOI: 10.1186/1471-2105-9-358] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 04/07/2008] [Accepted: 08/31/2008] [Indexed: 11/28/2022] Open
Abstract
Background: Protein structure analysis and comparison are major challenges in structural bioinformatics. Despite the existence of many tools and algorithms, very few of them have managed to capture the intuitive understanding of protein structures developed in structural biology, especially in the context of rapid database searches. Such intuitions could help speed up similarity searches and make it easier to understand the results of such analyses.
Results: We developed a TOPS++FATCAT algorithm that uses an intuitive description of the proteins' structures, as captured in the popular TOPS diagrams, to limit the search space of the aligned fragment pairs (AFPs) in the flexible alignment of protein structures performed by the FATCAT algorithm. The TOPS++FATCAT algorithm is faster than FATCAT by more than an order of magnitude with a minimal cost in classification and alignment accuracy. For beta-rich proteins its accuracy is better than that of FATCAT, because the TOPS+ strings model contains important information on the parallel and anti-parallel hydrogen-bond patterns between the beta-strand SSEs (Secondary Structural Elements). We show that the TOPS++FATCAT errors, rare as they are, can be clearly linked to oversimplifications of the TOPS diagrams and can be corrected by the development of more precise secondary structure element definitions.
Software availability: The benchmark analysis results and the compressed archive of the TOPS++FATCAT program for the Linux platform can be downloaded from the following web site:
Conclusion: TOPS++FATCAT provides FATCAT accuracy and insights into protein structural changes at a speed comparable to sequence alignments, opening up a possibility of interactive protein structure similarity searches.
Affiliation(s)
- Mallika Veeramalai
- Joint Center for Molecular Modeling, Burnham Institute for Medical Research, La Jolla, CA 92037, USA.
|
20
|
Ward RM, Erdin S, Tran TA, Kristensen DM, Lisewski AM, Lichtarge O. De-orphaning the structural proteome through reciprocal comparison of evolutionarily important structural features. PLoS One 2008; 3:e2136. [PMID: 18461181 PMCID: PMC2362850 DOI: 10.1371/journal.pone.0002136] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Received: 03/13/2008] [Accepted: 03/25/2008] [Indexed: 12/01/2022] Open
Abstract
Function prediction frequently relies on comparing genes or gene products to search for relevant similarities. Because the number of protein structures with unknown function is mushrooming, however, we asked here whether such comparisons could be improved by focusing narrowly on the key functional features of protein structures, as defined by the Evolutionary Trace (ET). Therefore a series of algorithms was built to (a) extract local motifs (3D templates) from protein structures based on ET ranking of residue importance; (b) to assess their geometric and evolutionary similarity to other structures; and (c) to transfer enzyme annotation whenever a plurality was reached across matches. Whereas a prototype had only been 80% accurate and was not scalable, here a speedy new matching algorithm enabled large-scale searches for reciprocal matches and thus raised annotation specificity to 100% in both positive and negative controls of 49 enzymes and 50 non-enzymes, respectively—in one case even identifying an annotation error—while maintaining sensitivity (∼60%). Critically, this Evolutionary Trace Annotation (ETA) pipeline requires no prior knowledge of functional mechanisms. It could thus be applied in a large-scale retrospective study of 1218 structural genomics enzymes and reached 92% accuracy. Likewise, it was applied to all 2935 unannotated structural genomics proteins and predicted enzymatic functions in 320 cases: 258 on first pass and 62 more on second pass. Controls and initial analyses suggest that these predictions are reliable. Thus the large-scale evolutionary integration of sequence-structure-function data, here through reciprocal identification of local, functionally important structural features, may contribute significantly to de-orphaning the structural proteome.
Affiliation(s)
- R. Matthew Ward
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- Serkan Erdin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Tuan A. Tran
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- David M. Kristensen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- Andreas Martin Lisewski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
|
21
|
Kristensen DM, Ward RM, Lisewski AM, Erdin S, Chen BY, Fofanov VY, Kimmel M, Kavraki LE, Lichtarge O. Prediction of enzyme function based on 3D templates of evolutionarily important amino acids. BMC Bioinformatics 2008; 9:17. [PMID: 18190718 PMCID: PMC2219985 DOI: 10.1186/1471-2105-9-17] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.6] [Received: 06/11/2007] [Accepted: 01/11/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Structural genomics projects such as the Protein Structure Initiative (PSI) yield many new structures, but often these have no known molecular functions. One approach to recover this information is to use 3D templates - structure-function motifs that consist of a few functionally critical amino acids and may suggest functional similarity when geometrically matched to other structures. Since experimentally determined functional sites are not common enough to define 3D templates on a large scale, this work tests a computational strategy to select relevant residues for 3D templates. RESULTS Based on evolutionary information and heuristics, an Evolutionary Trace Annotation (ETA) pipeline built templates for 98 enzymes, half taken from the PSI, and sought matches in a non-redundant structure database. On average each template matched 2.7 distinct proteins, of which 2.0 share the first three Enzyme Commission digits as the template's enzyme of origin. In many cases (61%) a single most likely function could be predicted as the annotation with the most matches, and in these cases such a plurality vote identified the correct function with 87% accuracy. ETA was also found to be complementary to sequence homology-based annotations. When matches are required to both geometrically match the 3D template and to be sequence homologs found by BLAST or PSI-BLAST, the annotation accuracy is greater than either method alone, especially in the region of lower sequence identity where homology-based annotations are least reliable. CONCLUSION These data suggest that knowledge of evolutionarily important residues improves functional annotation among distant enzyme homologs. Since, unlike other 3D template approaches, the ETA method bypasses the need for experimental knowledge of the catalytic mechanism, it should prove a useful, large scale, and general adjunct to combine with other methods to decipher protein function in the structural proteome.
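The plurality vote mentioned in this abstract (predicting the annotation with the most matches) can be illustrated with a minimal sketch. It assumes each template match has already been reduced to an annotation string, for example the first three Enzyme Commission digits. The function name `plurality_annotation` and the example labels are hypothetical and are not taken from the ETA pipeline itself.

```python
from collections import Counter

def plurality_annotation(match_annotations):
    """Return the annotation with strictly the most matches.

    Returns None when there are no matches or when the top two
    annotations are tied, i.e. when no single plurality exists.
    """
    top_two = Counter(match_annotations).most_common(2)
    if not top_two:
        return None
    if len(top_two) == 2 and top_two[0][1] == top_two[1][1]:
        return None
    return top_two[0][0]

# Hypothetical matches for one template, labelled by three EC digits.
print(plurality_annotation(["3.1.3", "3.1.3", "2.7.1"]))
print(plurality_annotation(["3.1.3", "2.7.1"]))
```

Returning None on a tie mirrors the abstract's observation that a single most likely function could be predicted only in some cases; how the actual pipeline handles ties is not stated there.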
Affiliation(s)
- David M Kristensen
- Department of Molecular and Human Genetics, Biophysics, Baylor College of Medicine, Houston, TX 77030, USA.
|
22
|
Abstract
As protein databases continue to grow in size, exhaustive search methods that compare a query structure against every database structure can no longer provide satisfactory performance. Instead, the filter-and-refine paradigm offers an efficient alternative to database search without compromising the accuracy of the answers. In this paradigm, protein structures are represented in an abstract form. During querying, based on the abstract representations, the filtering phase prunes away dissimilar structures quickly so that only a small collection of promising structures are examined using a detailed structure alignment technique in the refinement phase. This article reviews mainly techniques developed for the filtering phase.
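The filter-and-refine paradigm described in this abstract can be sketched as follows. This is a toy illustration under stated assumptions, not a method from the review: structures are abstracted as strings of secondary structure symbols (`H` for helix, `E` for strand), the filter compares symbol counts, and `difflib.SequenceMatcher` stands in for a detailed structure alignment. All function names and the toy database are hypothetical.

```python
from collections import Counter
from difflib import SequenceMatcher

def composition_distance(a, b):
    """Cheap filter score: L1 distance between symbol counts."""
    ca, cb = Counter(a), Counter(b)
    return sum(abs(ca[k] - cb[k]) for k in set(ca) | set(cb))

def detailed_distance(a, b):
    """Stand-in for an expensive alignment: 1 minus the difflib similarity ratio."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()

def filter_and_refine(query, database, keep=2):
    """Shortlist `keep` entries with the cheap score, then rank them with the costly one."""
    shortlist = sorted(
        database, key=lambda name: composition_distance(query, database[name])
    )[:keep]
    return sorted(shortlist, key=lambda name: detailed_distance(query, database[name]))

# Hypothetical database of abstracted structures.
toy_db = {"p1": "HHHEEE", "p2": "HHHHHH", "p3": "EEEEEE", "p4": "HHEEEH"}
print(filter_and_refine("HHHEEE", toy_db))
```

Only the shortlisted entries reach the costly comparison, which is where the paradigm saves time. The sketch does not address how to choose a filter that never discards true matches.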
Affiliation(s)
- Zeyar Aung
- Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613, Singapore.
|
23
|
Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment. BMC Bioinformatics 2007; 8:252. [PMID: 17629909 PMCID: PMC1939857 DOI: 10.1186/1471-2105-8-252] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Received: 05/23/2007] [Accepted: 07/13/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. RESULTS We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. 
The first one, performed via ROC (Receiver Operating Curve) analysis, aims at assessing the intrinsic ability of the methodology to discriminate and classify biological sequences and structures. A second set of experiments aims at assessing how well two commonly available classification algorithms, UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and NJ (Neighbor Joining), can use the methodology to perform their task, their performance being evaluated against gold standards and with the use of well known statistical indexes, i.e., the F-measure and the partition distance. Based on the experiments, several conclusions can be drawn and, from them, novel valuable guidelines for the use of USM on biological data. The main ones are reported next. CONCLUSION UCD and NCD are indistinguishable, i.e., they yield nearly the same values of the statistical indexes we have used, across experiments and data sets, while CD is almost always worse than both. UPGMA seems to yield better classification results with respect to NJ, i.e., better values of the statistical indexes (10% difference or above), on a substantial fraction of experiments, compressors and USM approximation choices. The compression program PPMd, based on PPM (Prediction by Partial Matching), for generic data and Gencompress for DNA, are the best performers among the compression algorithms we have used, although the difference in performance, as measured by statistical indexes, between them and the other algorithms depends critically on the data set and may not be as large as expected. PPMd used with UCD or NCD and UPGMA, on sequence data is very close, although worse, in performance with the alignment methods (less than 2% difference on the F-measure). Yet, it scales well with data set size and it can work on data other than sequences. 
In summary, our quantitative analysis naturally complements the rich theory behind USM and supports the conclusion that the methodology is worth using because of its robustness, flexibility, scalability, and competitiveness with existing techniques. In particular, the methodology applies to all biological data in textual format. The software and data sets are available under the GNU GPL at the supplementary material web page.
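As an illustration of how a compression-based dissimilarity is computed in practice, the sketch below uses the commonly cited form of NCD, (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. It is a sketch under assumptions: `zlib` is used only because it ships with Python, not because the abstract names it, and the abstract does not give the exact definitions of UCD and CD, so they are not shown. The function names and test strings are hypothetical.

```python
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in for C(.))."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Compression-based dissimilarity: (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

# Hypothetical inputs: a repetitive string and an unrelated byte pattern.
seq_a = b"ACGT" * 50
seq_b = bytes(range(256))
print(ncd(seq_a, seq_a), ncd(seq_a, seq_b))
```

A string compared with itself scores close to 0 and unrelated inputs score close to 1. The exact values depend on the compressor, which is consistent with the abstract's point that USM is a methodology rather than a single formula.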
|
24
|
Schein CH, Ivanciuc O, Braun W. Bioinformatics approaches to classifying allergens and predicting cross-reactivity. Immunol Allergy Clin North Am 2007; 27:1-27. [PMID: 17276876 PMCID: PMC1941676 DOI: 10.1016/j.iac.2006.11.005] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.9] [Indexed: 11/19/2022]
Abstract
Allergenic proteins from very different environmental sources have similar sequences and structures. This fact may account for multiple allergen syndromes, whereby a myriad of diverse plants and foods may induce a similar IgE-based reaction in certain patients. Identifying the common triggering protein in these sources, in silico, can aid in designing individualized therapy for allergen sufferers. This article provides an overview of databases on allergenic proteins, and ways to identify common proteins that may be the cause of multiple allergy syndromes. The major emphasis is on the relational Structural Database of Allergenic Proteins (SDAP), which includes cross-referenced data on the sequence, structure, and IgE epitopes of over 800 allergenic proteins, coupled with specially developed bioinformatics tools to group all allergens and identify discrete areas that may account for cross-reactivity. SDAP is freely available on the Web to clinicians and patients.
Affiliation(s)
- Catherine H. Schein
- Sealy Center for Structural Biology and Molecular Biophysics, Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, 301 University Blvd., Galveston TX 77555-0857
- Sealy Center for Structural Biology and Molecular Biophysics, Departments of Microbiology and Immunology, University of Texas Medical Branch, 301 University Blvd., Galveston TX 77555-0857
- Ovidiu Ivanciuc
- Sealy Center for Structural Biology and Molecular Biophysics, Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, 301 University Blvd., Galveston TX 77555-0857
- Werner Braun
- Sealy Center for Structural Biology and Molecular Biophysics, Departments of Biochemistry and Molecular Biology, University of Texas Medical Branch, 301 University Blvd., Galveston TX 77555-0857
|
25
|
Wu Z, Wang Y, Feng E, Chen L. A new geometric-topological method to measure protein fold similarity. Chem Phys Lett 2007. [DOI: 10.1016/j.cplett.2006.11.071] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022]
|
26
|
Lisewski AM, Lichtarge O. Rapid detection of similarity in protein structure and function through contact metric distances. Nucleic Acids Res 2006; 34:e152. [PMID: 17130161 PMCID: PMC1702494 DOI: 10.1093/nar/gkl788] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 11/14/2022] Open
Abstract
The characterization of biological function among newly determined protein structures is a central challenge in structural genomics. One class of computational solutions to this problem is based on the similarity of protein structure. Here, we implement a simple yet efficient measure of protein structure similarity, the contact metric. Even though its computation avoids structural alignments and is therefore nearly instantaneous, we find that small values correlate with geometrical root mean square deviations obtained from structural alignments. To test whether the contact metric detects functional similarity, as defined by Gene Ontology (GO) terms, it was compared in large-scale computational experiments to four other measures of structural similarity, including alignment algorithms as well as alignment independent approaches. The contact metric was the fastest method and its sensitivity, at any given specificity level, was a close second only to Fast Alignment and Search Tool—a structural alignment method that is slower by three orders of magnitude. Critically, nearly 40% of correct functional inferences by the contact metric were not identified by any other approach, which shows that the contact metric is complementary and computationally efficient in detecting functional relationships between proteins. A public ‘Contact Metric Internet Server’ is provided.
Affiliation(s)
- Olivier Lichtarge
- To whom correspondence should be addressed. Tel: +1 713 798 5646; Fax: +1 713 798 7773.
|
27
|
Connectivity independent protein-structure alignment: a hierarchical approach. BMC Bioinformatics 2006; 7:510. [PMID: 17118190 PMCID: PMC1683948 DOI: 10.1186/1471-2105-7-510] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.2] [Received: 06/06/2006] [Accepted: 11/21/2006] [Indexed: 11/13/2022] Open
Abstract
Background: Protein-structure alignment is a fundamental tool to study protein function, evolution and model building. In the last decade several methods for structure alignment were introduced, but most of them ignore that structurally similar proteins can share the same spatial arrangement of secondary structure elements (SSE) but differ in the underlying polypeptide chain connectivity (non-sequential SSE connectivity).
Results: We perform protein-structure alignment using a two-level hierarchical approach implemented in the program GANGSTA. On the first level, pair contacts and relative orientations between SSEs (i.e. α-helices and β-strands) are maximized with a genetic algorithm (GA). On the second level residue pair contacts from the best SSE alignments are optimized. We have tested the method on visually optimized structure alignments of protein pairs (pairwise mode) and for database scans. For a given protein structure, our method is able to detect significant structural similarity of functionally important folds with non-sequential SSE connectivity. The performance for structure alignments with strictly sequential SSE connectivity is comparable to that of other structure alignment methods.
Conclusion: As demonstrated for several applications, GANGSTA finds meaningful protein-structure alignments independent of the SSE connectivity. GANGSTA is able to detect structural similarity of protein folds that are assigned to different superfamilies but nevertheless possess similar structures and perform related functions, even if these proteins differ in SSE connectivity.
|
28
|
Abstract
PAST is a new web service providing fast structural queries of the Protein Data Bank. The search engine is based on an adaptation of the generalized suffix tree and relies on a translation- and rotation-invariant representation of the protein backbone. The search procedure is completely independent of the amino acid sequence of the polypeptide chains. The web service works best with, but is not necessarily limited to, shorter fragments such as functional motifs—a task that most other tools do not perform well. Usual query times are in the order of seconds, allowing a truly interactive use. Unlike most established tools, PAST does not prefilter the dataset or exclude parts of the search space based on statistical reasoning. The server is freely available at .
Affiliation(s)
- Arno Buchner
- Correspondence may also be addressed to Arno Buchner.
|
29
|
Huet A, Derreumaux P. Impact of the mutation A21G (Flemish variant) on Alzheimer's beta-amyloid dimers by molecular dynamics simulations. Biophys J 2006; 91:3829-40. [PMID: 16891372 PMCID: PMC1630479 DOI: 10.1529/biophysj.106.090993] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.3] [Indexed: 11/18/2022] Open
Abstract
Soluble oligomers of the amyloid beta-protein (Abeta) are linked to Alzheimer's disease. Irrespective of the nature of the nucleus before fibril growth, dimers are essential species in Abeta assembly, but their transient character has precluded, thus far, high-resolution structure determination. We have investigated the effects of the point mutation A21G on Abeta dimers by performing high temperature all-atom molecular dynamics simulations of Abeta(40), Abeta(42), and their Flemish variants (A21G) starting from their fibrillar conformations. Abeta dimers are found in equilibrium between various topologies, and the absence of common structural features shared by the four species makes problematic the design of a unique inhibitor for blocking dimers. We also show that the impact of the point mutation A21G on Abeta structure and dynamics varies from Abeta(40) to Abeta(42). Finally, we provide a possible structural explanation for the reduced aggregation rate of Abeta fibrils containing the Flemish disease-causing mutation.
Affiliation(s)
- Alexis Huet
- Laboratoire de Biochimie Théorique, UPR 9080, Centre National de la Recherche Scientifique, Institut de Biologie Physico-Chimique, et Université Paris, Paris, France
|
30
|
Abstract
A novel protein structure alignment technique has been developed reducing much of the secondary and tertiary structure to a sequential representation greatly accelerating many structural computations, including alignment. Constructed from incidence relations in the Delaunay tetrahedralization, alignments of the sequential representation describe structural similarities that cannot be expressed with rigid-body superposition and complement existing techniques minimizing root-mean-squared distance through superposition. Restricting to the largest substructure superimposable by a single rigid-body transformation determines an alignment suitable for root-mean-squared distance comparisons and visualization. Restricted alignments of a test set of histones and histone-like proteins determined superpositions nearly identical to those produced by the established structure alignment routines of DaliLite and ProSup. Alignment of three, increasingly complex proteins: ferredoxin, cytidine deaminase, and carbamoyl phosphate synthetase, to themselves, demonstrated previously identified regions of self-similarity. All-against-all similarity index comparisons performed on a test set of 45 class I and class II aminoacyl-tRNA synthetases closely reproduced the results of established distance matrix methods while requiring 1/16 the time. Principal component analysis of pairwise tetrahedral decomposition similarity of 2300 molecular dynamics snapshots of tryptophanyl-tRNA synthetase revealed discrete microstates within the trajectory consistent with experimental results. The method produces results with sufficient efficiency for large-scale multiple structure alignment and is well suited to genomic and evolutionary investigations where no geometric model of similarity is known a priori.
Affiliation(s)
- Jeffrey Roach
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina 27599, USA.
|
31
|
Abstract
This article describes the development of a new method for multiple sequence alignment based on fold-level protein structure alignments, which provides an improvement in accuracy compared with the most commonly used sequence-only-based techniques. This method integrates the widely used, progressive multiple sequence alignment approach ClustalW with the Topology of Protein Structure (TOPS) topology-based alignment algorithm. The TOPS approach produces a structural alignment for the input protein set by using a topology-based pattern discovery program, providing a set of matched sequence regions that can be used to guide a sequence alignment using ClustalW. The resulting alignments are more reliable than a sequence-only alignment, as determined by 20-fold cross-validation with a set of 106 protein examples from the CATH database, distributed in seven superfold families. The method is particularly effective for sets of proteins that have similar structures at the fold level but low sequence identity. The aim of this research is to contribute towards bridging the gap between protein sequence and structure analysis, in the hope that this can be used to assist the understanding of the relationship between sequence, structure and function. The tool is available at http://balabio.dcs.gla.ac.uk/msat/.
Affiliation(s)
- Te Ren
- Department of Computer Science, Bioinformatics Research Centre, University of Glasgow, Glasgow G12 8QQ, Scotland, UK
|
32
|
Park SH, Ryu KH, Gilbert D. Fast similarity search for protein 3D structures using topological pattern matching based on spatial relations. Int J Neural Syst 2005; 15:287-96. [PMID: 16187404 DOI: 10.1142/s0129065705000244] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/18/2022]
Abstract
Similarity search for protein 3D structures has become complex and computationally expensive because protein structure databases continue to grow rapidly. Fast structural similarity search systems are therefore needed for practical use in protein structure classification, since existing comparison systems do not return results in time. Our approach uses multi-step processing that consists of a preprocessing step to represent the geometry of protein structures with spatial objects, a filter step to generate a small candidate set using approximate topological string matching, and a refinement step to compute a structural alignment. This paper describes the preprocessing and filtering for fast similarity search using the discovery of topological patterns of secondary structure elements based on spatial relations. Our system is fully implemented using Oracle 8i Spatial. We have previously shown that our approach is faster than other approaches such as DALI. This work shows that discovering topological relations of secondary structure elements in protein structures by using the spatial relations of spatial databases is practical for fast structural similarity search for proteins.
Affiliation(s)
- Sung-Hee Park
- Database Bioinformatics Laboratory, School of Electrical & Computer Engineering, Chungbuk National University, Cheongju, 361-763, Korea.
|
33
|
Abstract
YAKUSA is a program designed for rapid scanning of a structural database with a query protein structure. It searches for the longest common substructures called SHSPs (structural high-scoring pairs) existing between a query structure and every structure in the structural database. It makes use of protein backbone internal coordinates (alpha angles) in order to describe protein structures as sequences of symbols. The structural similarities are established in 5 steps, the first 3 being analogous to those used in BLAST: (1) building up a deterministic finite automaton describing all patterns identical or similar to those in the query structure; (2) searching for all these patterns in every structure in the database; (3) extending the patterns to longer matching substructures (i.e., SHSPs); (4) selecting compatible SHSPs for each query-database structure pair; and (5) ranking the query-database structure pairs using 3 scores based on SHSP similarity, on SHSP probabilities, and on spatial compatibility of SHSPs. Structural fragment probabilities are estimated according to a mixture transition distribution model, which is an approximation of a high-order Markov chain model. With regard to sensitivity and selectivity of the structural matches, YAKUSA compares well to the best related programs, although it is by far faster: A typical database scan takes about 40 s CPU time on a desktop personal computer. It has also been implemented on a Web server for real-time searches.
|
34
|
Kumaran D, Eswaramoorthy S, Studier FW, Swaminathan S. Structure and mechanism of ADP-ribose-1''-monophosphatase (Appr-1''-pase), a ubiquitous cellular processing enzyme. Protein Sci 2005; 14:719-26. [PMID: 15722447 PMCID: PMC2279289 DOI: 10.1110/ps.041132005] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.5] [Indexed: 10/25/2022]
Abstract
Appr-1''-pase, an important and ubiquitous cellular processing enzyme involved in the tRNA splicing pathway, catalyzes the conversion of ADP-ribose-1''-monophosphate (Appr-1''-p) to ADP-ribose. The structures of the native enzyme from yeast and its complex with ADP-ribose were determined to 1.9 Å and 2.05 Å, respectively. Analysis of the three-dimensional structure of this protein, selected as a target in a structural genomics project, reveals its putative function and provides clues to the catalytic mechanism. The structure of the 284-amino acid protein shows a two-domain architecture consisting of a three-layer alpha/beta/alpha sandwich N-terminal domain connected to a small C-terminal helical domain. The structure of Appr-1''-pase in complex with the product, ADP-ribose, reveals an active-site water molecule poised for nucleophilic attack on the terminal phosphate group. Loop-region residues Asn 80, Asp 90, and His 145 may form a catalytic triad.
Affiliation(s)
- Desigan Kumaran
- Biology Department, Brookhaven National Laboratory, Upton, NY 11973, USA
|
35
|
Sierk ML, Kleywegt GJ. Déjà vu all over again: finding and analyzing protein structure similarities. Structure 2005; 12:2103-11. [PMID: 15576025 DOI: 10.1016/j.str.2004.09.016] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2004] [Revised: 09/07/2004] [Accepted: 09/23/2004] [Indexed: 10/26/2022]
Abstract
Structure comparison is a crucial aspect of structural biology today. The field of structure comparison is developing rapidly, with the development of new algorithms, similarity scores, and statistical scores. The predicted large increase of experimental structures and structural models made possible by high-throughput efforts means that structural comparison and searching of structural databases using automated methods will become increasingly common. This Ways & Means article is meant to guide the structural biologist in the basics of structural alignment, and to provide an overview of the available software tools. The main purpose is to encourage users to gain some understanding of the strengths and limitations of structural alignment, and to take these factors into account when interpreting the results of different programs.
Affiliation(s)
- Michael L Sierk
- Department of Biochemistry and Molecular Genetics, University of Virginia, P.O. Box 800733, Charlottesville, VA 22908, USA.
|
36
|
Torrance GM, Gilbert DR, Michalopoulos I, Westhead DW. Protein structure topological comparison, discovery and matching service. Bioinformatics 2005; 21:2537-8. [PMID: 15741246 DOI: 10.1093/bioinformatics/bti331] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
We describe a fold level fast protein comparison and motif matching facility based on the TOPS representation of structure. This provides an update to a previous service at the EBI, with a better graph matching with faster results and visualization of both the structures being compared against and the common pattern of each with the target domain. Availability: Web service at http://balabio.dcs.gla.ac.uk/tops or via the main TOPS site at http://www.tops.leeds.ac.uk. Software is also available for download from these sites.
Affiliation(s)
- G M Torrance
- Bioinformatics Research Centre, Department of Computer Science, University of Glasgow, Glasgow G12 8QQ, UK
|
37
|
Klein DJ, Moore PB, Steitz TA. The roles of ribosomal proteins in the structure, assembly, and evolution of the large ribosomal subunit. J Mol Biol 2004; 340:141-77. [PMID: 15184028 DOI: 10.1016/j.jmb.2004.03.076] [Citation(s) in RCA: 349] [Impact Index Per Article: 16.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2003] [Revised: 03/16/2004] [Accepted: 03/23/2004] [Indexed: 11/21/2022]
Abstract
The structures of ribosomal proteins and their interactions with RNA have been examined in the refined crystal structure of the Haloarcula marismortui large ribosomal subunit. The protein structures fall into six groups based on their topology. The 50S subunit proteins function primarily to stabilize inter-domain interactions that are necessary to maintain the subunit's structural integrity. An extraordinary variety of protein-RNA interactions is observed. Electrostatic interactions between numerous arginine and lysine residues, particularly those in tail extensions, and the phosphate groups of the RNA backbone mediate many protein-RNA contacts. Base recognition occurs via both the minor groove and widened major groove of RNA helices, as well as through hydrophobic binding pockets that capture bulged nucleotides and through insertion of amino acid residues into hydrophobic crevices in the RNA. Primary binding sites on contiguous RNA are identified for 20 of the 50S ribosomal proteins, which along with few large protein-protein interfaces, suggest the order of assembly for some proteins and that the protein extensions fold cooperatively with RNA. The structure supports the hypothesis of co-transcriptional assembly, centered around L24 in domain I. Finally, comparing the structures and locations of the 50S ribosomal proteins from H. marismortui and D. radiodurans revealed striking examples of molecular mimicry. These comparisons illustrate that identical RNA structures can be stabilized by unrelated proteins.
Affiliation(s)
- D J Klein
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, CT 06520-8114, USA
|
38
|
Abstract
The folding degree index (Estrada, Bioinformatics 2002;18:697-704) is extended to account for the contribution of amino acids to folding. First, the mathematical formalism for extending the folding degree index is presented. Then, the amino acid contributions to folding degree of several proteins are used to analyze its relation to secondary structure. The possibilities of using these contributions in helping or checking the assignation of secondary structure to amino acids are also introduced. The influence of external factors to the amino acids contribution to folding degree is studied through the temperature effect on ribonuclease A. Finally, the analysis of 3D protein similarity through the use of amino acid contributions to folding degree is studied by selecting a series of lysozymes. These results are compared to that obtained by sequence alignment (2D similarity) and 3D superposition of the structures, showing the uniqueness of the current approach.
Affiliation(s)
- Ernesto Estrada
- Safety and Environmental Assurance Centre, Unilever, Colworth House, Sharnbrook, Beds, and RIAIDT, Edificio CACTUS, University of Santiago de Compostela, Spain.
|
39
|
Sivaraman J, Iannuzzi P, Cygler M, Matte A. Crystal structure of the RluD pseudouridine synthase catalytic module, an enzyme that modifies 23S rRNA and is essential for normal cell growth of Escherichia coli. J Mol Biol 2004; 335:87-101. [PMID: 14659742 DOI: 10.1016/j.jmb.2003.10.003] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
Abstract
Pseudouridine (5-beta-D-ribofuranosyluracil, Psi) is the most commonly found modified base in RNA. Conversion of uridine to Psi is performed enzymatically in both prokaryotes and eukaryotes by pseudouridine synthases (EC 4.2.1.70). The Escherichia coli Psi-synthase RluD modifies uridine to Psi at positions 1911, 1915 and 1917 within 23S rRNA. RluD also possesses a second function related to proper assembly of the 50S ribosomal subunit that is independent of Psi-synthesis. Here, we report the crystal structure of the catalytic module of RluD (residues 68-326; DeltaRluD) refined at 1.8 Å to a final R-factor of 21.8% (R(free) = 24.3%). DeltaRluD is a monomeric enzyme having an overall mixed alpha/beta fold. The DeltaRluD molecule consists of two subdomains, a catalytic subdomain and a C-terminal subdomain, with the RNA-binding cleft formed by loops extending from the catalytic subdomain. The catalytic subdomain of DeltaRluD has a similar fold as in TruA, TruB and RsuA, with the location of the RNA-binding cleft, active site and conserved, catalytic Asp residue superposing in all four structures. Superposition of the crystal structure of TruB bound to a T-stem loop with RluD reveals that similar RNA-protein interactions for the flipped-out uridine base would exist in both structures, implying that base-flipping is necessary for catalysis. This observation also implies that the specificity determinants for site-specific RNA-binding and recognition likely reside in parts of RluD beyond the active site.
Affiliation(s)
- J Sivaraman
- Department of Biochemistry, McGill University, H3G 1Y6, Montreal, Que., Canada
|
40
|
Michalopoulos I, Torrance GM, Gilbert DR, Westhead DR. TOPS: an enhanced database of protein structural topology. Nucleic Acids Res 2004; 32:D251-4. [PMID: 14681405 PMCID: PMC308794 DOI: 10.1093/nar/gkh060] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
The TOPS database holds topological descriptions of protein structures. These compact and highly abstract descriptions reduce the protein fold to a sequence of Secondary Structure Elements (SSEs) and three sets of pairwise relationships between them: hydrogen bonds relating parallel and anti-parallel beta strands, spatial adjacencies relating neighbouring SSEs, and the chiralities of selected supersecondary structures, including connections in beta-alpha-beta units and between parallel alpha helices. The database is used as a resource for visualizing folding topologies, fast topological pattern searching and structure comparison. Here, significant enhancements to the TOPS database are described. The topological description has been enhanced to include packing relationships between helices, which significantly improves the description of protein folds with little beta strand content. Further, the topological description has been annotated with sequence information. The query interfaces to the database have been improved and the new version can be found at http://www.tops.leeds.ac.uk/.
Affiliation(s)
- Ioannis Michalopoulos
- School of Biochemistry and Molecular Biology, University of Leeds, Leeds LS2 9JT, UK
|
41
|
Calderone V, Forleo C, Benvenuti M, Cristina Thaller M, Rossolini GM, Mangani S. The First Structure of a Bacterial Class B Acid Phosphatase Reveals Further Structural Heterogeneity Among Phosphatases of the Haloacid Dehalogenase Fold. J Mol Biol 2004; 335:761-73. [PMID: 14687572 DOI: 10.1016/j.jmb.2003.10.050] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
AphA is a periplasmic acid phosphatase of Escherichia coli belonging to class B bacterial phosphatases, which is part of the DDDD superfamily of phosphohydrolases. The crystal structure of AphA has been determined at 2.2 Å and its resolution extended to 1.7 Å on an AuCl(3) derivative. This represents the first crystal structure of a class B bacterial phosphatase. Despite the lack of sequence homology, the AphA structure reveals a haloacid dehalogenase-like fold. This finding suggests that this fold could be conserved among members of the DDDD superfamily of phosphohydrolases. The active enzyme is a homotetramer built by using an extended N-terminal arm intertwining the four monomers. The active site of the native enzyme, as prepared, hosts a magnesium ion, which can be replaced by other metal ions. The structure explains the non-specific behaviour of AphA towards substrates, while a structure-based alignment with other phosphatases provides clues about the catalytic mechanism.
Affiliation(s)
- Vito Calderone
- Dipartimento di Chimica, Università di Siena, Via Aldo Moro, I-53100 Siena, Italy
|
42
|
Abstract
When a new protein structure has been determined, comparison with the database of known structures enables classification of its fold as new or belonging to a known class of proteins. This in turn may provide clues about the function of the protein. A large number of fold comparison programs have been developed, but they have never been subjected to a comprehensive and critical comparative analysis. Here we describe an evaluation of 11 publicly available, Web-based servers for automatic fold comparison. Both their functionality (e.g., user interface, presentation, and annotation of results) and their performance (i.e., how well established structural similarities are recognized) were assessed. The servers were subjected to a battery of performance tests covering a broad spectrum of folds as well as special cases, such as multidomain proteins, C-alpha-only models, new folds, and NMR-based models. The CATH structural classification system was used as a reference. These tests revealed the strong and weak sides of each server. On the whole, CE, DALI, MATRAS, and VAST showed the best performance, but none of the servers achieved a 100% success rate. Where no structurally similar proteins are found by any individual server, it is recommended to try one or two other servers before any conclusions concerning the novelty of a fold are put on paper.
Affiliation(s)
- Marian Novotny
- Department of Cell and Molecular Biology, Uppsala University, Biomedical Centre, Uppsala, Sweden
|
43
|
Petersen K, Taylor WR. Modelling zinc-binding proteins with GADGET: genetic algorithm and distance geometry for exploring topology. J Mol Biol 2003; 325:1039-59. [PMID: 12527307 DOI: 10.1016/s0022-2836(02)01220-2] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
Abstract
A novel combination of optimization methods (Genetic Algorithm with Distance Geometry) has been developed and shown to find near-optimal solutions to a set of imposed structural constraints. With this modelling tool (GADGET), the fold-space of a variety of small zinc-binding proteins was investigated under the constraints required to form a zinc-binding site (or pair of sites). Analysis of the results concentrated on the ring-finger domain as the "classic" zinc-finger domains were too constrained to provide much topological variety, whilst the TFIIH domain (which has large unstructured loops) did not behave well. The intermediate ring-finger domain, however, was found to adopt a variety of different folds, many of which had near-optimal scores under the fitness function employed in GADGET (forming good secondary structures and zinc-coordination). Although the native fold was dominant amongst the solutions, the discovery of good alternate folds shows that even the eight residues constrained to form two zinc-binding sites were not sufficient to uniquely determine the native fold. Despite this, the fold-space of 48 theoretically possible folds was greatly reduced, with just six topologies found in significant numbers.
Affiliation(s)
- Kjell Petersen
- Department of Informatics, University of Bergen, PB7800, N-5020 Bergen, Norway
|
44
|
Williams A, Westhead DR. Sequence relationships in the legume lectin fold and other jelly rolls. Protein Eng Des Sel 2002; 15:771-4. [PMID: 12468709 DOI: 10.1093/protein/15.10.771] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
Distant sequence relationships in proteins containing the beta jelly-roll fold were investigated using sensitive sequence comparison methods, including PSI-BLAST and Hidden Markov Models. A relationship was identified between the rmlC-like and phosphomannose isomerase SCOP (version 1.53) superfamilies, which were merged in the most recent SCOP release. No other distant sequence relationships linking jelly roll superfamilies were found.
Affiliation(s)
- A Williams
- School of Biochemistry and Molecular Biology, University of Leeds, Leeds LS2 9JT, UK
|
45
|
Hodel AE, Hodel MR, Griffis ER, Hennig KA, Ratner GA, Xu S, Powers MA. The three-dimensional structure of the autoproteolytic, nuclear pore-targeting domain of the human nucleoporin Nup98. Mol Cell 2002; 10:347-58. [PMID: 12191480 DOI: 10.1016/s1097-2765(02)00589-0] [Citation(s) in RCA: 109] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
Abstract
Nup98 is a component of the nuclear pore that plays its primary role in the export of RNAs. Nup98 is expressed in two forms, derived from alternate mRNA splicing. Both forms are processed into two peptides through autoproteolysis mediated by the C-terminal domain of hNup98. The three-dimensional structure of the C-terminal domain reveals a novel protein fold, and thus a new class of autocatalytic proteases. The structure further reveals that the suggested nucleoporin RNA binding motif is unlikely to bind to RNA. The C terminus also contains sequences that target hNup98 to the nuclear pore complex. Noncovalent interactions between the C-terminal domain and the cleaved peptide tail are visible and suggest a model for cleavage-dependent targeting of hNup98 to the nuclear pore.
Affiliation(s)
- Alec E Hodel
- Department of Biochemistry, Emory University School of Medicine, Atlanta, GA 30322, USA.
|
46
|
Anand K, Palm GJ, Mesters JR, Siddell SG, Ziebuhr J, Hilgenfeld R. Structure of coronavirus main proteinase reveals combination of a chymotrypsin fold with an extra alpha-helical domain. EMBO J 2002; 21:3213-24. [PMID: 12093723 PMCID: PMC126080 DOI: 10.1093/emboj/cdf327] [Citation(s) in RCA: 491] [Impact Index Per Article: 21.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open
Abstract
The key enzyme in coronavirus polyprotein processing is the viral main proteinase, M(pro), a protein with extremely low sequence similarity to other viral and cellular proteinases. Here, the crystal structure of the 33.1 kDa transmissible gastroenteritis (corona)virus M(pro) is reported. The structure was refined to 1.96 Å resolution and revealed three dimers in the asymmetric unit. The mutual arrangement of the protomers in each of the dimers suggests that M(pro) self-processing occurs in trans. The active site, comprised of Cys144 and His41, is part of a chymotrypsin-like fold that is connected by a 16 residue loop to an extra domain featuring a novel alpha-helical fold. Molecular modelling and mutagenesis data implicate the loop in substrate binding and elucidate S1 and S2 subsites suitable to accommodate the side chains of the P1 glutamine and P2 leucine residues of M(pro) substrates. Interactions involving the N-terminus and the alpha-helical domain stabilize the loop in the orientation required for trans-cleavage activity. The study illustrates that RNA viruses have evolved unprecedented variations of the classical chymotrypsin fold.
Affiliation(s)
- Stuart G. Siddell
- John Ziebuhr
- Rolf Hilgenfeld
- Department of Structural Biology and Crystallography, Institute of Molecular Biotechnology, D-07745 Jena and Institute of Virology and Immunology, University of Würzburg, D-97078 Würzburg, Germany
|
47
|
Gilbert D, Westhead D, Viksna J, Thornton J. A computer system to perform structure comparison using TOPS representations of protein structure. COMPUTERS & CHEMISTRY 2001; 26:23-30. [PMID: 11765848 DOI: 10.1016/s0097-8485(01)00096-1] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
Abstract
We describe the design and implementation of a fast topology-based method for protein structure comparison. The approach uses the TOPS topological representation of protein structure, aligning two structures using a common discovered pattern and generating a measure of distance derived from an insert score. Heavy use is made of a constraint-based pattern-matching algorithm for TOPS diagrams that we have designed and described elsewhere (Bioinformatics 15(4) (1999) 317). The comparison system is maintained at the European Bioinformatics Institute and is available over the Web at tops.ebi.ac.uk/tops. Users submit a structure description in Protein Data Bank (PDB) format and can compare it with structures in the entire PDB or a representative subset of protein domains, receiving the results by email.
Affiliation(s)
- D Gilbert
- Department of Computing, City University, Northampton Square, London EC1V 0HB, UK.
|
48
|
Kelly MJ, Ball LJ, Krieger C, Yu Y, Fischer M, Schiffmann S, Schmieder P, Kühne R, Bermel W, Bacher A, Richter G, Oschkinat H. The NMR structure of the 47-kDa dimeric enzyme 3,4-dihydroxy-2-butanone-4-phosphate synthase and ligand binding studies reveal the location of the active site. Proc Natl Acad Sci U S A 2001; 98:13025-30. [PMID: 11687623 PMCID: PMC60818 DOI: 10.1073/pnas.231323598] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Recent developments in NMR have extended the size range of proteins amenable to structural and functional characterization to include many larger proteins involved in important cellular processes. By applying a combination of residue-specific isotope labeling and protein deuteration strategies tailored to yield specific information, we were able to determine the solution structure and study structure-activity relationships of 3,4-dihydroxy-2-butanone-4-phosphate synthase, a 47-kDa enzyme from the Escherichia coli riboflavin biosynthesis pathway and an attractive target for novel antibiotics. Our investigations of the enzyme's ligand binding by NMR and site-directed mutagenesis yield a conclusive picture of the location and identity of residues directly involved in substrate binding and catalysis. Our studies illustrate the power of state-of-the-art NMR techniques for the structural characterization and investigation of ligand binding in protein complexes approaching the 50-kDa range in solution.
Affiliation(s)
- M J Kelly
- Research Institute for Molecular Pharmacology, Robert-Rössle-Strasse 10, D-13125 Berlin, Germany.
|
49
|
Schroeder M, Gilbert D, van Helden J, Noy P. Approaches to visualisation in bioinformatics: from dendrograms to Space Explorer. Inf Sci (N Y) 2001. [DOI: 10.1016/s0020-0255(01)00156-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]
|
50
|
Lee JE, Cornell KA, Riscoe MK, Howell PL. Structure of E. coli 5'-methylthioadenosine/S-adenosylhomocysteine nucleosidase reveals similarity to the purine nucleoside phosphorylases. Structure 2001; 9:941-53. [PMID: 11591349 DOI: 10.1016/s0969-2126(01)00656-6] [Citation(s) in RCA: 70] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
Background: 5'-methylthioadenosine/S-adenosyl-homocysteine (MTA/AdoHcy) nucleosidase catalyzes the irreversible cleavage of 5'-methylthioadenosine and S-adenosylhomocysteine to adenine and the corresponding thioribose, 5'-methylthioribose and S-ribosylhomocysteine, respectively. While this enzyme is crucial for the metabolism of AdoHcy and MTA nucleosides in many prokaryotic and lower eukaryotic organisms, it is absent in mammalian cells. This metabolic difference represents an exploitable target for rational drug design. Results: The crystal structure of E. coli MTA/AdoHcy nucleosidase was determined at 1.90 Å resolution with the multiwavelength anomalous diffraction (MAD) technique. Each monomer of the MTA/AdoHcy nucleosidase dimer consists of a mixed alpha/beta domain with a nine-stranded mixed beta sheet, flanked by six alpha helices and a small 3(10) helix. Intersubunit contacts between the two monomers present in the asymmetric unit are mediated primarily by helix-helix and helix-loop hydrophobic interactions. The unexpected presence of an adenine molecule in the active site of the enzyme has allowed the identification of both substrate binding and potential catalytic amino acid residues. Conclusions: Although the sequence of E. coli MTA/AdoHcy nucleosidase has almost no identity with any known enzyme, its tertiary structure is similar to both the mammalian (trimeric) and prokaryotic (hexameric) purine nucleoside phosphorylases. The structure provides evidence that this protein is functional as a dimer and that the dual specificity for MTA and AdoHcy results from the truncation of a helix. The structure of MTA/AdoHcy nucleosidase is the first structure of a prokaryotic nucleoside N-ribohydrolase specific for 6-aminopurines.
Affiliation(s)
- J E Lee
- Structural Biology and Biochemistry, Research Institute, Hospital for Sick Children, 555 University Avenue, Toronto, Ontario M5G 1X8, Canada
|