1
Zhao T, Gussak A, van der Hee B, Brugman S, van Baarlen P, Wells JM. Identification of plasminogen-binding sites in Streptococcus suis enolase that contribute to bacterial translocation across the blood-brain barrier. Front Cell Infect Microbiol 2024; 14:1356628. [PMID: 38456079 PMCID: PMC10919400 DOI: 10.3389/fcimb.2024.1356628] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2023] [Accepted: 02/06/2024] [Indexed: 03/09/2024]
Abstract
Streptococcus suis is an emerging zoonotic pathogen that can cause invasive disease commonly associated with meningitis in pigs and humans. To cause meningitis, S. suis must cross the blood-brain barrier (BBB) comprising blood vessels that vascularize the central nervous system (CNS). The BBB is highly selective due to interactions with other cell types in the brain and the composition of the extracellular matrix (ECM). Purified streptococcal surface enolase, an essential enzyme participating in glycolysis, can bind human plasminogen (Plg) and plasmin (Pln). Plg has been proposed to increase bacterial traversal across the BBB via conversion to Pln, a protease which cleaves host proteins in the ECM and monocyte chemoattractant protein 1 (MCP1) to disrupt tight junctions. The essentiality of enolase has made it challenging to unequivocally demonstrate its role in binding Plg/Pln on the bacterial surface and confirm its predicted role in facilitating translocation of the BBB. Here, we report on the CRISPR/Cas9 engineering of S. suis enolase mutants eno261, eno252/253/255, eno252/261, and eno434/435 possessing amino acid substitutions at in silico predicted binding sites for Plg. As expected, amino acid substitutions in the predicted Plg binding sites reduced Plg and Pln binding to S. suis but did not affect bacterial growth in vitro compared to the wild-type strain. The binding of Plg to wild-type S. suis enhanced translocation across the human cerebral microvascular endothelial cell line hCMEC/D3 but not for the eno mutant strains tested. To our knowledge, this is the first study where predicted Plg-binding sites of enolase have been mutated to show altered Plg and Pln binding to the surface of S. suis and attenuation of translocation across an endothelial cell monolayer in vitro.
Affiliation(s)
- Jerry M. Wells
- Host-Microbe Interactomics, Wageningen University & Research, Wageningen, Netherlands
2
Carpentier M, Chomilier J. Protein multiple alignments: sequence-based versus structure-based programs. Bioinformatics 2020; 35:3970-3980. [PMID: 30942864 DOI: 10.1093/bioinformatics/btz236] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 10/04/2018] [Revised: 03/05/2019] [Accepted: 04/02/2019] [Indexed: 11/14/2022]
Abstract
MOTIVATION Multiple sequence alignment programs have proved to be very useful and have already been evaluated in the literature yet not alignment programs based on structure or both sequence and structure. In the present article we wish to evaluate the added value provided through considering structures. RESULTS We compared the multiple alignments resulting from 25 programs either based on sequence, structure or both, to reference alignments deposited in five databases (BALIBASE 2 and 3, HOMSTRAD, OXBENCH and SISYPHUS). On the whole, the structure-based methods compute more reliable alignments than the sequence-based ones, and even than the sequence+structure-based programs whatever the databases. Two programs lead, MAMMOTH and MATRAS, nevertheless the performances of MUSTANG, MATT, 3DCOMB, TCOFFEE+TM_ALIGN and TCOFFEE+SAP are better for some alignments. The advantage of structure-based methods increases at low levels of sequence identity, or for residues in regular secondary structures or buried ones. Concerning gap management, sequence-based programs set less gaps than structure-based programs. Concerning the databases, the alignments of the manually built databases are more challenging for the programs. AVAILABILITY AND IMPLEMENTATION All data and results presented in this study are available at: http://wwwabi.snv.jussieu.fr/people/mathilde/download/AliMulComp/. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mathilde Carpentier
- Institut Systématique Evolution Biodiversité (ISYEB), Sorbonne Université, MNHN, CNRS, EPHE, Paris, France
- Jacques Chomilier
- Sorbonne Université, MNHN, CNRS, IRD, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie (IMPMC), BiBiP, Paris, France
3
Collier JH, Allison L, Lesk AM, Garcia de la Banda M, Konagurthu AS. A new statistical framework to assess structural alignment quality using information compression. Bioinformatics 2015; 30:i512-8. [PMID: 25161241 PMCID: PMC4147913 DOI: 10.1093/bioinformatics/btu460] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022]
Abstract
Motivation: Progress in protein biology depends on the reliability of results from a handful of computational techniques, structural alignments being one. Recent reviews have highlighted substantial inconsistencies and differences between alignment results generated by the ever-growing stock of structural alignment programs. The lack of consensus on how the quality of structural alignments must be assessed has been identified as the main cause for the observed differences. Current methods assess structural alignment quality by constructing a scoring function that attempts to balance conflicting criteria, mainly alignment coverage and fidelity of structures under superposition. This traditional approach to measuring alignment quality, the subject of considerable literature, has failed to solve the problem. Further development along the same lines is unlikely to rectify the current deficiencies in the field. Results: This paper proposes a new statistical framework to assess structural alignment quality and significance based on lossless information compression. This is a radical departure from the traditional approach of formulating scoring functions. It links the structural alignment problem to the general class of statistical inductive inference problems, solved using the information-theoretic criterion of minimum message length. Based on this, we developed an efficient and reliable measure of structural alignment quality, I-value. The performance of I-value is demonstrated in comparison with a number of popular scoring functions, on a large collection of competing alignments. Our analysis shows that I-value provides a rigorous and reliable quantification of structural alignment quality, addressing a major gap in the field. Availability: http://lcb.infotech.monash.edu.au/I-value Contact: arun.konagurthu@monash.edu Supplementary information:Online supplementary data are available at http://lcb.infotech.monash.edu.au/I-value/suppl.html
Affiliation(s)
- James H Collier
- Clayton School of Information Technology, Monash University, Clayton, VIC 3800, Australia and Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- Lloyd Allison
- Clayton School of Information Technology, Monash University, Clayton, VIC 3800, Australia and Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- Arthur M Lesk
- Clayton School of Information Technology, Monash University, Clayton, VIC 3800, Australia and Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- Maria Garcia de la Banda
- Clayton School of Information Technology, Monash University, Clayton, VIC 3800, Australia and Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
- Arun S Konagurthu
- Clayton School of Information Technology, Monash University, Clayton, VIC 3800, Australia and Department of Biochemistry and Molecular Biology, Pennsylvania State University, University Park, PA 16802, USA
4
Abstract
The analysis of the three-dimensional structure of proteins is an important topic in molecular biochemistry. Structure plays a critical role in defining the function of proteins and is more strongly conserved than amino acid sequence over evolutionary timescales. A key challenge is the identification and evaluation of structural similarity between proteins; such analysis can aid in understanding the role of newly discovered proteins and help elucidate evolutionary relationships between organisms. Computational biologists have developed many clever algorithmic techniques for comparing protein structures, however, all are based on heuristic optimization criteria, making statistical interpretation somewhat difficult. Here we present a fully probabilistic framework for pairwise structural alignment of proteins. Our approach has several advantages, including the ability to capture alignment uncertainty and to estimate key "gap" parameters which critically affect the quality of the alignment. We show that several existing alignment methods arise as maximum a posteriori estimates under specific choices of prior distributions and error models. Our probabilistic framework is also easily extended to incorporate additional information, which we demonstrate by including primary sequence information to generate simultaneous sequence-structure alignments that can resolve ambiguities obtained using structure alone. This combined model also provides a natural approach for the difficult task of estimating evolutionary distance based on structural alignments. The model is illustrated by comparison with well-established methods on several challenging protein alignment examples.
Affiliation(s)
- Abel Rodriguez
- University of California, Santa Cruz and Duke University
5
Slater AW, Castellanos JI, Sippl MJ, Melo F. Towards the development of standardized methods for comparison, ranking and evaluation of structure alignments. Bioinformatics 2012; 29:47-53. [PMID: 23060612 DOI: 10.1093/bioinformatics/bts600] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022]
Abstract
MOTIVATION Pairwise alignment of protein structures is a fundamental task in structural bioinformatics. There are numerous computer programs in the public domain that produce alignments for a given pair of protein structures, but the results obtained by the various programs generally differ substantially. Hence, in the application of such programs the question arises which of the alignment programs are the most trustworthy in the sense of overall performance, and which programs provide the best result for a given pair of proteins. The major problem in comparing, evaluating and judging alignment results is that there is no clear notion of the optimality of an alignment. As a consequence, the numeric criteria and scores reported by the individual structure alignment programs are largely incomparable. RESULTS Here we report on the development and application of a new approach for the evaluation of structure alignment results. The method uses the translation vector and rotation matrix to generate the superposition of two structures but discards the alignment reported by the individual programs. The optimal alignment is then generated in standardized form based on a suitably implemented dynamic programming algorithm where the length of the alignment is the single most informative parameter. We demonstrate that some of the most popular programs in protein structure research differ considerably in their overall performance. In particular, each of the programs investigated here produced, in at least one case, the best and the worst alignment compared with all others. Hence, at the current state of development of structure comparison techniques, it is advisable to use several programs in parallel and to choose the optimal alignment in the way reported here. AVAILABILITY AND IMPLEMENTATION The computer software that implements the method described here is freely available at http://melolab.org/stovca.
Affiliation(s)
- Alex W Slater
- Molecular Bioinformatics Laboratory, Millennium Institute on Immunology and Immunotherapy, Portugal 49, Santiago, CP 8330025, Chile
6
Sippl MJ, Wiederstein M. Detection of spatial correlations in protein structures and molecular complexes. Structure 2012; 20:718-28. [PMID: 22483118 PMCID: PMC3320710 DOI: 10.1016/j.str.2012.01.024] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 10/14/2011] [Revised: 01/09/2012] [Accepted: 01/31/2012] [Indexed: 10/28/2022]
Abstract
Protein structures are frequently related by spectacular and often surprising similarities. Structural correlations among protein chains are routinely detected by various structure-matching techniques, but the comparison of oligomers and molecular complexes is largely uncharted territory. Here we solve the structure-matching problem for oligomers and large molecular aggregates, including the largest molecular complexes known today. We provide several challenging examples that cannot be handled by conventional structure-matching techniques and we report on a number of remarkable correlations. The examples cover the cell-puncturing device of bacteriophage T4, the secretion system of P. aeruginosa, members of the dehydrogenase family, DNA clamps, ferredoxin iron-storage cages, and virus capsids.
Affiliation(s)
- Manfred J Sippl
- Division of Bioinformatics, Department of Molecular Biology, University of Salzburg, Hellbrunnerstraße 34, 5020 Salzburg, Austria.
7
Sehnal D, Vařeková RS, Huber HJ, Geidl S, Ionescu CM, Wimmerová M, Koča J. SiteBinder: an improved approach for comparing multiple protein structural motifs. J Chem Inf Model 2012; 52:343-59. [PMID: 22296449 DOI: 10.1021/ci200444d] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022]
Abstract
There is a paramount need to develop new techniques and tools that will extract as much information as possible from the ever growing repository of protein 3D structures. We report here on the development of a software tool for the multiple superimposition of large sets of protein structural motifs. Our superimposition methodology performs a systematic search for the atom pairing that provides the best fit. During this search, the RMSD values for all chemically relevant pairings are calculated by quaternion algebra. The number of evaluated pairings is markedly decreased by using PDB annotations for atoms. This approach guarantees that the best fit will be found and can be applied even when sequence similarity is low or does not exist at all. We have implemented this methodology in the Web application SiteBinder, which is able to process up to thousands of protein structural motifs in a very short time, and which provides an intuitive and user-friendly interface. Our benchmarking analysis has shown the robustness, efficiency, and versatility of our methodology and its implementation by the successful superimposition of 1000 experimentally determined structures for each of 32 eukaryotic linear motifs. We also demonstrate the applicability of SiteBinder using three case studies. We first compared the structures of 61 PA-IIL sugar binding sites containing nine different sugars, and we found that the sugar binding sites of PA-IIL and its mutants have a conserved structure despite their binding different sugars. We then superimposed over 300 zinc finger central motifs and revealed that the molecular structure in the vicinity of the Zn atom is highly conserved. Finally, we superimposed 12 BH3 domains from pro-apoptotic proteins. Our findings come to support the hypothesis that there is a structural basis for the functional segregation of BH3-only proteins into activators and enablers.
Affiliation(s)
- David Sehnal
- National Centre for Biomolecular Research, Faculty of Science and CEITEC-Central European Institute of Technology, Masaryk University Brno, Kamenice 5, 62500 Brno-Bohunice, Czech Republic
8
Poleksic A. Optimizing a widely used protein structure alignment measure in expected polynomial time. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 8:1716-1720. [PMID: 21904019 DOI: 10.1109/tcbb.2011.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/31/2023]
Abstract
Protein structure alignment is an important tool in many biological applications, such as protein evolution studies, protein structure modeling, and structure-based, computer-aided drug design. Protein structure alignment is also one of the most challenging problems in computational molecular biology, due to an infinite number of possible spatial orientations of any two protein structures. We study one of the most commonly used measures of pairwise protein structure similarity, defined as the number of pairs of atoms in two proteins that can be superimposed under a predefined distance cutoff. We prove that the expected running time of a recently published algorithm for optimizing this (and some other, derived measures of protein structure similarity) is polynomial.
Affiliation(s)
- Aleksandar Poleksic
- Department of Computer Science, University of Northern Iowa, 305 ITTC, Cedar Falls, IA 50614-0507, USA.
9
Poleksic A. On complexity of protein structure alignment problem under distance constraint. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2011; 9:511-516. [PMID: 22025757 DOI: 10.1109/tcbb.2011.133] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 05/31/2023]
Abstract
We study the well known LCP (Largest Common Point-Set) under Bottleneck Distance Problem. Given two proteins a and b (as sequences of points in 3D space) and a distance cutoff σ, the goal is to find a spatial superposition and an alignment that maximizes the number of pairs of points from a and b that can be fit under the distance σ from each other. The best to date algorithms for approximate and exact solution to this problem run in time O(n^8) and O(n^32), respectively, where n represents the protein length. This work improves the runtime of the approximation algorithm and the algorithm for absolute optimum for both order-dependent and order-independent alignments. More specifically, our algorithms for near-optimal and optimal sequential alignments run in time O(n^7 log n) and O(n^14 log n), respectively. For non-sequential alignments, corresponding running times are O(n^7.5) and O(n^14.5).
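As an illustration of the quantity being maximized, the following is a minimal Python sketch that handles only the easy half of the problem: it assumes the two point sequences are already superimposed and counts the largest order-preserving (sequential) set of pairs lying within the cutoff σ, using a longest-common-subsequence style dynamic program. It does not search over superpositions and is not the algorithm of the paper; the function name `max_pairs_within_cutoff` and the sample coordinates are hypothetical.

```python
import math


def max_pairs_within_cutoff(a, b, sigma):
    """Largest number of order-preserving pairs (a[i], b[j]) with distance <= sigma.

    Assumption: a and b are already superimposed; no rotation or
    translation is searched here.
    """
    n, m = len(a), len(b)
    # best[i][j] = best count using the first i points of a and the first j of b
    best = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            close = math.dist(a[i - 1], b[j - 1]) <= sigma
            best[i][j] = max(
                best[i - 1][j],                           # leave a[i-1] unpaired
                best[i][j - 1],                           # leave b[j-1] unpaired
                best[i - 1][j - 1] + (1 if close else 0)  # pair them if close enough
            )
    return best[n][m]


# Hypothetical toy coordinates, for illustration only
a = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
b = [(0, 0.1, 0), (2, 0.1, 0), (9, 9, 9), (3, 0.1, 0)]
print(max_pairs_within_cutoff(a, b, 0.5))
```

For a fixed superposition this table fill costs O(nm) time; the much larger exponents quoted in the abstract come from also optimizing over the superposition itself.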
10
Joseph AP, Srinivasan N, de Brevern AG. Improvement of protein structure comparison using a structural alphabet. Biochimie 2011; 93:1434-45. [PMID: 21569819 DOI: 10.1016/j.biochi.2011.04.010] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 11/19/2010] [Accepted: 04/12/2011] [Indexed: 12/29/2022]
Abstract
The three dimensional structure of a protein provides major insights into its function. Protein structure comparison has implications in functional and evolutionary studies. A structural alphabet (SA) is a library of local protein structure prototypes that can abstract every part of protein main chain conformation. Protein Blocks (PBs) is a widely used SA, composed of 16 prototypes, each representing a pentapeptide backbone conformation defined in terms of dihedral angles. Through this description, the 3D structural information can be translated into a 1D sequence of PBs. In a previous study, we have used this approach to compare protein structures encoded in terms of PBs. A classical sequence alignment procedure based on dynamic programming was used, with a dedicated PB Substitution Matrix (SM). PB-based pairwise structural alignment method gave an excellent performance, when compared to other established methods for mining. In this study, we have (i) refined the SMs and (ii) improved the Protein Block Alignment methodology (named as iPBA). The SM was normalized in regards to sequence and structural similarity. Alignment of protein structures often involves similar structural regions separated by dissimilar stretches. A dynamic programming algorithm that weighs these local similar stretches has been designed. Amino acid substitutions scores were also coupled linearly with the PB substitutions. iPBA improves (i) the mining efficiency rate by 6.8% and (ii) more than 82% of the alignments have a better quality. A higher efficiency in aligning multi-domain proteins could be also demonstrated. The quality of alignment is better than DALI and MUSTANG in 81.3% of the cases. Thus our study has resulted in an impressive improvement in the quality of protein structural alignment.
Affiliation(s)
- Agnel Praveen Joseph
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques, 6, rue Alexandre Cabanel, 75739 Paris Cedex 15, France.
11
Shen YF, Li B, Liu ZP. Protein structure alignment based on internal coordinates. Interdiscip Sci 2010; 2:308-19. [DOI: 10.1007/s12539-010-0019-8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 10/06/2008] [Revised: 01/05/2010] [Accepted: 01/06/2010] [Indexed: 10/18/2022]
12
Shibuya T, Jansson J, Sadakane K. Linear-time protein 3-D structure searching with insertions and deletions. Algorithms Mol Biol 2010; 5:7. [PMID: 20047663 PMCID: PMC2830924 DOI: 10.1186/1748-7188-5-7] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 08/08/2009] [Accepted: 01/04/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Two biomolecular 3-D structures are said to be similar if the RMSD (root mean square deviation) between the two molecules' sequences of 3-D coordinates is less than or equal to some given constant bound. Tools for searching for similar structures in biomolecular 3-D structure databases are becoming increasingly important in the structural biology of the post-genomic era. RESULTS We consider an important, fundamental problem of reporting all substructures in a 3-D structure database of chain molecules (such as proteins) which are similar to a given query 3-D structure, with consideration of indels (i.e., insertions and deletions). This problem has been believed to be very difficult but its exact computational complexity has not been known. In this paper, we first prove that the problem in unbounded dimensions is NP-hard. We then propose a new algorithm that dramatically improves the average-case time complexity of the problem in 3-D in case the number of indels k is bounded by a constant. Our algorithm solves the above problem for a query of size m and a database of size N in average-case O(N) time, whereas the time complexity of the previously best algorithm was O(Nm^(k+1)). CONCLUSIONS Our results show that although the problem of searching for similar structures in a database based on the RMSD measure with indels is NP-hard in the case of unbounded dimensions, it can be solved in 3-D by a simple average-case linear time algorithm when the number of indels is bounded by a constant.
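The similarity criterion stated in the BACKGROUND can be sketched in a few lines of Python. This is a simplified illustration, not the search algorithm of the paper: it assumes two equal-length coordinate lists that are already in a common frame, so no optimal rotation or translation is computed and indels are not handled. The names `rmsd` and `is_similar` and the sample coordinates are hypothetical.

```python
import math


def rmsd(p, q):
    """Root mean square deviation between two equal-length lists of 3-D points.

    Assumption: p and q are already superimposed (no fitting is done here).
    """
    if len(p) != len(q) or not p:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Mean of squared point-to-point distances, then the square root
    total = sum(math.dist(x, y) ** 2 for x, y in zip(p, q))
    return math.sqrt(total / len(p))


def is_similar(p, q, bound):
    """Similarity test from the abstract: RMSD less than or equal to a constant bound."""
    return rmsd(p, q) <= bound


# Hypothetical toy coordinates: q is p shifted by one unit along z
p = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
q = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(p, q), is_similar(p, q, 1.5))
```

In practice the RMSD is usually minimized over rigid motions before the comparison with the bound; that fitting step is deliberately left out of this sketch.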
Affiliation(s)
- Tetsuo Shibuya
- Human Genome Center, Institute of Medical Science, University of Tokyo 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
- Jesper Jansson
- Ochanomizu University, 2-1-1 Ohtsuka, Bunkyo-ku, Tokyo 112-8610, Japan
- Kunihiko Sadakane
- National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
13
Abstract
Protein structures often show similarities to one another which would not be seen at the sequence level. Given the coordinates of a protein chain, the SALAMI server at www.zbh.uni-hamburg.de/salami will search the protein data bank and return a set of similar structures without using sequence information. The results page lists the related proteins, details of the sequence and structure similarity and implied sequence alignments. Via a simple structure viewer, one can view superpositions of query and library structures and finally download superimposed coordinates. The alignment method is very tolerant of large gaps and insertions, and tends to produce slightly longer alignments than other similar programs.
Affiliation(s)
- Thomas Margraf
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany.
14
Tai CH, Vincent JJ, Kim C, Lee B. SE: an algorithm for deriving sequence alignment from a pair of superimposed structures. BMC Bioinformatics 2009; 10 Suppl 1:S4. [PMID: 19208141 PMCID: PMC2648757 DOI: 10.1186/1471-2105-10-s1-s4] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/16/2022]
Abstract
Background Generating sequence alignments from superimposed structures is an important part of many structure comparison programs. The accuracy of the alignment affects structure recognition, classification and possibly function prediction. Many programs use a dynamic programming algorithm to generate the sequence alignment from superimposed structures. However, this procedure requires using a gap penalty and, depending on the value of the penalty used, can introduce spurious gaps and misalignments. Here we present a new algorithm, Seed Extension (SE), for generating the sequence alignment from a pair of superimposed structures. The SE algorithm first finds "seeds", which are the pairs of residues, one from each structure, that meet certain stringent criteria for being structurally equivalent. Three consecutive seeds form a seed segment, which is extended along the diagonal of the alignment matrix in both directions. Distance and the amino acid type similarity between the residues are used to resolve conflicts that arise during extension of more than one diagonal. The manually curated alignments in the Conserved Domain Database were used as the standard to assess the quality of the sequence alignments. Results SE gave an average accuracy of 95.9% over 582 pairs of superimposed proteins tested, while CHIMERA, LSQMAN, and DP extracted from SHEBA, which all use a dynamic programming algorithm, yielded 89.9%, 90.2% and 91.0%, respectively. For pairs of proteins with low sequence or structural similarity, SE produced alignments up to 18% more accurate on average than the next best scoring program. Improvement was most pronounced when the two superimposed structures contained equivalent helices or beta-strands that crossed at an angle. When the SE algorithm was implemented in SHEBA to replace the dynamic programming routine, the alignment accuracy improved by 10% on average for structure pairs with RMSD between 2 and 4 Å. SE also used considerably less CPU time than DP. 
Conclusion The Seed Extension algorithm is fast and, without using a gap penalty, produces more accurate sequence alignments from superimposed structures than three other programs tested that use dynamic programming algorithm.
Affiliation(s)
- Chin-Hsien Tai
- Molecular Modeling and Bioinformatics Section, Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA.
15
16
Abstract
The database of known protein structures contains an overwhelming number of structural similarities that frequently point to intriguing biological relationships. The similarities are often difficult to spot, and once detected their comprehension needs proper visualization. Here we introduce the new concept of a Fold Space Navigator, a user interface enabling the efficient navigation through fold space and the instantaneous visualization of pairwise structure similarities. AVAILABILITY The Fold Space Navigator is accessible as a public web service at http://services.came.sbg.ac.at
Affiliation(s)
- Manfred J Sippl
- Center of Applied Molecular Engineering, Division of Bioinformatics, Department of Molecular Biology, University of Salzburg, Hellbrunnerstr. 34, 5020 Salzburg, Austria.
17
Abstract
Progress in structural biology depends on several key technologies. In particular tools for alignment and superposition of protein structures are indispensable. Here we describe the use of the TopMatch web service, an effective computational tool for protein structure alignment, for the visualization of structural similarities, and for highlighting relationships found in protein classifications. We provide several instructive examples. AVAILABILITY TopMatch is available as a public web service at http://services.came.sbg.ac.at.
Affiliation(s)
- Manfred J Sippl
- Center of Applied Molecular Engineering, Division of Bioinformatics, Department of Molecular Biology, University of Salzburg, Hellbrunnerstr. 34, 5020 Salzburg, Austria.
18
Zemla AT, Zhou CLE. Structural Re-Alignment in an Immunogenic Surface Region of Ricin A Chain. Bioinform Biol Insights 2008; 2:5-13. [PMID: 19812763 PMCID: PMC2735970 DOI: 10.4137/bbi.s437] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
We compared structure alignments generated by several protein structure comparison programs to determine whether existing methods would satisfactorily align residues at a highly conserved position within an immunogenic loop in ribosome inactivating proteins (RIPs). Using default settings, structure alignments generated by several programs (CE, DaliLite, FATCAT, LGA, MAMMOTH, MATRAS, SHEBA, SSM) failed to align the respective conserved residues, although LGA reported correct residue-residue (R-R) correspondences when the beta-carbon (Cb) position was used as the point of reference in the alignment calculations. Further tests using variable points of reference indicated that points distal from the beta carbon along a vector connecting the alpha and beta carbons yielded rigid structural alignments in which residues known to be highly conserved in RIPs were reported as corresponding residues in structural comparisons between ricin A chain, abrin-A, and other RIPs. Results suggest that approaches to structure alignment employing alternate point representations corresponding to side chain position may yield structure alignments that are more consistent with observed conservation of functional surface residues than do standard alignment programs, which apply uniform criteria for alignment (i.e. alpha carbon (Ca) as point of reference) along the entirety of the peptide chain. We present the results of tests that suggest the utility of allowing user-specified points of reference in generating alternate structural alignments, and we present a web server for automatically generating such alignments: http://as2ts.llnl.gov/AS2TS/LGA/lga_pdblist_plots.html.
Affiliation(s)
- Adam T. Zemla
- Computational Biology for Countermeasures Group, Lawrence Livermore National Laboratory, Livermore, CA, U.S.A. 94550
- Carol L. Ecale Zhou
- Computational Biology for Countermeasures Group, Lawrence Livermore National Laboratory, Livermore, CA, U.S.A. 94550
|
19
|
Kim C, Lee B. Accuracy of structure-based sequence alignment of automatic methods. BMC Bioinformatics 2007; 8:355. [PMID: 17883866 PMCID: PMC2039753 DOI: 10.1186/1471-2105-8-355] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/04/2007] [Accepted: 09/20/2007] [Indexed: 11/10/2022] Open
Abstract
Background: Accurate sequence alignments are essential for homology searches and for building three-dimensional structural models of proteins. Since structure is better conserved than sequence, structure alignments have been used to guide sequence alignments and are commonly used as the gold standard for sequence alignment evaluation. Nonetheless, as far as we know, there is no report of a systematic evaluation of pairwise structure alignment programs in terms of the sequence alignment accuracy.
Results: In this study, we evaluate CE, DaliLite, FAST, LOCK2, MATRAS, SHEBA and VAST in terms of the accuracy of the sequence alignments they produce, using sequence alignments from NCBI's human-curated Conserved Domain Database (CDD) as the standard of truth. We find that 4 to 9% of the residues on average are either not aligned or aligned with more than 8 residues of shift error and that an additional 6 to 14% of residues on average are misaligned by 1–8 residues, depending on the program and the data set used. The fraction of correctly aligned residues generally decreases as the sequence similarity decreases or as the RMSD between the Cα positions of the two structures increases. It varies significantly across CDD superfamilies whether shift error is allowed or not. Also, alignments with different shift errors occur between proteins within the same CDD superfamily, leading to inconsistent alignments between superfamily members. In general, residue pairs that are more than 3.0 Å apart in the reference alignment are heavily (≥ 25% on average) misaligned in the test alignments. In addition, each method shows a different pattern of relative weaknesses for different SCOP classes. CE gives relatively poor results for β-sheet-containing structures (all-β, α/β, and α+β classes), DaliLite for "others" class where all but the major four classes are combined, and LOCK2 and VAST for all-β and "others" classes.
Conclusion: When the sequence similarity is low, structure-based methods produce better sequence alignments than by using sequence similarities alone. However, current structure-based methods still mis-align 11–19% of the conserved core residues when compared to the human-curated CDD alignments. The alignment quality of each program depends on the protein structural type and similarity, with DaliLite showing the most agreement with CDD on average.
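The shift-error bookkeeping described in the Results can be illustrated with a short sketch. It assumes each alignment is stored as a dictionary mapping a residue index in the first protein to its partner in the second; that representation, the function name `classify_shift_errors`, and the toy data are assumptions made here and are not taken from the paper.

```python
def classify_shift_errors(reference, test, max_shift=8):
    """Count reference-aligned residues by how the test alignment treats them.

    correct: same partner as in the reference alignment
    shifted: partner is off by 1..max_shift positions
    missed:  not aligned in the test, or off by more than max_shift
    """
    counts = {"correct": 0, "shifted": 0, "missed": 0}
    for residue, ref_partner in reference.items():
        test_partner = test.get(residue)
        if test_partner is None:
            counts["missed"] += 1
            continue
        shift = abs(test_partner - ref_partner)
        if shift == 0:
            counts["correct"] += 1
        elif shift <= max_shift:
            counts["shifted"] += 1
        else:
            counts["missed"] += 1
    return counts


# Toy data: five residues aligned in the reference, four in the test.
reference = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
test = {1: 1, 2: 3, 3: 4, 4: 14}
counts = classify_shift_errors(reference, test)
```

Dividing each count by the number of reference-aligned residues gives percentages of the kind the abstract reports.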
Affiliation(s)
- Changhoon Kim
- Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute National Institutes of Health, Bethesda, Maryland, USA
- Byungkook Lee
- Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute National Institutes of Health, Bethesda, Maryland, USA
|
20
|
Tidow H, Andreeva A, Rutherford TJ, Fersht AR. Solution structure of ASPP2 N-terminal domain (N-ASPP2) reveals a ubiquitin-like fold. J Mol Biol 2007; 371:948-58. [PMID: 17594908 DOI: 10.1016/j.jmb.2007.05.024] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/24/2007] [Revised: 05/07/2007] [Accepted: 05/07/2007] [Indexed: 11/30/2022]
Abstract
Proteins of the ASPP family bind to p53 and regulate p53-mediated apoptosis. Two family members, ASPP1 and ASPP2, have pro-apoptotic functions while iASPP shows anti-apoptotic responses. However, both the mechanism of enhancement/repression of apoptosis and the molecular basis for their different responses remain unknown. To address the role of the N-termini of pro-apoptotic ASPP proteins, we solved the solution structure of N-ASPP2 (1-83) by NMR spectroscopy. The structure of this domain reveals a beta-Grasp ubiquitin-like fold. Our findings suggest a possible role for the N-termini of ASPP proteins in binding to other proteins in the apoptotic response network and thus mediating their selective pro-apoptotic function.
Affiliation(s)
- Henning Tidow
- MRC Centre for Protein Engineering, Hills Road, Cambridge CB2 0QH, UK
|
21
|
Suhrer SJ, Gruber M, Sippl MJ. QSCOP-BLAST--fast retrieval of quantified structural information for protein sequences of unknown structure. Nucleic Acids Res 2007; 35:W411-5. [PMID: 17478501 PMCID: PMC1933160 DOI: 10.1093/nar/gkm264] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
QSCOP is a quantitative structural classification of proteins which distinguishes itself from other classifications by two essential properties: (i) QSCOP is concurrent with the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank and (ii) QSCOP covers the widely used SCOP classification with layers of quantitative structural information. The QSCOP-BLAST web server presented here combines the BLAST sequence search engine with QSCOP to retrieve, for a given query sequence, all structural information currently available. The resulting search engine is reliable in terms of the quality of results obtained, and it is efficient in that results are displayed instantaneously. The hierarchical organization of QSCOP is used to control the redundancy and diversity of the retrieved hits with the benefit that the often cumbersome and difficult interpretation of search results is an intuitive and straightforward exercise. We demonstrate the use of QSCOP-BLAST by example. The server is accessible at http://qscop-blast.services.came.sbg.ac.at/
Affiliation(s)
- Manfred J. Sippl
- *To whom correspondence should be addressed. 0043-662-8044-5796; 0043-662-8044-176
|
22
|
Wang Y, Makedon F, Ford J, Huang H. A bipartite graph matching framework for finding correspondences between structural elements in two proteins. CONFERENCE PROCEEDINGS : ... ANNUAL INTERNATIONAL CONFERENCE OF THE IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. IEEE ENGINEERING IN MEDICINE AND BIOLOGY SOCIETY. ANNUAL CONFERENCE 2007; 2004:2972-5. [PMID: 17270902 DOI: 10.1109/iembs.2004.1403843] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/13/2023]
Abstract
A protein molecule consists of one or more chains of amino acid sequences that fold into a complex three-dimensional structure. A protein's functions are often determined by its 3D structure, and so comparing the similarity of 3D structures between proteins is an important problem. To accomplish such comparison, one must align two proteins properly with rotation and translation in 3D space. Finding the correspondences between structural elements in the two proteins is the key step in many protein structure alignment algorithms. We introduce a new graph theoretic framework based on bipartite graph matching for finding sufficiently good correspondences. It is capable of providing both sequence-dependent and sequence-independent correspondences. It is a general framework for pair-wise matching of atoms, amino acid residues or secondary structure elements.
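As a rough illustration of correspondence finding by bipartite matching, the sketch below pairs points from two already superimposed structures whenever they lie within a distance cutoff, using a standard augmenting-path algorithm for maximum-cardinality matching. It is not the framework from the paper, whose edge weights and constraints are not given in the abstract; the function name `match_elements`, the cutoff rule, and the coordinates are assumptions.

```python
import math


def match_elements(points_a, points_b, cutoff):
    """Maximum-cardinality bipartite matching between two point sets.

    An edge joins point i of A to point j of B when their Euclidean
    distance is at most `cutoff`. Returns sorted (i, j) index pairs.
    """
    neighbours = [
        [j for j, q in enumerate(points_b) if math.dist(p, q) <= cutoff]
        for p in points_a
    ]
    match_of_b = [None] * len(points_b)  # index in A matched to each B point

    def try_assign(i, visited):
        # Try to place i, re-routing earlier assignments when necessary.
        for j in neighbours[i]:
            if j in visited:
                continue
            visited.add(j)
            if match_of_b[j] is None or try_assign(match_of_b[j], visited):
                match_of_b[j] = i
                return True
        return False

    for i in range(len(points_a)):
        try_assign(i, set())
    return sorted((i, j) for j, i in enumerate(match_of_b) if i is not None)


# Hypothetical coordinates; the third point of A has no partner within 1.0.
points_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
points_b = [(0.9, 0.0, 0.0), (0.1, 0.0, 0.0)]
pairs = match_elements(points_a, points_b, cutoff=1.0)
```

Because nothing forces the matched indices to increase together, a matching of this kind is sequence-independent; a sequence-dependent variant would add an ordering constraint.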
Affiliation(s)
- Yuhang Wang
- Dept. of Comput. Sci., Dartmouth Coll., Hanover, NH, USA
|
23
|
Abstract
The database SCOP (Structural Classification Of Proteins) has become a major resource in bioinformatics and protein science. A particular strength of SCOP is the flexibility of its rules enabling the preservation of the many details spotted by experts in the classification process. Here we endow classic SCOP Families with quantified structural information and comment on the structural diversity found in the SCOP hierarchy.
Availability: Quantified SCOP (QSCOP) is available as a public web service at http://services.came.sbg.ac.at.
Affiliation(s)
- Stefan J Suhrer
- Center of Applied Molecular Engineering, Department of Bioinformatics, Division of Molecular Biology, University of Salzburg, 5020 Salzburg, Austria
|
24
|
Andreeva A, Prlić A, Hubbard TJP, Murzin AG. SISYPHUS--structural alignments for proteins with non-trivial relationships. Nucleic Acids Res 2006; 35:D253-9. [PMID: 17068077 PMCID: PMC1635320 DOI: 10.1093/nar/gkl746] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
With the increasing amount of structural data, the number of homologous protein structures bearing topological irregularities is steadily growing. These include proteins with circular permutations, segment-swapping, context-dependent folding or chameleon sequences that can adopt alternative secondary structures. Their non-trivial structural relationships are readily identified during expert analysis but their automatic identification using the existing computational tools still remains difficult or impossible. Such non-trivial cases of protein relationships are known to pose a problem to multiple alignment algorithms and to impede comparative modeling studies. They support a new emerging concept of evolutionarily changeable protein fold, which creates practical difficulties for the hierarchical classifications of protein structures. To facilitate the understanding of, and to provide a comprehensive annotation of, proteins with such non-trivial structural relationships we have created SISYPHUS ([Σισυϕος], "crafty" in Greek), a compendium to the SCOP database. The SISYPHUS database contains a collection of manually curated structural alignments and their inter-relationships. The multiple alignments are constructed for protein structural regions that range from oligomeric biological units, or individual domains to fragments of different size. The SISYPHUS multiple alignments are displayed with SPICE, a browser that provides an integrated view of protein sequences, structures and their annotations. The database is available from .
Affiliation(s)
- Antonina Andreeva
- MRC Centre for Protein Engineering, Hills Road, Cambridge CB2 2QH, UK.
|
25
|
Shih ESC, Gan RCR, Hwang MJ. OPAAS: a web server for optimal, permuted, and other alternative alignments of protein structures. Nucleic Acids Res 2006; 34:W95-8. [PMID: 16845117 PMCID: PMC1538888 DOI: 10.1093/nar/gkl264] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022] Open
Abstract
The large number of experimentally determined protein 3D structures is a rich resource for studying protein function and evolution, and protein structure comparison (PSC) is a key method for such studies. When comparing two protein structures, almost all currently available PSC servers report a single and sequential (i.e. topological) alignment, whereas the existence of good alternative alignments, including those involving permutations (i.e. non-sequential or non-topological alignments), is well known. We have recently developed a novel PSC method that can detect alternative alignments of statistical significance (alignment similarity P-value < 10^-5), including structural permutations at all levels of complexity. OPAAS, the server of this PSC method freely accessible at our website (), provides an easy-to-read hierarchical layout of output to display detailed information on all of the significant alternative alignments detected. Because these alternative alignments can offer a more complete picture on the structural, evolutionary and functional relationship between two proteins, OPAAS can be used in structural bioinformatics research to gain additional insight that is not readily provided by existing PSC servers.
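The distinction between sequential (topological) and permuted alignments can be made concrete with a small check. The sketch assumes an alignment is given as a list of (residue in protein A, residue in protein B) index pairs; the function name `is_sequential` and the example alignments are hypothetical and unrelated to OPAAS output.

```python
def is_sequential(pairs):
    """Return True when aligned residues appear in the same order in both chains.

    `pairs` is a list of (index_in_a, index_in_b) tuples. A permuted
    (non-sequential) alignment breaks the ordering on the B side.
    """
    ordered = sorted(pairs)  # sort by position in protein A
    return all(b1 < b2 for (_, b1), (_, b2) in zip(ordered, ordered[1:]))


# Hypothetical alignments over four residue pairs.
topological = [(1, 10), (2, 11), (3, 12), (4, 13)]
permuted = [(1, 12), (2, 13), (3, 10), (4, 11)]  # resembles a circular permutation

sequential_flags = (is_sequential(topological), is_sequential(permuted))
```

A server that reports only one sequential alignment per structure pair would never return the second kind, which is the gap the abstract points to.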
Affiliation(s)
- Ming-Jing Hwang
- To whom correspondence should be addressed. Tel: +886 2 2789 9033; Fax: +886 2 2788 7641;
|
26
|
Abstract
The Sixth Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP6) held in December 2004 focused on the prediction of the structures of 90 protein domains from 64 targets. Thirty-eight of these were classified as "fold recognition," defined as being similar in fold to proteins of known structure at the time of submission of the predictions. Only the "first" predictions and those longer than 20 amino acids for each domain were assessed, resulting in 4527 predictions from 165 groups. The assessment was accomplished by the use of six structure alignment programs and three scoring measures based on these alignments. The use of a variety of measures resulted in scoring insensitive to the peculiarities of any one alignment method. The top-ranked methods in the prediction of structures that were clearly homologous to proteins in the Protein Data Bank primarily used servers and other programs based on achieving a consensus of many remote homology detection and fold recognition methods. The top-ranked methods in prediction of structures less clearly related or unrelated to proteins of known structures used fragment building methods in addition to the fold recognition meta methods.
Affiliation(s)
- Guoli Wang
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA
|
27
|
Abstract
A novel protein structure alignment technique has been developed reducing much of the secondary and tertiary structure to a sequential representation greatly accelerating many structural computations, including alignment. Constructed from incidence relations in the Delaunay tetrahedralization, alignments of the sequential representation describe structural similarities that cannot be expressed with rigid-body superposition and complement existing techniques minimizing root-mean-squared distance through superposition. Restricting to the largest substructure superimposable by a single rigid-body transformation determines an alignment suitable for root-mean-squared distance comparisons and visualization. Restricted alignments of a test set of histones and histone-like proteins determined superpositions nearly identical to those produced by the established structure alignment routines of DaliLite and ProSup. Alignment of three, increasingly complex proteins: ferredoxin, cytidine deaminase, and carbamoyl phosphate synthetase, to themselves, demonstrated previously identified regions of self-similarity. All-against-all similarity index comparisons performed on a test set of 45 class I and class II aminoacyl-tRNA synthetases closely reproduced the results of established distance matrix methods while requiring 1/16 the time. Principal component analysis of pairwise tetrahedral decomposition similarity of 2300 molecular dynamics snapshots of tryptophanyl-tRNA synthetase revealed discrete microstates within the trajectory consistent with experimental results. The method produces results with sufficient efficiency for large-scale multiple structure alignment and is well suited to genomic and evolutionary investigations where no geometric model of similarity is known a priori.
Affiliation(s)
- Jeffrey Roach
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina 27599, USA.
|
28
|
Qiu J, Elber R. SSALN: an alignment algorithm using structure-dependent substitution matrices and gap penalties learned from structurally aligned protein pairs. Proteins 2006; 62:881-91. [PMID: 16385554 DOI: 10.1002/prot.20854] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
In template-based modeling of protein structures, the generation of the alignment between the target and the template is a critical step that significantly affects the accuracy of the final model. This paper proposes an alignment algorithm SSALN that learns substitution matrices and position-specific gap penalties from a database of structurally aligned protein pairs. In addition to the amino acid sequence information, secondary structure and solvent accessibility information of a position are used to derive substitution scores and position-specific gap penalties. In a test set of CASP5 targets, SSALN outperforms sequence alignment methods such as a Smith-Waterman algorithm with BLOSUM50 and PSI_BLAST. SSALN also generates better alignments than PSI_BLAST in the CASP6 test set. LOOPP server prediction based on an SSALN alignment is ranked the best for target T0280_1 in CASP6. SSALN is also compared with several threading methods and sequence alignment methods on the ProSup benchmark. SSALN has the highest alignment accuracy among the methods compared. On the Fischer's benchmark, SSALN performs better than CLUSTALW and GenTHREADER, and generates more alignments with accuracy >50%, >60% or >70% than FUGUE, but fewer alignments with accuracy >80% than FUGUE. All the supplemental materials can be found at http://www.cs.cornell.edu/~jianq/research.htm.
Affiliation(s)
- Jian Qiu
- Department of Computer Science, Cornell University, Ithaca, New York 14853, USA
|
29
|
Binkowski TA, Joachimiak A, Liang J. Protein surface analysis for function annotation in high-throughput structural genomics pipeline. Protein Sci 2006; 14:2972-81. [PMID: 16322579 PMCID: PMC2253251 DOI: 10.1110/ps.051759005] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/25/2022]
Abstract
Structural genomics (SG) initiatives are expanding the universe of protein fold space by rapidly determining structures of proteins that were intentionally selected on the basis of low sequence similarity to proteins of known structure. Often these proteins have no associated biochemical or cellular functions. The SG success has resulted in an accelerated deposition of novel structures. In some cases the structural bioinformatics analysis applied to these novel structures has provided specific functional assignment. However, this approach has also uncovered limitations in the functional analysis of uncharacterized proteins using traditional sequence and backbone structure methodologies. A novel method, named pvSOAR (pocket and void Surface of Amino Acid Residues), of comparing the protein surfaces of geometrically defined pockets and voids was developed. pvSOAR was able to detect previously unrecognized and novel functional relationships between surface features of proteins. In this study, pvSOAR is applied to several structural genomics proteins. We examined the surfaces of YecM, BioH, and RpiB from Escherichia coli as well as the CBS domains from inosine-5'-monophosphate dehydrogenase from Streptococcus pyogenes, conserved hypothetical protein Ta549 from Thermoplasma acidophilum, and CBS domain protein mt1622 from Methanobacterium thermoautotrophicum with the goal to infer information about their biochemical function.
Affiliation(s)
- T Andrew Binkowski
- Department of Bioengineering, The University of Illinois, 851 South Morgan St., Room 218, Chicago, IL 60607, USA.
|
30
|
Scheeff ED, Bourne PE. Structural evolution of the protein kinase-like superfamily. PLoS Comput Biol 2005; 1:e49. [PMID: 16244704 PMCID: PMC1261164 DOI: 10.1371/journal.pcbi.0010049] [Citation(s) in RCA: 185] [Impact Index Per Article: 9.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/17/2005] [Accepted: 09/08/2005] [Indexed: 11/19/2022] Open
Abstract
The protein kinase family is large and important, but it is only one family in a larger superfamily of homologous kinases that phosphorylate a variety of substrates and play important roles in all three superkingdoms of life. We used a carefully constructed structural alignment of selected kinases as the basis for a study of the structural evolution of the protein kinase-like superfamily. The comparison of structures revealed a "universal core" domain consisting only of regions required for ATP binding and the phosphotransfer reaction. Remarkably, even within the universal core some kinase structures display notable changes, while still retaining essential activity. Hence, the protein kinase-like superfamily has undergone substantial structural and sequence revision over long evolutionary timescales. We constructed a phylogenetic tree for the superfamily using a novel approach that allowed for the combination of sequence and structure information into a unified quantitative analysis. When considered against the backdrop of species distribution and other metrics, our tree provides a compelling scenario for the development of the various kinase families from a shared common ancestor. We propose that most of the so-called "atypical kinases" are not intermittently derived from protein kinases, but rather diverged early in evolution to form a distinct phyletic group. Within the atypical kinases, the aminoglycoside and choline kinase families appear to share the closest relationship. These two families in turn appear to be the most closely related to the protein kinase family. In addition, our analysis suggests that the actin-fragmin kinase, an atypical protein kinase, is more closely related to the phosphoinositide-3 kinase family than to the protein kinase family. The two most divergent families, alpha-kinases and phosphatidylinositol phosphate kinases (PIPKs), appear to have distinct evolutionary histories. 
While the PIPKs probably have an evolutionary relationship with the rest of the kinase superfamily, the relationship appears to be very distant (and perhaps indirect). Conversely, the alpha-kinases appear to be an exception to the scenario of early divergence for the atypical kinases: they apparently arose relatively recently in eukaryotes. We present possible scenarios for the derivation of the alpha-kinases from an extant kinase fold.
Affiliation(s)
- Eric D Scheeff
- San Diego Supercomputer Center, University of California, San Diego, California, United States of America.
|
31
|
Abstract
YAKUSA is a program designed for rapid scanning of a structural database with a query protein structure. It searches for the longest common substructures called SHSPs (structural high-scoring pairs) existing between a query structure and every structure in the structural database. It makes use of protein backbone internal coordinates (alpha angles) in order to describe protein structures as sequences of symbols. The structural similarities are established in 5 steps, the first 3 being analogous to those used in BLAST: (1) building up a deterministic finite automaton describing all patterns identical or similar to those in the query structure; (2) searching for all these patterns in every structure in the database; (3) extending the patterns to longer matching substructures (i.e., SHSPs); (4) selecting compatible SHSPs for each query-database structure pair; and (5) ranking the query-database structure pairs using 3 scores based on SHSP similarity, on SHSP probabilities, and on spatial compatibility of SHSPs. Structural fragment probabilities are estimated according to a mixture transition distribution model, which is an approximation of a high-order Markov chain model. With regard to sensitivity and selectivity of the structural matches, YAKUSA compares well to the best related programs, although it is by far faster: A typical database scan takes about 40 s CPU time on a desktop personal computer. It has also been implemented on a Web server for real-time searches.
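The step of describing a backbone as a sequence of symbols can be sketched as follows, taking the alpha angle to be the dihedral angle defined by four consecutive Cα atoms. The 60 degree bin width, the six-letter alphabet, and the function names are assumptions chosen for illustration; the abstract does not specify YAKUSA's actual discretization.

```python
import math


def _sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def alpha_angle(p0, p1, p2, p3):
    """Dihedral angle in degrees, in [-180, 180], for four consecutive points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def encode(ca_trace, alphabet="ABCDEF"):
    """Map each window of four CA atoms to a symbol by binning its angle."""
    bin_width = 360.0 / len(alphabet)  # 60 degrees for six symbols (assumed)
    symbols = []
    for k in range(len(ca_trace) - 3):
        angle = alpha_angle(*ca_trace[k:k + 4])
        # The modulo folds +180 onto the same bin as -180.
        symbols.append(alphabet[int((angle + 180.0) // bin_width) % len(alphabet)])
    return "".join(symbols)


# Hypothetical four-atom trace whose single dihedral is 90 degrees.
trace = [(0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
word = encode(trace)
```

Once structures are strings over a small alphabet, pattern search and extension can proceed in the BLAST-like manner the abstract outlines.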
|
32
|
Abstract
Motivation: Multiple sequence alignment at the level of whole proteomes requires a high degree of automation, precluding the use of traditional validation methods such as manual curation. Since evolutionary models are too general to describe the history of each residue in a protein family, there is no single algorithm/model combination that can yield a biologically or evolutionarily optimal alignment. We propose a 'shotgun' strategy where many different algorithms are used to align the same family, and the best of these alignments is then chosen with a reliable objective function. We present WOOF, a novel 'word-oriented' objective function that relies on the identification and scoring of conserved amino acid patterns (words) between pairs of sequences.
Results: Tests on a subset of reference protein alignments from BAliBASE showed that WOOF tended to rank the (manually curated) reference alignment highest among 1060 alternative (automatically generated) alignments for a majority of protein families. Among the automated alignments, there was a strong positive relationship between the WOOF score and similarity to the reference alignment. The speed of WOOF and its independence from explicit considerations of three-dimensional structure make it an excellent tool for analyzing large numbers of protein families.
Availability: On request from the authors.
Affiliation(s)
- Robert G Beiko
- ARC Centre in Bioinformatics and Institute for Molecular Bioscience, The University of Queensland Brisbane, Qld 4072, Australia
|
33
|
|
34
|
Abstract
Comparison of two protein structures often results in not only a global alignment but also a number of distinct local alignments; the latter, referred to as alternative alignments, are however usually ignored in existing protein structure comparison analyses. Here, we used a novel method of protein structure comparison to extensively identify and characterize the alternative alignments obtained for structure pairs of a fold classification database. We showed that all alternative alignments can be classified into one of just a few types, and with this classification illustrated the potential of using alternative alignments to identify recurring protein substructures, including the internal structural repeats of a protein. Furthermore, we showed that among the alternative alignments obtained, permuted alignments, which included both circular and scrambled permutations, are as prevalent as topological alignments. These results demonstrated that the so far largely unattended alternative alignments of protein structures have implications and applications for research of protein classification and evolution.
Affiliation(s)
- Edward S C Shih
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
|
35
|
Nozaki Y, Bellgard M. Statistical evaluation and comparison of a pairwise alignment algorithm that a priori assigns the number of gaps rather than employing gap penalties. Bioinformatics 2004; 21:1421-8. [PMID: 15591359 DOI: 10.1093/bioinformatics/bti198] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Motivation: Although pairwise sequence alignment is essential in comparative genomic sequence analysis, it has proven difficult to precisely determine the gap penalties for a given pair of sequences. A common practice is to employ default penalty values. However, there are a number of problems associated with using gap penalties. First, alignment results can vary depending on the gap penalties, making it difficult to explore appropriate parameters. Second, the statistical significance of an alignment score is typically based on a theoretical model of non-gapped alignments, which may be misleading. Finally, there is no way to control the number of gaps for a given pair of sequences, even if the number of gaps is known in advance.
Results: In this paper, we develop and evaluate the performance of an alignment technique that allows the researcher to assign the number of allowable gaps a priori, rather than using gap penalties. We compare this approach with the Smith-Waterman and Needleman-Wunsch techniques on a set of structurally aligned protein sequences. We demonstrate that this approach outperforms the other techniques, especially for short sequences (56-133 residues) with low similarity (<25%). Further, by employing a statistical measure, we show that it can be used to assess the quality of the alignment in relation to the true alignment with the associated optimal number of gaps.
Availability: The implementation of the described methods, SANK_AL, is available at http://cbbc.murdoch.edu.au/
Contact: matthew@cbbc.murdoch.edu.au.
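One way to picture alignment under a fixed gap budget rather than gap penalties is a dynamic program with an extra dimension for the number of gaps used. The sketch below is not SANK_AL; it assumes the budget counts individual gap characters (not gap runs), applies no gap penalty, and uses a simple +1/-1 match/mismatch score. The function name and parameters are hypothetical.

```python
def align_with_gap_budget(a, b, max_gaps, match=1, mismatch=-1):
    """Best global alignment score using at most `max_gaps` gap characters.

    Gaps carry no penalty; they are limited only by the budget.
    Returns None when no global alignment fits within the budget.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # best[g][i][j]: best score aligning a[:i] with b[:j] using exactly g gaps.
    best = [[[neg_inf] * (m + 1) for _ in range(n + 1)]
            for _ in range(max_gaps + 1)]
    best[0][0][0] = 0
    for g in range(max_gaps + 1):
        for i in range(n + 1):
            for j in range(m + 1):
                candidates = [best[g][i][j]]
                if i and j:  # align a[i-1] with b[j-1]
                    score = match if a[i - 1] == b[j - 1] else mismatch
                    candidates.append(best[g][i - 1][j - 1] + score)
                if g and i:  # a[i-1] against a gap
                    candidates.append(best[g - 1][i - 1][j])
                if g and j:  # b[j-1] against a gap
                    candidates.append(best[g - 1][i][j - 1])
                best[g][i][j] = max(candidates)
    result = max(best[g][n][m] for g in range(max_gaps + 1))
    return None if result == neg_inf else result


# With one gap allowed, ACGT aligns to A-GT with three matches.
one_gap = align_with_gap_budget("ACGT", "AGT", max_gaps=1)
no_gap = align_with_gap_budget("ACGT", "AGT", max_gaps=0)
```

The budget replaces the penalty as the only control on gaps, so the result does not depend on tuning gap-open or gap-extend values.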
Affiliation(s)
- Yasuyuki Nozaki
- Centre for Bioinformatics and Biological Computing, Murdoch University, Murdoch, WA 6150, Australia
|
36
|
Shapiro J, Brutlag D. FoldMiner and LOCK 2: protein structure comparison and motif discovery on the web. Nucleic Acids Res 2004; 32:W536-41. [PMID: 15215444 PMCID: PMC441527 DOI: 10.1093/nar/gkh389] [Citation(s) in RCA: 38] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
The FoldMiner web server (http://foldminer.stanford.edu/) provides remote access to methods for protein structure alignment and unsupervised motif discovery. FoldMiner is unique among such algorithms in that it improves both the motif definition and the sensitivity of a structural similarity search by combining the search and motif discovery methods and using information from each process to enhance the other. In a typical run, a query structure is aligned to all structures in one of several databases of single domain targets in order to identify its structural neighbors and to discover a motif that is the basis for the similarity among the query and statistically significant targets. This process is fully automated, but options for manual refinement of the results are available as well. The server uses the Chime plugin and customized controls to allow for visualization of the motif and of structural superpositions. In addition, we provide an interface to the LOCK 2 algorithm for rapid alignments of a query structure to smaller numbers of user-specified targets.
Affiliation(s)
- Jessica Shapiro
- Biophysics Program, Stanford University School of Medicine, Stanford, CA 94305-5307, USA
|
37
|
Kolodny R, Linial N. Approximate protein structural alignment in polynomial time. Proc Natl Acad Sci U S A 2004; 101:12201-6. [PMID: 15304646 PMCID: PMC514457 DOI: 10.1073/pnas.0404383101] [Citation(s) in RCA: 87] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/11/2003] [Indexed: 11/18/2022] Open
Abstract
Alignment of protein structures is a fundamental task in computational molecular biology. Good structural alignments can help detect distant evolutionary relationships that are hard or impossible to discern from protein sequences alone. Here, we study the structural alignment problem as a family of optimization problems and develop an approximate polynomial-time algorithm to solve them. For a commonly used scoring function, the algorithm runs in O(n^10/ε^6) time for globular proteins of length n, and it detects alignments that score within an additive error of ε from all optima. Thus, we prove that this task is computationally feasible, although the method that we introduce is too slow to be a useful everyday tool. We argue that such approximate solutions are, in fact, of greater interest than exact ones because of the noisy nature of experimentally determined protein coordinates. The measurement of similarity between a pair of protein structures used by our algorithm involves the Euclidean distance between the structures (appropriately rigidly transformed). We show that an alternative approach, which relies on internal distance matrices, must incorporate sophisticated geometric ingredients if it is to guarantee optimality and run in polynomial time. We use these observations to visualize the scoring function for several real instances of the problem. Our investigations yield insights on the computational complexity of protein alignment under various scoring functions. These insights can be used in the design of scoring functions for which the optimum can be approximated efficiently and perhaps in the development of efficient algorithms for the multiple structural alignment problem.
Affiliation(s)
- Rachel Kolodny
- Departments of Computer Science and Structural Biology, Stanford University, Stanford, CA 94305, USA.
|
38
|
Shapiro J, Brutlag D. FoldMiner: structural motif discovery using an improved superposition algorithm. Protein Sci 2004; 13:278-94. [PMID: 14691242 PMCID: PMC2286532 DOI: 10.1110/ps.03239404] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.3] [Indexed: 10/26/2022]
Abstract
We report an unsupervised structural motif discovery algorithm, FoldMiner, which is able to detect global and local motifs in a database of proteins without the need for multiple structure or sequence alignments and without relying on prior classification of proteins into families. Motifs, which are discovered from pairwise superpositions of a query structure to a database of targets, are described probabilistically in terms of the conservation of each secondary structure element's position and are used to improve detection of distant structural relationships. During each iteration of the algorithm, the motif is defined from the current set of homologs and is used both to recruit additional homologous structures and to discard false positives. FoldMiner thus achieves high specificity and sensitivity by distinguishing between homologous and nonhomologous structures by the regions of the query to which they align. We find that when two proteins of the same fold are aligned, highly conserved secondary structure elements in one protein tend to align to highly conserved elements in the second protein, suggesting that FoldMiner consistently identifies the same motif in members of a fold. Structural alignments are performed by an improved superposition algorithm, LOCK 2, which detects distant structural relationships by placing increased emphasis on the alignment of secondary structure elements. LOCK 2 obeys several properties essential in automated analysis of protein structure: It is symmetric, its alignments of secondary structure elements are transitive, its alignments of residues display a high degree of transitivity, and its scoring system is empirically found to behave as a metric.
Affiliation(s)
- Jessica Shapiro
- Biophysics Program and Department of Biochemistry, Stanford University, Stanford, California 94305-5307, USA
|
39
|
Przybylski D, Rost B. Improving Fold Recognition Without Folds. J Mol Biol 2004; 341:255-69. [PMID: 15312777 DOI: 10.1016/j.jmb.2004.05.041] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.1] [Received: 02/03/2004] [Revised: 05/18/2004] [Accepted: 05/18/2004] [Indexed: 11/21/2022]
Abstract
The most reliable way to align two proteins of unknown structure is through sequence-profile and profile-profile alignment methods. If the structure for one of the two is known, fold recognition methods outperform purely sequence-based alignments. Here, we introduced a novel method that aligns generalised sequence and predicted structure profiles. Using predicted 1D structure (secondary structure and solvent accessibility) significantly improved over sequence-only methods, both in terms of correctly recognising pairs of proteins with different sequences and similar structures and in terms of correctly aligning the pairs. The scores obtained by our generalised scoring matrix followed an extreme value distribution; this yielded accurate estimates of the statistical significance of our alignments. We found that mistakes in 1D structure predictions correlated between proteins from different sequence-structure families. The impact of this surprising result was that our method succeeded in significantly outperforming sequence-only methods even without explicitly using structural information from either of the two. Since AGAPE also outperformed established methods that rely on 3D information, we made it available through. If we solved the problem of CPU-time required to apply AGAPE on millions of proteins, our results could also impact everyday database searches.
Affiliation(s)
- Dariusz Przybylski
- CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA.
|
40
|
Koike R, Kinoshita K, Kidera A. Probabilistic description of protein alignments for sequences and structures. Proteins 2004; 56:157-66. [PMID: 15162495 DOI: 10.1002/prot.20067] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]
Abstract
A number of equally optimal alignments inherently exist in the sequence and structure comparisons among proteins. To represent the sub-optimal alignments systematically, we have developed a method of generating probabilistic alignments for sequences and structures, by which the correspondence between pairs of residues is evaluated in a probabilistic manner. Our method uses the periodic boundary condition to avoid the entropy artifact favoring full-length matches. In the structure comparison, the environmental effects are incorporated by the mean-field approximation. We applied this method in comparisons of two pairs of proteins with internal symmetry: the first pair were proteins of the TIM-barrel fold and the second were proteins of the β-trefoil fold. These pairs are expected to have distinct sub-optimal alignments suitable for probabilistic description with the periodic boundary. It was shown that the sequence and structure alignments are consistent with each other and that the alignments with the highest probability represent circular permutation.
Affiliation(s)
- Ryotaro Koike
- Department of Chemistry, Graduate School of Science, Kyoto University, Kitashirakawa-Oiwake-cho, Sakyo-ku, Kyoto 606-8502, Japan
|
41
|
Constans P. On the functional significance of electron density protein structure alignments. Proteins 2004; 55:646-55. [PMID: 15103628 DOI: 10.1002/prot.20059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Electron density protein alignments are analyzed in terms of their underlying similarity measure, the density overlap. These alignments are conceptually unrelated to biochemical structural elements and, therefore, are appropriate in structure-only similarity studies. The analysis is focused on the low sequence similarity subset of protein domains. A remarkable association is found between simple density overlap measures and the expert-designed Structural Classification of Proteins (SCOP), for which functional and evolutionary analogies prevail. The association found validates the functional significance of electron density alignments.
Affiliation(s)
- Pere Constans
- Department of Chemistry, Rice University, Houston, Texas, USA.
|
42
|
Abstract
The structural comparison of two proteins comes up in many applications in structural biology where it is often necessary to find similarities in very large conformation sets. This work describes techniques to achieve significant speedup in the computation of structural similarity between two given conformations, at the expense of introducing a small error in the similarity measure. Furthermore, the proposed computational scheme allows for a tradeoff between speedup and error. This scheme exploits the fact that the Cα representation of a protein conformation contains redundant information, due to the chain topology and limited compactness of proteins. This redundancy can be reduced by approximating subchains of a protein by their centers of mass, resulting in a smaller number of points to describe a conformation. A Haar wavelet analysis of random chains and proteins is used to justify this approximated representation. Similarity measures computed with this representation are highly correlated to the measures computed with the original Cα representation. Therefore, they can be used in applications where small similarity errors can be tolerated or as fast filters in applications that require exact measures. Computational tests have been conducted on two applications, nearest neighbor search and automatic structural classification.
Affiliation(s)
- Itay Lotan
- Department of Computer Science, 353 Serra Mall, Stanford University, Stanford, CA 94305, USA.
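The subchain-averaging idea described in the abstract above can be illustrated with a minimal sketch. It assumes equal masses for all Cα atoms (so the center of mass of a subchain is simply its centroid) and a fixed subchain length `m`. The function name `average_subchains` and its parameters are hypothetical and not taken from the cited work.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def average_subchains(ca_coords: List[Point], m: int) -> List[Point]:
    """Replace each run of m consecutive C-alpha points by its centroid.

    Assumption: all atoms have equal mass, so center of mass == centroid.
    A final run shorter than m is averaged over the points it contains.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    reduced: List[Point] = []
    for start in range(0, len(ca_coords), m):
        chunk = ca_coords[start:start + m]
        n = len(chunk)
        # Average each of the three coordinates over the subchain.
        reduced.append(tuple(sum(p[k] for p in chunk) / n for k in range(3)))
    return reduced


if __name__ == "__main__":
    chain = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
    print(average_subchains(chain, 2))
```

A larger `m` gives fewer points and therefore cheaper similarity computations, at the cost of a coarser description, which is the speedup/error tradeoff the abstract refers to.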
|
43
|
Ochagavía ME, Wodak S. Progressive combinatorial algorithm for multiple structural alignments: Application to distantly related proteins. Proteins 2004; 55:436-54. [PMID: 15048834 DOI: 10.1002/prot.10587] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.1] [Indexed: 11/08/2022]
Abstract
MALECON is a progressive combinatorial procedure for multiple alignments of protein structures. It searches a library of pairwise alignments for all three-protein alignments in which a specified number of residues is consistently aligned. These alignments are progressively expanded to include additional proteins and more spatially equivalent residues, subject to certain criteria. This action involves superimposing the aligned proteins by their hitherto equivalent residues and searching for additional Cα atoms that lie close in space. The performance of MALECON is illustrated and compared with several extant multiple structure alignment methods, using as test cases the globin homologous superfamily and the OB and Jellyrolls folds. MALECON gives better definitions of the common structural features in the structurally more diverse proteins of the OB and Jellyrolls folds, but it yields comparable results for the more similar globins. When no consistent multiple alignments can be derived for all members of a protein group, our procedure is still capable of automatically generating consistent alignments and common core definitions for subgroups of the members. This finding is illustrated for proteins of the OB fold and SH3 domains, believed to share common structural features, and should be very instrumental in homology modeling and investigations of protein evolution.
|
44
|
Raghava GPS, Searle SMJ, Audley PC, Barber JD, Barton GJ. OXBench: a benchmark for evaluation of protein multiple sequence alignment accuracy. BMC Bioinformatics 2003; 4:47. [PMID: 14552658 PMCID: PMC280650 DOI: 10.1186/1471-2105-4-47] [Citation(s) in RCA: 155] [Impact Index Per Article: 7.4] [Received: 05/06/2003] [Accepted: 10/10/2003] [Indexed: 11/10/2022]
Abstract
Background: The alignment of two or more protein sequences provides a powerful guide in the prediction of the protein structure and in identifying key functional residues; however, the utility of any prediction is completely dependent on the accuracy of the alignment. In this paper we describe a suite of reference alignments derived from the comparison of protein three-dimensional structures together with evaluation measures and software that allow automatically generated alignments to be benchmarked. We test the OXBench benchmark suite on alignments generated by the AMPS multiple alignment method, then apply the suite to compare eight different multiple alignment algorithms. The benchmark shows the current state of the art for alignment accuracy and provides a baseline against which new alignment algorithms may be judged.
Results: The simple hierarchical multiple alignment algorithm, AMPS, performed as well as or better than more modern methods such as CLUSTALW once the PAM250 pair-score matrix was replaced by a BLOSUM series matrix. AMPS gave an accuracy in Structurally Conserved Regions (SCRs) of 89.9% over a set of 672 alignments. The T-COFFEE method on a data set of families with <8 sequences gave 91.4% accuracy, significantly better than CLUSTALW (88.9%) and all other methods considered here. The complete suite is available from .
Conclusions: The OXBench suite of reference alignments, evaluation software and results database provide a convenient method to assess progress in sequence alignment techniques. Evaluation measures that were dependent on comparison to a reference alignment were found to give good discrimination between methods. The STAMP Sc Score, which is independent of a reference alignment, also gave good discrimination. Application of OXBench in this paper shows that, with the exception of T-COFFEE, the majority of the improvement in alignment accuracy seen since 1985 stems from improved pair-score matrices rather than algorithmic refinements.
The maximum theoretical alignment accuracy obtained by pooling results over all methods was 94.5% with 52.5% accuracy for alignments in the 0–10 percentage identity range. This suggests that further improvements in accuracy will be possible in the future.
Affiliation(s)
- GPS Raghava
- European Molecular Biology Laboratory: European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- University of Oxford, Laboratory of Molecular Biophysics, Rex Richards Building, South Parks Road, Oxford, OX1 3QU, UK
- Bioinformatics Centre, Institute of Microbial Technology, Sector 39A, Chandigarh, India
- Stephen MJ Searle
- European Molecular Biology Laboratory: European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK
- Patrick C Audley
- School of Life Sciences, University of Dundee, Dow St., Dundee, DD1 5EH, Scotland, UK
- Jonathan D Barber
- School of Life Sciences, University of Dundee, Dow St., Dundee, DD1 5EH, Scotland, UK
- Geoffrey J Barton
- School of Life Sciences, University of Dundee, Dow St., Dundee, DD1 5EH, Scotland, UK
- European Molecular Biology Laboratory: European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK
- University of Oxford, Laboratory of Molecular Biophysics, Rex Richards Building, South Parks Road, Oxford, OX1 3QU, UK
|
45
|
Abstract
We present the LGA (Local-Global Alignment) method, designed to facilitate the comparison of protein structures or fragments of protein structures in sequence dependent and sequence independent modes. The LGA structure alignment program is available as an online service at http://PredictionCenter.llnl.gov/local/lga. Data generated by LGA can be successfully used in a scoring function to rank the level of similarity between two structures and to allow structure classification when many proteins are being analyzed. LGA also allows the clustering of similar fragments of protein structures.
Affiliation(s)
- Adam Zemla
- Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, CA 94550, USA.
|
46
|
Van Walle I, Lasters I, Wyns L. Consistency matrices: quantified structure alignments for sets of related proteins. Proteins 2003; 51:1-9. [PMID: 12596259 DOI: 10.1002/prot.10293] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
Comparing two remotely similar structures is a difficult problem: more often than not, resulting structure alignments will show ambiguities and a unique answer usually does not even exist. In addition, alignments in general have a limited information content because every aligned residue is considered equally important. To solve these issues to a certain extent, one can take the perspective of a whole group of similar structures and then evaluate common structural features. Here, we describe a consistency approach that, although not actually performing a multiple structure alignment, does produce the information that one would conceivably want from such an experiment: the key structural features of the group, e.g., a fold, which in this case are projected onto either a pair of proteins or a single protein. Both representations are useful for a number of applications, ranging from the detection of (partially) wrong structure alignments to protein structure classification and fold recognition. To demonstrate some of these applications, the procedure was applied to 195 SCOP folds containing a total of 1802 domains sharing very low sequence similarity.
Affiliation(s)
- Ivo Van Walle
- Department of Ultrastructure, Vrije Universiteit Brussel, Sint-Genesius Rode, Belgium.
|
47
|
Wallin S, Farwer J, Bastolla U. Testing similarity measures with continuous and discrete protein models. Proteins 2003; 50:144-57. [PMID: 12471607 DOI: 10.1002/prot.10271] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 11/07/2022]
Abstract
There are many ways to define the distance between two protein structures, thus assessing their similarity. Here, we investigate and compare the properties of five different distance measures, including the standard root-mean-square deviation (cRMSD). The performance of these measures is studied from different perspectives with two different protein models, one continuous and the other discrete. Using the continuous model, we examine the correlation between energy and native distance, and the ability of the different measures to discriminate between the two possible topologies of a three-helix bundle. Using the discrete model, we perform fits to real protein structures by minimizing different distance measures. The properties of the fitted structures are found to depend strongly on the distance measure used and the scale considered. We find that the cRMSD measure very effectively describes long-range features but is less effective with short-range features, and it correlates weakly with energy. A stronger correlation with energy and a better description of short-range properties is obtained when we use measures based on intramolecular distances.
Affiliation(s)
- Stefan Wallin
- Complex Systems Division, Department of Theoretical Physics, Lund University, Sölvegatan 14A, SE-223 62 Lund, Sweden.
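To make the contrast between cRMSD and measures based on intramolecular distances more concrete, the sketch below computes one common measure of the latter kind, the distance RMSD (dRMSD). This is an illustrative choice only: the abstract above does not name its five measures beyond cRMSD, and the function name `drmsd` is hypothetical. Unlike cRMSD, this quantity needs no prior superposition because internal distances are unchanged by rotation and translation.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def drmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Root-mean-square difference of all pairwise internal distances.

    Both structures must list the same number of points in the same order.
    """
    if len(a) != len(b):
        raise ValueError("structures must have the same number of points")
    n = len(a)
    if n < 2:
        raise ValueError("need at least two points")
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            # Compare the i-j distance within structure a to that within b.
            diff = math.dist(a[i], a[j]) - math.dist(b[i], b[j])
            total += diff * diff
            pairs += 1
    return math.sqrt(total / pairs)


if __name__ == "__main__":
    ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
    shifted = [(x + 5.0, y + 5.0, z + 5.0) for x, y, z in ref]
    print(drmsd(ref, shifted))
```

One known limitation of this kind of measure is that it cannot distinguish a structure from its mirror image, since both have identical internal distances.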
|
48
|
Krebs WG, Tsai J, Alexandrov V, Junker J, Jansen R, Gerstein M. Tools and Databases to Analyze Protein Flexibility; Approaches to Mapping Implied Features onto Sequences. Methods Enzymol 2003; 374:544-84. [PMID: 14696388 DOI: 10.1016/s0076-6879(03)74023-3] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 01/25/2023]
Affiliation(s)
- W G Krebs
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California 92093, USA
|
49
|
Constans P. Linear scaling approaches to quantum macromolecular similarity: evaluating the similarity function. J Comput Chem 2002; 23:1305-13. [PMID: 12214313 DOI: 10.1002/jcc.10140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
The evaluation of the electron density based similarity function scales quadratically with respect to the size of the molecules for simplified, atomic shell densities. Due to the exponential decay of the function's atom-atom terms, most interatomic contributions are numerically negligible in large systems. An improved algorithm for the evaluation of the Quantum Molecular Similarity function is presented. This procedure identifies all non-negligible terms without computing unnecessary interatomic squared distances, thus effectively making the similarity evaluation scale linearly. Also presented is a minimalist dynamic electron density model. Approximate, single shell densities together with the proposed algorithm facilitate fast electron density based alignments on macromolecules.
Affiliation(s)
- Pere Constans
- Department of Chemistry, Rice University, Houston, Texas 77005-1892, USA.
|
50
|
Hill EE, Morea V, Chothia C. Sequence conservation in families whose members have little or no sequence similarity: the four-helical cytokines and cytochromes. J Mol Biol 2002; 322:205-33. [PMID: 12215425 DOI: 10.1016/s0022-2836(02)00653-8] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.7] [Indexed: 12/30/2022]
Abstract
Proteins for which there are good structural, functional and genetic similarities that imply a common evolutionary origin can have sequences whose similarities are low or undetectable by conventional sequence comparison procedures. Do these proteins have sequence conservation beyond the simple conservation of hydrophobic and hydrophilic character at specific sites and, if they do, what is its nature? To answer these questions we have analysed the structures and sequences of two superfamilies: the four-helical cytokines and cytochromes c'-b(562). Members of these superfamilies have sequence similarities that are either very low or not detectable. The cytokine superfamily has within it a long chain family and a short chain family. The sequences of known representative structures of the two families were aligned using structural information. From these alignments we identified the regions that conserve the same main-chain conformation: the common core (CC). For members of the same family, the CC comprises some 50% of the individual structures; for the combination of both families it is 30%. We added homologous sequences to the structural alignment. Analysis of the residues occurring at sites within the CCs showed that 30% have little or no conservation, whereas about 40% conserve the polar/neutral or hydrophobic/neutral character of their residues. The remaining 30% conserve hydrophobic residues with strong or medium limitations on their volume variations. Almost all of these residues are found at sites that form the "buried spine" of each helix (at sites i, i+3, i+7, i+10, etc., or i, i+4, i+7, i+11, etc.) and they pack together at the centre of each structure to give a pattern of residue-residue contacts that is almost absolutely conserved. These CC conserved hydrophobic residues form only 10-15% of all the residues in the individual structures.
A similar analysis of the cytochromes c'-b(562), which bind haem and have a very different function to that of the cytokines, gave very similar results. Again some 30% of the CC residues have hydrophobic residues with strong or medium conservation. Most of these form the buried spine of each helix and play the same role as those in the cytokines. The others, and some spine residues, bind the haem co-factor.
Affiliation(s)
- Emma E Hill
- MRC Laboratory of Molecular Biology, Cambridge, UK.
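The two "buried spine" site patterns quoted in the abstract above (i, i+3, i+7, i+10, etc., or i, i+4, i+7, i+11, etc.) can be enumerated with a small helper. The sketch assumes that "etc." means the spacing keeps alternating between 3 and 4 residues; the abstract gives only the first four sites of each pattern, so that continuation is an assumption. The function name `spine_sites` and its parameters are hypothetical.

```python
from itertools import cycle
from typing import List


def spine_sites(start: int, first_step: int, helix_end: int) -> List[int]:
    """List candidate buried-spine positions from start up to helix_end.

    first_step = 3 gives i, i+3, i+7, i+10, ...
    first_step = 4 gives i, i+4, i+7, i+11, ...
    Assumption: the 3/4 spacing keeps alternating along the whole helix.
    """
    if first_step not in (3, 4):
        raise ValueError("first_step must be 3 or 4")
    steps = cycle((3, 4) if first_step == 3 else (4, 3))
    sites: List[int] = []
    pos = start
    while pos <= helix_end:
        sites.append(pos)
        pos += next(steps)
    return sites


if __name__ == "__main__":
    print(spine_sites(0, 3, 11))
    print(spine_sites(0, 4, 11))
```

Both patterns advance seven residues for every two spine sites, roughly two turns of an α-helix, which is why the sites line up along one face of the helix.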
|