1
|
Joung I, Kim JY, Joo K, Lee J. Non-sequential protein structure alignment by conformational space annealing and local refinement. PLoS One 2019; 14:e0210177. [PMID: 30699145 PMCID: PMC6353097 DOI: 10.1371/journal.pone.0210177] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/27/2018] [Accepted: 12/18/2018] [Indexed: 11/18/2022] Open
Abstract
Protein structure alignment is an important tool for studying evolutionary biology and protein modeling. A tool which intensively searches for the globally optimal non-sequential alignments is rarely found. We propose ALIGN-CSA which shows improvement in scores, such as DALI-score, SP-score, SO-score and TM-score over the benchmark set including 286 cases. We performed benchmarking of existing popular alignment scoring functions, where the dependence of the search algorithm was effectively eliminated by using ALIGN-CSA. For the benchmarking, we set the minimum block size to 4 to prevent much fragmented alignments where the biological relevance of small alignment blocks is hard to interpret. With this condition, globally optimal alignments were searched by ALIGN-CSA using the four scoring functions listed above, and TM-score is found to be the most effective in generating alignments with longer match lengths and smaller RMSD values. However, DALI-score is the most effective in generating alignments similar to the manually curated reference alignments, which implies that DALI-score is more biologically relevant score. Due to the high demand on computational resources of ALIGN-CSA, we also propose a relatively fast local refinement method, which can control the minimum block size and whether to allow the reverse alignment. ALIGN-CSA can be used to obtain much improved alignment at the cost of relatively more extensive computation. For faster alignment, we propose a refinement protocol that improves the score of a given alignment obtained by various external tools. All programs are available from http://lee.kias.re.kr.
Collapse
Affiliation(s)
- InSuk Joung
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, Korea
| | - Jong Yun Kim
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, Korea
| | - Keehyoung Joo
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, Korea
| | - Jooyoung Lee
- Center for In Silico Protein Science, Korea Institute for Advanced Study, Seoul, Korea
- School of Computational Sciences, Korea Institute for Advanced Study, Seoul, Korea
- Center for Advanced Computation, Korea Institute for Advanced Study, Seoul, Korea
- * E-mail:
| |
Collapse
|
2
|
Brown P, Pullan W, Yang Y, Zhou Y. Fast and accurate non-sequential protein structure alignment using a new asymmetric linear sum assignment heuristic. Bioinformatics 2015; 32:370-7. [PMID: 26454279 DOI: 10.1093/bioinformatics/btv580] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/30/2015] [Accepted: 10/04/2015] [Indexed: 01/24/2023] Open
Abstract
MOTIVATION The three dimensional tertiary structure of a protein at near atomic level resolution provides insight alluding to its function and evolution. As protein structure decides its functionality, similarity in structure usually implies similarity in function. As such, structure alignment techniques are often useful in the classifications of protein function. Given the rapidly growing rate of new, experimentally determined structures being made available from repositories such as the Protein Data Bank, fast and accurate computational structure comparison tools are required. This paper presents SPalignNS, a non-sequential protein structure alignment tool using a novel asymmetrical greedy search technique. RESULTS The performance of SPalignNS was evaluated against existing sequential and non-sequential structure alignment methods by performing trials with commonly used datasets. These benchmark datasets used to gauge alignment accuracy include (i) 9538 pairwise alignments implied by the HOMSTRAD database of homologous proteins; (ii) a subset of 64 difficult alignments from set (i) that have low structure similarity; (iii) 199 pairwise alignments of proteins with similar structure but different topology; and (iv) a subset of 20 pairwise alignments from the RIPC set. SPalignNS is shown to achieve greater alignment accuracy (lower or comparable root-mean squared distance with increased structure overlap coverage) for all datasets, and the highest agreement with reference alignments from the challenging dataset (iv) above, when compared with both sequentially constrained alignments and other non-sequential alignments. AVAILABILITY AND IMPLEMENTATION SPalignNS was implemented in C++. The source code, binary executable, and a web server version is freely available at: http://sparks-lab.org CONTACT yaoqi.zhou@griffith.edu.au.
Collapse
Affiliation(s)
- Peter Brown
- School of ICT, Griffith University, Gold Coast, QLD 4222, Australia
| | - Wayne Pullan
- School of ICT, Griffith University, Gold Coast, QLD 4222, Australia
| | - Yuedong Yang
- Institute for Glycomics, Griffith University, Gold Coast, QLD 4222, Australia
| | - Yaoqi Zhou
- School of ICT, Griffith University, Gold Coast, QLD 4222, Australia Institute for Glycomics, Griffith University, Gold Coast, QLD 4222, Australia
| |
Collapse
|
3
|
Minami S, Sawada K, Chikenji G. How a spatial arrangement of secondary structure elements is dispersed in the universe of protein folds. PLoS One 2014; 9:e107959. [PMID: 25243952 PMCID: PMC4171485 DOI: 10.1371/journal.pone.0107959] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/12/2014] [Accepted: 08/18/2014] [Indexed: 11/18/2022] Open
Abstract
It has been known that topologically different proteins of the same class sometimes share the same spatial arrangement of secondary structure elements (SSEs). However, the frequency by which topologically different structures share the same spatial arrangement of SSEs is unclear. It is important to estimate this frequency because it provides both a deeper understanding of the geometry of protein folds and a valuable suggestion for predicting protein structures with novel folds. Here we clarified the frequency with which protein folds share the same SSE packing arrangement with other folds, the types of spatial arrangement of SSEs that are frequently observed across different folds, and the diversity of protein folds that share the same spatial arrangement of SSEs with a given fold, using a protein structure alignment program MICAN, which we have been developing. By performing comprehensive structural comparison of SCOP fold representatives, we found that approximately 80% of protein folds share the same spatial arrangement of SSEs with other folds. We also observed that many protein pairs that share the same spatial arrangement of SSEs belong to the different classes, often with an opposing N- to C-terminal direction of the polypeptide chain. The most frequently observed spatial arrangement of SSEs was the 2-layer α/β packing arrangement and it was dispersed among as many as 27% of SCOP fold representatives. These results suggest that the same spatial arrangements of SSEs are adopted by a wide variety of different folds and that the spatial arrangement of SSEs is highly robust against the N- to C-terminal direction of the polypeptide chain.
Collapse
Affiliation(s)
- Shintaro Minami
- Department of Complex Systems Science, Nagoya University, Nagoya, Aichi, Japan
| | - Kengo Sawada
- Department of Applied Physics, Nagoya University, Nagoya, Aichi, Japan
| | - George Chikenji
- Department of Computational Science and Engineering, Nagoya University, Nagoya, Aichi, Japan
- * E-mail:
| |
Collapse
|
4
|
Minami S, Sawada K, Chikenji G. MICAN: a protein structure alignment algorithm that can handle Multiple-chains, Inverse alignments, C(α) only models, Alternative alignments, and Non-sequential alignments. BMC Bioinformatics 2013; 14:24. [PMID: 23331634 PMCID: PMC3637537 DOI: 10.1186/1471-2105-14-24] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2012] [Accepted: 01/08/2013] [Indexed: 11/10/2022] Open
Abstract
Background Protein pairs that have the same secondary structure packing arrangement but have different topologies have attracted much attention in terms of both evolution and physical chemistry of protein structures. Further investigation of such protein relationships would give us a hint as to how proteins can change their fold in the course of evolution, as well as a insight into physico-chemical properties of secondary structure packing. For this purpose, highly accurate sequence order independent structure comparison methods are needed. Results We have developed a novel protein structure alignment algorithm, MICAN (a structure alignment algorithm that can handle Multiple-chain complexes, Inverse direction of secondary structures, Cα only models, Alternative alignments, and Non-sequential alignments). The algorithm was designed so as to identify the best structural alignment between protein pairs by disregarding the connectivity between secondary structure elements (SSE). One of the key feature of the algorithm is utilizing the multiple vector representation for each SSE, which enables us to correctly treat bent or twisted nature of long SSE. We compared MICAN with other 9 publicly available structure alignment programs, using both reference-dependent and reference-independent evaluation methods on a variety of benchmark test sets which include both sequential and non-sequential alignments. We show that MICAN outperforms the other existing methods for reproducing reference alignments of non-sequential test sets. Further, although MICAN does not specialize in sequential structure alignment, it showed the top level performance on the sequential test sets. We also show that MICAN program is the fastest non-sequential structure alignment program among all the programs we examined here. Conclusions MICAN is the fastest and the most accurate program among non-sequential alignment programs we examined here. These results suggest that MICAN is a highly effective tool for automatically detecting non-trivial structural relationships of proteins, such as circular permutations and segment-swapping, many of which have been identified manually by human experts so far. The source code of MICAN is freely download-able at http://www.tbp.cse.nagoya-u.ac.jp/MICAN.
Collapse
Affiliation(s)
- Shintaro Minami
- Department of Computational Science and Engineering, Nagoya University, Nagoya 464-8603, Japan
| | | | | |
Collapse
|
5
|
Daniluk P, Lesyng B. A novel method to compare protein structures using local descriptors. BMC Bioinformatics 2011; 12:344. [PMID: 21849047 PMCID: PMC3179968 DOI: 10.1186/1471-2105-12-344] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2011] [Accepted: 08/17/2011] [Indexed: 11/15/2022] Open
Abstract
Background Protein structure comparison is one of the most widely performed tasks in bioinformatics. However, currently used methods have problems with the so-called "difficult similarities", including considerable shifts and distortions of structure, sequential swaps and circular permutations. There is a demand for efficient and automated systems capable of overcoming these difficulties, which may lead to the discovery of previously unknown structural relationships. Results We present a novel method for protein structure comparison based on the formalism of local descriptors of protein structure - DEscriptor Defined Alignment (DEDAL). Local similarities identified by pairs of similar descriptors are extended into global structural alignments. We demonstrate the method's capability by aligning structures in difficult benchmark sets: curated alignments in the SISYPHUS database, as well as SISY and RIPC sets, including non-sequential and non-rigid-body alignments. On the most difficult RIPC set of sequence alignment pairs the method achieves an accuracy of 77% (the second best method tested achieves 60% accuracy). Conclusions DEDAL is fast enough to be used in whole proteome applications, and by lowering the threshold of detectable structure similarity it may shed additional light on molecular evolution processes. It is well suited to improving automatic classification of structure domains, helping analyze protein fold space, or to improving protein classification schemes. DEDAL is available online at http://bioexploratorium.pl/EP/DEDAL.
Collapse
Affiliation(s)
- Paweł Daniluk
- Faculty of Physics, Department of Biophysics and CoE BioExploratorium, University of Warsaw, Żwirki i Wigury 93, Warsaw, Poland
| | | |
Collapse
|
6
|
Nguyen MN, Tan KP, Madhusudhan MS. CLICK--topology-independent comparison of biomolecular 3D structures. Nucleic Acids Res 2011; 39:W24-8. [PMID: 21602266 PMCID: PMC3125785 DOI: 10.1093/nar/gkr393] [Citation(s) in RCA: 100] [Impact Index Per Article: 7.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/26/2011] [Revised: 04/19/2011] [Accepted: 05/03/2011] [Indexed: 01/28/2023] Open
Abstract
Our server, CLICK: http://mspc.bii.a-star.edu.sg/click, is capable of superimposing the 3D structures of any pair of biomolecules (proteins, DNA, RNA, etc.). The server makes use of the Cartesian coordinates of the molecules with the option of using other structural features such as secondary structure, solvent accessible surface area and residue depth to guide the alignment. CLICK first looks for cliques of points (3-7 residues) that are structurally similar in the pair of structures to be aligned. Using these local similarities, a one-to-one equivalence is charted between the residues of the two structures. A least square fit then superimposes the two structures. Our method is especially powerful in establishing protein relationships by detecting similarities in structural subdomains, domains and topological variants. CLICK has been extensively benchmarked and compared with other popular methods for protein and RNA structural alignments. In most cases, CLICK alignments were statistically significantly better in terms of structure overlap. The method also recognizes conformational changes that may have occurred in structural domains or subdomains in one structure with respect to the other. For this purpose, the server produces complementary alignments to maximize the extent of detectable similarity. Various examples showcase the utility of our web server.
Collapse
Affiliation(s)
- M. N. Nguyen
- Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Department of Biological Sciences, National University of Singapore and School of Biological Sciences, Nanyang Technological University, Singapore
| | - K. P. Tan
- Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Department of Biological Sciences, National University of Singapore and School of Biological Sciences, Nanyang Technological University, Singapore
| | - M. S. Madhusudhan
- Bioinformatics Institute, 30 Biopolis Street, #07-01, Matrix, Singapore 138671, Department of Biological Sciences, National University of Singapore and School of Biological Sciences, Nanyang Technological University, Singapore
| |
Collapse
|
7
|
Teyra J, Hawkins J, Zhu H, Pisabarro MT. Studies on the inference of protein binding regions across fold space based on structural similarities. Proteins 2011; 79:499-508. [PMID: 21069715 DOI: 10.1002/prot.22897] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/27/2023]
Abstract
The emerging picture of a continuous protein fold space highlights the existence of non obvious structural similarities between proteins with apparent different topologies. The identification of structure resemblances across fold space and the analysis of similar recognition regions may be a valuable source of information towards protein structure-based functional characterization. In this work, we use non-sequential structural alignment methods (ns-SAs) to identify structural similarities between protein pairs independently of their SCOP hierarchy, and we calculate the significance of binding region conservation using the interacting residues overlap in the ns-SA. We cluster the binding inferences for each family to distinguish already known family binding regions from putative new ones. Our methodology exploits the enormous amount of data available in the PDB to identify binding region similarities within protein families and to propose putative binding regions. Our results indicate that there is a plethora of structurally common binding regions among proteins, independently of current fold classifications. We obtain a 6- to 8-fold enrichment of novel binding regions, and identify binding inferences for 728 protein families that so far lack binding information in the PDB. We explore binding mode analogies between ligands from commonly clustered binding regions to investigate the utility of our methodology. A comprehensive analysis of the obtained binding inferences may help in the functional characterization of protein recognition and assist rational engineering. The data obtained in this work is available in the download link at www.scowlp.org.
Collapse
Affiliation(s)
- Joan Teyra
- Structural Bioinformatics, BIOTEC, Technical University of Dresden, Tatzberg 47-51, 01307 Dresden, Germany.
| | | | | | | |
Collapse
|
8
|
Nguyen MN, Madhusudhan MS. Biological insights from topology independent comparison of protein 3D structures. Nucleic Acids Res 2011; 39:e94. [PMID: 21596786 PMCID: PMC3152366 DOI: 10.1093/nar/gkr348] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022] Open
Abstract
Comparing and classifying the three-dimensional (3D) structures of proteins is of crucial importance to molecular biology, from helping to determine the function of a protein to determining its evolutionary relationships. Traditionally, 3D structures are classified into groups of families that closely resemble the grouping according to their primary sequence. However, significant structural similarities exist at multiple levels between proteins that belong to these different structural families. In this study, we propose a new algorithm, CLICK, to capture such similarities. The method optimally superimposes a pair of protein structures independent of topology. Amino acid residues are represented by the Cartesian coordinates of a representative point (usually the Cα atom), side chain solvent accessibility, and secondary structure. Structural comparison is effected by matching cliques of points. CLICK was extensively benchmarked for alignment accuracy on four different sets: (i) 9537 pair-wise alignments between two structures with the same topology; (ii) 64 alignments from set (i) that were considered to constitute difficult alignment cases; (iii) 199 pair-wise alignments between proteins with similar structure but different topology; and (iv) 1275 pair-wise alignments of RNA structures. The accuracy of CLICK alignments was measured by the average structure overlap score and compared with other alignment methods, including HOMSTRAD, MUSTANG, Geometric Hashing, SALIGN, DALI, GANGSTA+, FATCAT, ARTS and SARA. On average, CLICK produces pair-wise alignments that are either comparable or statistically significantly more accurate than all of these other methods. We have used CLICK to uncover relationships between (previously) unrelated proteins. These new biological insights include: (i) detecting hinge regions in proteins where domain or sub-domains show flexibility; (ii) discovering similar small molecule binding sites from proteins of different folds and (iii) discovering topological variants of known structural/sequence motifs. Our method can generally be applied to compare any pair of molecular structures represented in Cartesian coordinates as exemplified by the RNA structure superimposition benchmark.
Collapse
Affiliation(s)
- Minh N Nguyen
- Bioinformatics Institute, 30 Biopolis Street, #07-01 Matrix, Singapore 138671
| | | |
Collapse
|
9
|
Reeder PJ, Huang YM, Dordick JS, Bystroff C. A rewired green fluorescent protein: folding and function in a nonsequential, noncircular GFP permutant. Biochemistry 2010; 49:10773-9. [PMID: 21090791 DOI: 10.1021/bi100975z] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/28/2022]
Abstract
The sequential order of secondary structural elements in proteins affects the folding and activity to an unknown extent. To test the dependence on sequential connectivity, we reconnected secondary structural elements by their solvent-exposed ends, permuting their sequential order, called "rewiring". This new protein design strategy changes the topology of the backbone without changing the core side chain packing arrangement. While circular and noncircular permutations have been observed in protein structures that are not related by sequence homology, to date no one has attempted to rationally design and construct a protein with a sequence that is noncircularly permuted while conserving three-dimensional structure. Herein, we show that green fluorescent protein can be rewired, still functionally fold, and exhibit wild-type fluorescence excitation and emission spectra.
Collapse
Affiliation(s)
- Philippa J Reeder
- Department of Chemical and Biological Engineering, University of Colorado, Boulder, Colorado 80309, United States
| | | | | | | |
Collapse
|
10
|
Martínez-Castilla LP, Rodríguez-Sotres R. A score of the ability of a three-dimensional protein model to retrieve its own sequence as a quantitative measure of its quality and appropriateness. PLoS One 2010; 5:e12483. [PMID: 20830209 PMCID: PMC2935356 DOI: 10.1371/journal.pone.0012483] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/26/2010] [Accepted: 08/03/2010] [Indexed: 11/18/2022] Open
Abstract
BACKGROUND Despite the remarkable progress of bioinformatics, how the primary structure of a protein leads to a three-dimensional fold, and in turn determines its function remains an elusive question. Alignments of sequences with known function can be used to identify proteins with the same or similar function with high success. However, identification of function-related and structure-related amino acid positions is only possible after a detailed study of every protein. Folding pattern diversity seems to be much narrower than sequence diversity, and the amino acid sequences of natural proteins have evolved under a selective pressure comprising structural and functional requirements acting in parallel. PRINCIPAL FINDINGS The approach described in this work begins by generating a large number of amino acid sequences using ROSETTA [Dantas G et al. (2003) J Mol Biol 332:449-460], a program with notable robustness in the assignment of amino acids to a known three-dimensional structure. The resulting sequence-sets showed no conservation of amino acids at active sites, or protein-protein interfaces. Hidden Markov models built from the resulting sequence sets were used to search sequence databases. Surprisingly, the models retrieved from the database sequences belonged to proteins with the same or a very similar function. Given an appropriate cutoff, the rate of false positives was zero. According to our results, this protocol, here referred to as Rd.HMM, detects fine structural details on the folding patterns, that seem to be tightly linked to the fitness of a structural framework for a specific biological function. CONCLUSION Because the sequence of the native protein used to create the Rd.HMM model was always amongst the top hits, the procedure is a reliable tool to score, very accurately, the quality and appropriateness of computer-modeled 3D-structures, without the need for spectroscopy data. However, Rd.HMM is very sensitive to the conformational features of the models' backbone.
Collapse
Affiliation(s)
- León P. Martínez-Castilla
- Departamento de Bioquímica–Facultad de Química, Universidad Nacional Autónoma de México, Ciudad de México, Distrito Federal, Mexico
- Centro de Ciencias de la Complejidad, Universidad Nacional Autónoma de México, Ciudad de México, Distrito Federal, Mexico
| | - Rogelio Rodríguez-Sotres
- Departamento de Bioquímica–Facultad de Química, Universidad Nacional Autónoma de México, Ciudad de México, Distrito Federal, Mexico
- * E-mail:
| |
Collapse
|
11
|
Schmidt-Goenner T, Guerler A, Kolbeck B, Knapp EW. Circular permuted proteins in the universe of protein folds. Proteins 2009; 78:1618-30. [DOI: 10.1002/prot.22678] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
|
12
|
Hasegawa H, Holm L. Advances and pitfalls of protein structural alignment. Curr Opin Struct Biol 2009; 19:341-8. [PMID: 19481444 DOI: 10.1016/j.sbi.2009.04.003] [Citation(s) in RCA: 303] [Impact Index Per Article: 20.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2009] [Accepted: 04/16/2009] [Indexed: 11/30/2022]
Abstract
Structure comparison opens a window into the distant past of protein evolution, which has been unreachable by sequence comparison alone. With 55,000 entries in the Protein Data Bank and about 500 new structures added each week, automated processing, comparison, and classification are necessary. A variety of methods use different representations, scoring functions, and optimization algorithms, and they generate contradictory results even for moderately distant structures. Sequence mutations, insertions, and deletions are accommodated by plastic deformations of the common core, retaining the precise geometry of the active site, and peripheral regions may refold completely. Therefore structure comparison methods that allow for flexibility and plasticity generate the most biologically meaningful alignments. Active research directions include both the search for fold invariant features and the modeling of structural transitions in evolution. Advances have been made in algorithmic robustness, multiple alignment, and speeding up database searches.
Collapse
Affiliation(s)
- Hitomi Hasegawa
- Institute of Biotechnology, University of Helsinki, P.O. Box 56 (Viikinkaari 5), 00014 University of Helsinki, Finland
| | | |
Collapse
|
13
|
Stivala A, Wirth A, Stuckey PJ. Tableau-based protein substructure search using quadratic programming. BMC Bioinformatics 2009; 10:153. [PMID: 19450287 PMCID: PMC2705363 DOI: 10.1186/1471-2105-10-153] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2009] [Accepted: 05/19/2009] [Indexed: 12/13/2022] Open
Abstract
Background Searching for proteins that contain similar substructures is an important task in structural biology. The exact solution of most formulations of this problem, including a recently published method based on tableaux, is too slow for practical use in scanning a large database. Results We developed an improved method for detecting substructural similarities in proteins using tableaux. Tableaux are compared efficiently by solving the quadratic program (QP) corresponding to the quadratic integer program (QIP) formulation of the extraction of maximally-similar tableaux. We compare the accuracy of the method in classifying protein folds with some existing techniques. Conclusion We find that including constraints based on the separation of secondary structure elements increases the accuracy of protein structure search using maximally-similar subtableau extraction, to a level where it has comparable or superior accuracy to existing techniques. We demonstrate that our implementation is able to search a structural database in a matter of hours on a standard PC.
Collapse
Affiliation(s)
- Alex Stivala
- Department of Computer Science and Software Engineering, The University of Melbourne, Victoria, Australia.
| | | | | |
Collapse
|