1
|
Benchmarking Methods of Protein Structure Alignment. J Mol Evol 2020; 88:575-597. [PMID: 32725409 DOI: 10.1007/s00239-020-09960-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 02/27/2020] [Accepted: 07/10/2020] [Indexed: 10/23/2022]
Abstract
The function of a protein is primarily determined by its structure and amino acid sequence. Many biological questions of interest rely on being able to accurately determine the group of structures to which domains of a protein belong; this can be done through alignment and comparison of protein structures. Dozens of different methods for Protein Structure Alignment (PSA) have been proposed that use a wide range of techniques. The aim of this study is to determine the ability of PSA methods to identify pairs of protein domains known to share differing levels of structural similarity, and to assess their utility for clustering domains from several different folds into known groups. We present the results of a comprehensive investigation into eighteen PSA methods, to our knowledge the largest piece of independent research on this topic. Overall, SP-AlignNS (non-sequential) was found to be the best method for classification, and among the best performing methods for clustering. Methods (where possible) were split into the algorithm used to find the optimal alignment and the score used to assess similarity. This allowed us to largely separate the algorithm from the score it maximizes and thus, to assess their effectiveness independently of each other. Surprisingly, we found that some hybrids of mismatched scores and algorithms performed better than either of the native methods at classification and, in some cases, clustering as well. It is hoped that this investigation and the accompanying discussion will be useful for researchers selecting or designing methods to align protein structures.
|
2
|
Sam E, Athri P. Web-based drug repurposing tools: a survey. Brief Bioinform 2017; 20:299-316. [DOI: 10.1093/bib/bbx125] [Citation(s) in RCA: 30] [Impact Index Per Article: 4.3] [Received: 07/03/2017] [Indexed: 12/15/2022]
Affiliation(s)
- Elizabeth Sam
- Department of Computer Science & Engineering, Amrita University, Bengaluru, India
- Prashanth Athri
- Department of Computer Science & Engineering, Amrita University, Bengaluru, India
|
3
|
Abstract
Globular proteins typically fold into tightly packed arrays of regular secondary structures. We developed a model to approximate the compact parallel and antiparallel arrangement of α-helices and β-strands, enumerated all possible topologies formed by up to five secondary structural elements (SSEs), searched for their occurrence in spatial structures of proteins, and documented their frequencies of occurrence in the PDB. The enumeration model grows larger super-secondary structure patterns (SSPs) by combining pairs of smaller patterns, a process that approximates a potential path of protein fold evolution. The most prevalent SSPs are typically present in superfolds such as the Rossmann-like fold, the ferredoxin-like fold, and the Greek key motif, whereas the less frequent SSPs often possess uncommon structure features such as split β-sheets, left-handed connections, and crossing loops. This complete SSP enumeration model, for the first time, allows us to investigate which theoretically possible SSPs are not observed in available protein structures. All SSPs with up to four SSEs occurred in proteins. However, among the SSPs with five SSEs, approximately 20% (218) are absent from existing folds. Of these unobserved SSPs, 80% contain two or more uncommon structure features. To facilitate future efforts in protein structure classification, engineering, and design, we provide the resulting patterns and their frequency of occurrence in proteins at: http://prodata.swmed.edu/ssps/.
|
4
|
Guyon F, Martz F, Vavrusa M, Bécot J, Rey J, Tufféry P. BCSearch: fast structural fragment mining over large collections of protein structures. Nucleic Acids Res 2015; 43:W378-82. [PMID: 25977292 PMCID: PMC4489267 DOI: 10.1093/nar/gkv492] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 03/02/2015] [Accepted: 05/02/2015] [Indexed: 01/23/2023]
Abstract
Resources to mine the large number of protein structures available today are necessary to better understand how amino acid variations are compatible with conformation preservation, and to assist protein design, engineering and, further, the development of biologic therapeutic compounds. BCSearch is a versatile service to efficiently mine large collections of protein structures. It relies on a new approach based on a Binet-Cauchy kernel that is more discriminative than the widely used root mean square deviation criterion. It has statistics independent of size even for short fragments, and is fast. The systematic mining of large collections of structures such as the complete SCOPe protein structural classification or comprehensive subsets of the Protein Data Bank can be performed in a few minutes. Based on this new score, we propose four innovative applications: BCFragSearch and BCMirrorSearch, respectively, search for fragments similar and anti-similar to a query and return information on the diversity of the sequences of the hits. BCLoopSearch identifies candidate fragments of fixed size matching the flanks of a gapped structure. BCSpecificitySearch analyzes a complete protein structure and returns information about sites having few similar fragments. BCSearch is available at http://bioserv.rpbs.univ-paris-diderot.fr/services/BCSearch.
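To make the contrast with RMSD more concrete, the following is a minimal sketch of a Binet-Cauchy-style score between two equal-length fragments. It assumes the commonly described form det(XᵀY) / sqrt(det(XᵀX) · det(YᵀY)) on centered coordinates; the abstract does not spell out the formula, so the exact definition used by BCSearch may differ. The function names (`bc_score`, `center`, `cross_cov`, `det3`) and the toy coordinates are illustrative only.

```python
import math

def center(points):
    """Translate a list of (x, y, z) points so their centroid is the origin."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    cz = sum(p[2] for p in points) / n
    return [(p[0] - cx, p[1] - cy, p[2] - cz) for p in points]

def cross_cov(a, b):
    """Return the 3x3 matrix A^T B for two equal-length point lists."""
    return [[sum(p[i] * q[j] for p, q in zip(a, b)) for j in range(3)]
            for i in range(3)]

def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def bc_score(x, y):
    """Assumed Binet-Cauchy-style score: near +1 for fragments related by a
    rotation, near -1 for mirror images. No explicit superposition is needed."""
    if len(x) != len(y):
        raise ValueError("fragments must have the same number of points")
    x, y = center(x), center(y)
    norm = det3(cross_cov(x, x)) * det3(cross_cov(y, y))
    if norm <= 0:
        raise ValueError("degenerate (planar or collinear) fragment")
    return det3(cross_cov(x, y)) / math.sqrt(norm)

# Toy, non-planar five-point fragment (hypothetical coordinates).
fragment = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (1, 1, 1), (0, 1, 2)]
rotated = [(-py, px, pz) for px, py, pz in fragment]   # 90 degrees about z
mirrored = [(px, py, -pz) for px, py, pz in fragment]  # reflection in the xy-plane

print(bc_score(fragment, rotated))
print(bc_score(fragment, mirrored))
```

Under this assumed form, the sign of the score separates a fragment from its mirror image, which is consistent with the "similar" versus "anti-similar" searches named in the abstract; RMSD after superposition carries no such sign.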
Affiliation(s)
- Frédéric Guyon
- Molécules Thérapeutiques in Silico, INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité, 75205 Paris Cedex 13, France
- François Martz
- Molécules Thérapeutiques in Silico, INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité, 75205 Paris Cedex 13, France
- Marek Vavrusa
- Molécules Thérapeutiques in Silico, INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité, 75205 Paris Cedex 13, France
- Jérôme Bécot
- Molécules Thérapeutiques in Silico, INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité, 75205 Paris Cedex 13, France
- Julien Rey
- Molécules Thérapeutiques in Silico, INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité, 75205 Paris Cedex 13, France
- Pierre Tufféry
- Molécules Thérapeutiques in Silico, INSERM UMR-S 973, Université Paris Diderot, Sorbonne Paris Cité, 75205 Paris Cedex 13, France
|
5
|
Kasarapu P, de la Banda MG, Konagurthu AS. On Representing Protein Folding Patterns Using Non-Linear Parametric Curves. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2014; 11:1218-1228. [PMID: 26357057 DOI: 10.1109/tcbb.2014.2338319] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/05/2023]
Abstract
Proteins fold into complex three-dimensional shapes. Simplified representations of their shapes are central to rationalise, compare, classify, and interpret protein structures. Traditional methods to abstract protein folding patterns rely on representing their standard secondary structural elements (helices and strands of sheet) using line segments. This results in ignoring a significant proportion of structural information. The motivation of this research is to derive mathematically rigorous and biologically meaningful abstractions of protein folding patterns that maximize the economy of structural description and minimize the loss of structural information. We report on a novel method to describe a protein as a non-overlapping set of parametric three dimensional curves of varying length and complexity. Our approach to this problem is supported by information theory and uses the statistical framework of minimum message length (MML) inference. We demonstrate the effectiveness of our non-linear abstraction to support efficient and effective comparison of protein folding patterns on a large scale.
|
6
|
Metri R, Jerath G, Kailas G, Gacche N, Pal A, Ramakrishnan V. Structure-based barcoding of proteins. Protein Sci 2013; 23:117-20. [PMID: 24170674 DOI: 10.1002/pro.2392] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/06/2013] [Revised: 10/15/2013] [Accepted: 10/21/2013] [Indexed: 11/09/2022]
Abstract
A reduced representation in the format of a barcode has been developed to provide an overview of the topological nature of a given protein structure from a 3D coordinate file. The molecular structure in a protein coordinate file from the Protein Data Bank is first expressed as an alphanumeric code and then converted to a barcode image. The barcode representation can be used to compare and contrast different proteins based on their structure. The utility of this method has been exemplified by comparing structural barcodes of proteins that belong to the same fold family, and across different folds. In addition, we have attempted to illustrate (i) the structural changes often seen in a given protein molecule upon interaction with ligands and (ii) modifications in the overall topology of a given protein during evolution. The program is fully downloadable from the website http://www.iitg.ac.in/probar/.
Affiliation(s)
- Rahul Metri
- Institute of Bioinformatics & Applied Biotechnology, Bangalore, 560100, India
|
7
|
Kinch LN, Shi S, Cheng H, Cong Q, Pei J, Mariani V, Schwede T, Grishin NV. CASP9 target classification. Proteins 2011; 79 Suppl 10:21-36. [PMID: 21997778 DOI: 10.1002/prot.23190] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Received: 04/11/2011] [Revised: 09/07/2011] [Accepted: 09/09/2011] [Indexed: 12/22/2022]
Abstract
The Critical Assessment of protein Structure Prediction round 9 (CASP9) aimed to evaluate predictions for 129 experimentally determined protein structures. To assess tertiary structure predictions, these target structures were divided into domain-based evaluation units that were then classified into two assessment categories: template based modeling (TBM) and template free modeling (FM). CASP9 targets were split into domains of structurally compact evolutionary modules. For the targets with more than one defined domain, the decision to split structures into domains for evaluation was based on server performance. Target domains were categorized based on their evolutionary relatedness to existing templates as well as their difficulty levels indicated by server performance. Those target domains with sequence-related templates and high server prediction performance were classified as TBM, whereas those targets without identifiable templates and low server performance were classified as FM. However, using these generalizations for classification resulted in a blurred boundary between CASP9 assessment categories. Thus, the FM category included those domains without sequence detectable templates (25 target domains) as well as some domains with difficult to detect templates whose predictions were as poor as those without templates (five target domains). Several interesting examples are discussed, including targets with sequence related templates that exhibit unusual structural differences, targets with homologous or analogous structure templates that are not detectable by sequence, and targets with new folds.
Affiliation(s)
- Lisa N Kinch
- Howard Hughes Medical Institute, University of Texas, Southwestern Medical Center, Dallas, TX 75390-9050, USA.
|
8
|
Gelly JC, Joseph AP, Srinivasan N, de Brevern AG. iPBA: a tool for protein structure comparison using sequence alignment strategies. Nucleic Acids Res 2011; 39:W18-23. [PMID: 21586582 PMCID: PMC3125758 DOI: 10.1093/nar/gkr333] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.2] [Indexed: 01/03/2023]
Abstract
With the immense growth in the number of available protein structures, fast and accurate structure comparison has become essential. We propose an efficient method for structure comparison, based on a structural alphabet. Protein Blocks (PBs) is a widely used structural alphabet with 16 pentapeptide conformations that can fairly approximate a complete protein chain. Thus a 3D structure can be translated into a 1D sequence of PBs. With a simple Needleman–Wunsch approach and a raw PB substitution matrix, PB-based structural alignments were better than many popular methods. The iPBA web server presents an improved alignment approach using (i) specialized PB Substitution Matrices (SM) and (ii) an anchor-based alignment methodology. With these developments, the quality of ∼88% of alignments was improved. iPBA alignments were also better than DALI, MUSTANG and GANGSTA+ in >80% of the cases. The web server is designed for both pairwise comparisons and database searches. Outputs are given as sequence alignments and superposed 3D structures displayed using PyMol and Jmol. A local alignment option for detecting sub-structural similarity is also embedded. As a fast and efficient ‘sequence-based’ structure comparison tool, we believe that it will be quite useful to the scientific community. iPBA can be accessed at http://www.dsimb.inserm.fr/dsimb_tools/ipba/.
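The idea of aligning structures as 1D strings can be illustrated with a plain Needleman–Wunsch global alignment over letter sequences. This is only a sketch: the `toy_score` function (match +2, mismatch -1), the linear gap penalty of -2, and the example strings are assumptions standing in for a real PB substitution matrix, and the anchor-based refinement described in the abstract is not modeled.

```python
def toy_score(a, b):
    """Hypothetical stand-in for a PB substitution matrix lookup."""
    return 2 if a == b else -1

def needleman_wunsch(s, t, score=toy_score, gap=-2):
    """Global alignment of two letter sequences with a linear gap penalty.

    Returns (optimal score, aligned s, aligned t), with '-' marking gaps.
    """
    n, m = len(s), len(t)
    # dp[i][j] holds the best score for aligning s[:i] with t[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]),  # pair two letters
                dp[i - 1][j] + gap,                            # gap in t
                dp[i][j - 1] + gap,                            # gap in s
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_s, out_t = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(s[i - 1], t[j - 1]):
            out_s.append(s[i - 1]); out_t.append(t[j - 1])
            i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_s.append(s[i - 1]); out_t.append("-")
            i -= 1
        else:
            out_s.append("-"); out_t.append(t[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_s)), "".join(reversed(out_t))

# Two short strings of PB-like letters (made-up examples).
best, row_s, row_t = needleman_wunsch("mmmnop", "mmnop")
print(best)
print(row_s)
print(row_t)
```

Swapping `toy_score` for a lookup into a real 16×16 substitution table would be the natural next step; the dynamic programming itself is unchanged.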
Affiliation(s)
- Jean-Christophe Gelly
- INSERM, UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques, Université Paris Diderot-Paris 7, Institut National de la Transfusion Sanguine, 6, rue Alexandre Cabanel, 75739 Paris cedex 15, France
|
9
|
Stivala AD, Stuckey PJ, Wirth AI. Fast and accurate protein substructure searching with simulated annealing and GPUs. BMC Bioinformatics 2010; 11:446. [PMID: 20813068 PMCID: PMC2944279 DOI: 10.1186/1471-2105-11-446] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Received: 05/27/2010] [Accepted: 09/03/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Searching a database of protein structures for matches to a query structure, or occurrences of a structural motif, is an important task in structural biology and bioinformatics. While there are many existing methods for structural similarity searching, faster and more accurate approaches are still required, and few current methods are capable of substructure (motif) searching. RESULTS: We developed an improved heuristic for tableau-based protein structure and substructure searching using simulated annealing that is as fast as or faster than, and comparable in accuracy to, some widely used existing methods. Furthermore, we created a parallel implementation on a modern graphics processing unit (GPU). CONCLUSIONS: The GPU implementation achieves up to 34 times speedup over the CPU implementation of tableau-based structure search with simulated annealing, making it one of the fastest available methods. To the best of our knowledge, this is the first application of a GPU to the protein structural search problem.
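As a rough illustration of the general technique, the sketch below uses simulated annealing to map the elements of a small query matrix onto a larger target matrix so that as many pairwise entries as possible agree. It is a toy under stated assumptions, not the authors' heuristic: the integer matrices are a simplified stand-in for tableaux, and the move set, cooling schedule, and all names (`agreement`, `anneal`, `t0`, `cooling`) are hypothetical.

```python
import math
import random

def agreement(q, t, mapping):
    """Count query pairs (i, j) whose entry equals the mapped target entry."""
    n = len(q)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if q[i][j] == t[mapping[i]][mapping[j]])

def anneal(q, t, steps=2000, t0=2.0, cooling=0.995, seed=0):
    """Search for an injective map from query indices to target indices.

    Returns (best score seen, best mapping seen).
    """
    rng = random.Random(seed)
    n, m = len(q), len(t)
    if n > m:
        raise ValueError("query must not be larger than target")
    perm = list(range(m))          # perm[:n] is the current mapping
    rng.shuffle(perm)
    cur = agreement(q, t, perm)
    best, best_map = cur, perm[:n]
    temp = t0
    for _ in range(steps):
        i, j = rng.randrange(n), rng.randrange(m)
        if i != j:
            perm[i], perm[j] = perm[j], perm[i]      # propose a swap
            new = agreement(q, t, perm)
            # Always accept improvements; accept worse moves with
            # probability exp(delta / temperature).
            if new >= cur or rng.random() < math.exp((new - cur) / temp):
                cur = new
                if cur > best:
                    best, best_map = cur, perm[:n]
            else:
                perm[i], perm[j] = perm[j], perm[i]  # reject: undo the swap
        temp *= cooling
    return best, best_map

# Toy symmetric matrices: the query pattern occurs at target indices 1, 3, 4.
query = [[0, 1, 2],
         [1, 0, 3],
         [2, 3, 0]]
target = [[0, 5, 5, 5, 5],
          [5, 0, 5, 1, 2],
          [5, 5, 0, 5, 5],
          [5, 1, 5, 0, 3],
          [5, 2, 5, 3, 0]]

score, mapping = anneal(query, target)
print(score, mapping)
```

Each annealing run is independent of the others, so many query-versus-target runs can proceed in parallel, which is presumably part of what makes this kind of search a candidate for a GPU implementation like the one the abstract reports.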
Affiliation(s)
- Alex D Stivala
- Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia
- Peter J Stuckey
- Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia
- National ICT Australia Victoria Laboratory at The University of Melbourne, Victoria 3010, Australia
- Anthony I Wirth
- Department of Computer Science and Software Engineering, The University of Melbourne, Victoria 3010, Australia
|
10
|
Zhang ZH, Bharatham K, Sherman WA, Mihalek I. deconSTRUCT: general purpose protein database search on the substructure level. Nucleic Acids Res 2010; 38:W590-4. [PMID: 20522512 PMCID: PMC2896154 DOI: 10.1093/nar/gkq489] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/17/2022]
Abstract
The deconSTRUCT web server offers an interface to a protein database search engine, usable for general-purpose detection of similar protein (sub)structures. Initially, it deconstructs the query structure into its secondary structure elements (SSEs) and reassembles the match to the target by requiring a (tunable) degree of similarity in the direction and sequential order of SSEs. Hierarchical organization and judicious use of the information about protein structure enable deconSTRUCT to achieve the sensitivity and specificity of the established search engines at orders of magnitude increased speed, without irretrievably tying up the substructure information in the form of a hash. In a post-processing step, a match on the level of the backbone atoms is constructed. The results presented to the user consist of the list of the matched SSEs, the transformation matrix for rigid superposition of the structures and several ways of visualization, both downloadable and implemented as a web-browser plug-in. The server is available at http://epsf.bmad.bii.a-star.edu.sg/struct_server.html.
Affiliation(s)
- Zong Hong Zhang
- Bioinformatics Institute 30 Biopolis Street, #07-01 Matrix, Singapore
|