2
Aleksandrova AA, Sarti E, Forrest LR. MemSTATS: A Benchmark Set of Membrane Protein Symmetries and Pseudosymmetries. J Mol Biol 2019; 432:597-604. [PMID: 31628944 DOI: 10.1016/j.jmb.2019.09.020] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 05/29/2019] [Revised: 08/30/2019] [Accepted: 09/23/2019] [Indexed: 02/06/2023]
Abstract
In membrane proteins, symmetry and pseudosymmetry often have functional or evolutionary implications. However, available symmetry detection methods have not been tested systematically on this class of proteins because of the lack of an appropriate benchmark set. Here we present MemSTATS, a publicly available benchmark set of both quaternary- and internal-symmetries in membrane protein structures. The symmetries are described in terms of order, repeated elements, and orientation of the axis with respect to the membrane plane. Moreover, using MemSTATS, we compare the performance of four widely used symmetry detection algorithms and highlight specific challenges and areas for improvement in the future.
Affiliation(s)
- Antoniya A Aleksandrova
  - Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
- Edoardo Sarti
  - Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
- Lucy R Forrest
  - Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
3
Bliven SE, Lafita A, Rose PW, Capitani G, Prlić A, Bourne PE. Analyzing the symmetrical arrangement of structural repeats in proteins with CE-Symm. PLoS Comput Biol 2019; 15:e1006842. [PMID: 31009453 PMCID: PMC6504099 DOI: 10.1371/journal.pcbi.1006842] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.6] [Received: 05/18/2018] [Revised: 05/07/2019] [Accepted: 01/29/2019] [Indexed: 01/04/2023] Open
Abstract
Many proteins fold into highly regular and repetitive three dimensional structures. The analysis of structural patterns and repeated elements is fundamental to understanding protein function and evolution. We present recent improvements to the CE-Symm tool for systematically detecting and analyzing the internal symmetry and structural repeats in proteins. In addition to the accurate detection of internal symmetry, the tool is now capable of i) reporting the type of symmetry, ii) identifying the smallest repeating unit, iii) describing the arrangement of repeats with transformation operations and symmetry axes, and iv) comparing the similarity of all the internal repeats at the residue level. CE-Symm 2.0 helps the user investigate proteins with a robust and intuitive sequence-to-structure analysis, with many applications in protein classification, functional annotation and evolutionary studies. We describe the algorithmic extensions of the method and demonstrate its applications to the study of interesting cases of protein evolution.
Many protein structures show a great deal of regularity. Even within single polypeptide chains, about 25% of proteins contain self-similar repeating structures, which can be organized in ring-like symmetric arrangements or linear open repeats. The repeats are often related, and thus comparing the sequence and structure of repeats can give an idea as to the early evolutionary history of a protein family. Additionally, the conservation and divergence of repeats can lead to insights about the function of the proteins. This work describes CE-Symm 2.0, a tool for the analysis of protein symmetry. The method automatically detects internal symmetry in protein structures and produces a multiple alignment of structural repeats. The algorithm is able to detect the geometric relationships between the repeats, including cyclic, dihedral, and polyhedral symmetries, translational repeats, and cases where multiple symmetry operators are applicable in a hierarchical manner. These complex relationships can then be visualized in a graphical interface as a complete structure, as a superposition of repeats, or as a multiple alignment of the protein sequence. CE-Symm 2.0 can be systematically used for the automatic detection of internal symmetry in protein structures, or as an interactive tool for the analysis of structural repeats.
Affiliation(s)
- Spencer E. Bliven
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
  - Institute of Applied Simulation, Zurich University of Applied Science, Wädenswil, Switzerland
- Aleix Lafita
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
- Peter W. Rose
  - RCSB Protein Data Bank, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
  - Structural Bioinformatics Laboratory, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
- Guido Capitani
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
  - Department of Biology, ETH Zurich, Zurich, Switzerland
- Andreas Prlić
  - RCSB Protein Data Bank, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
- Philip E. Bourne
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
4
Chakrabarty B, Parekh N. PRIGSA: protein repeat identification by graph spectral analysis. J Bioinform Comput Biol 2015; 12:1442009. [PMID: 25385083 DOI: 10.1142/s0219720014420098] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 11/18/2022]
Abstract
Repetition of a structural motif within a protein is associated with a wide range of structural and functional roles. In most cases the repeating units are well conserved at the structural level while at the sequence level they are mostly undetectable, suggesting the need for structure-based methods. Since most known methods require a training dataset, a de novo approach is desirable. Here, we propose an efficient graph-based approach for detecting structural repeats in proteins. In a protein structure represented as a graph, interactions between inter- and intra-repeat units are well captured by the eigen spectra of the adjacency matrix of the graph. These conserved interactions give rise to similar connections and a unique profile of the principal eigen spectra for each repeating unit. The efficacy of the approach is shown on eight repeat families annotated in UniProt, comprising both solenoid and nonsolenoid repeats with varied secondary structure architecture and repeat lengths. The performance of the approach is also tested on other known benchmark datasets and compared with two repeat identification methods. For a known repeat type, the algorithm also identifies the type of repeat present in the protein. A web tool implementing the algorithm is available at the URL http://bioinf.iiit.ac.in/PRIGSA/.
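The graph-spectral idea in this abstract can be illustrated with a minimal sketch. It is not the PRIGSA implementation: the function names `contact_graph` and `principal_eigenvector`, the 7.0 Å contact cutoff, and the use of shifted power iteration are all assumptions made for illustration. The sketch builds a residue contact graph from C-alpha coordinates and computes the principal eigenvector of its adjacency matrix, whose per-residue components are the kind of profile the abstract describes.

```python
import math


def contact_graph(coords, cutoff=7.0):
    """Adjacency matrix: 1 where two distinct residues lie within `cutoff`.

    The 7.0 cutoff is an illustrative assumption, not a value from the paper.
    """
    n = len(coords)
    return [
        [1 if i != j and math.dist(coords[i], coords[j]) <= cutoff else 0
         for j in range(n)]
        for i in range(n)
    ]


def principal_eigenvector(adj, iterations=200):
    """Approximate the principal eigenvector of `adj` by power iteration.

    Iterating with (A + I) instead of A keeps the same eigenvectors but
    avoids oscillation on bipartite graphs, where A has eigenvalues of
    equal magnitude and opposite sign.
    """
    n = len(adj)
    vec = [1.0 / math.sqrt(n)] * n
    for _ in range(iterations):
        nxt = [vec[i] + sum(adj[i][j] * vec[j] for j in range(n))
               for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    return vec


# Toy input: three residues on a line, 5 units apart, forming a path graph.
toy_coords = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
toy_adj = contact_graph(toy_coords)
toy_profile = principal_eigenvector(toy_adj)
```

In a real repeat protein, one would look for a periodic pattern in the components of such a profile along the chain; how that pattern is scored is specific to the published method and is not reproduced here.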
Affiliation(s)
- Broto Chakrabarty
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
5
Do Viet P, Roche DB, Kajava AV. TAPO: A combined method for the identification of tandem repeats in protein structures. FEBS Lett 2015; 589:2611-9. [PMID: 26320412 DOI: 10.1016/j.febslet.2015.08.025] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.7] [Received: 06/29/2015] [Revised: 08/10/2015] [Accepted: 08/13/2015] [Indexed: 10/23/2022]
Abstract
In recent years, there has been an emergence of new 3D structures of proteins containing tandem repeats (TRs), as a result of improved expression and crystallization strategies. Databases focused on structure classifications (PDB, SCOP, CATH) do not provide an easy solution for selection of these structures from PDB. Several approaches have been developed, but no best approach exists to identify the whole range of 3D TRs. Here we describe the TAndem PrOtein detector (TAPO) that uses periodicities of atomic coordinates and other types of structural representation, including strings generated by conformational alphabets, residue contact maps, and arrangements of vectors of secondary structure elements. The benchmarking shows the superior performance of TAPO over the existing programs. In accordance with our analysis of PDB using TAPO, 19% of proteins contain 3D TRs. This analysis allowed us to identify new families of 3D TRs, suggesting that TAPO can be used to regularly update the collection and classification of existing repetitive structures.
Affiliation(s)
- Phuong Do Viet
  - Centre de Recherche de Biochimie Macromoléculaire, UMR 5237 CNRS, Université Montpellier, 1919, Route de Mende, 34293 Montpellier Cedex 5, France; Institut de Biologie Computationnelle, Université Montpellier, Bat. 5, 860, rue St Priest, 34095 Montpellier Cedex 5, France
- Daniel B Roche
  - Centre de Recherche de Biochimie Macromoléculaire, UMR 5237 CNRS, Université Montpellier, 1919, Route de Mende, 34293 Montpellier Cedex 5, France; Institut de Biologie Computationnelle, Université Montpellier, Bat. 5, 860, rue St Priest, 34095 Montpellier Cedex 5, France
- Andrey V Kajava
  - Centre de Recherche de Biochimie Macromoléculaire, UMR 5237 CNRS, Université Montpellier, 1919, Route de Mende, 34293 Montpellier Cedex 5, France; Institut de Biologie Computationnelle, Université Montpellier, Bat. 5, 860, rue St Priest, 34095 Montpellier Cedex 5, France
6
Chakrabarty B, Parekh N. Identifying tandem Ankyrin repeats in protein structures. BMC Bioinformatics 2014; 15:6599. [PMID: 25547411 PMCID: PMC4307672 DOI: 10.1186/s12859-014-0440-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 07/09/2014] [Accepted: 12/18/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Tandem repetition of structural motifs in proteins is frequently observed across all forms of life. The topology of the repeating unit and its frequency of occurrence are associated with a wide range of structural and functional roles in diverse proteins, and defects in repeat proteins have been associated with a number of diseases. It is thus desirable to accurately identify the specific repeat type and its copy number. Weak evolutionary constraints on repeat units and insertions/deletions between them make their identification difficult at the sequence level, and structure-based approaches are desired. The proposed graph spectral approach is based on protein structure represented as a graph for detecting one of the most frequently observed structural repeats, the Ankyrin repeat. RESULTS It has been shown in a large number of studies that the 3-dimensional topology of a protein structure is well captured by a graph, making it possible to analyze a complex protein structure as a mathematical entity. In this study we show that the eigen spectra profile of a protein structure graph exhibits a unique repetitive profile for contiguous repeating units, enabling the detection of the repeat region and the repeat type. The proposed approach uses a non-redundant set of 58 Ankyrin proteins to define rules for the detection of Ankyrin repeat motifs. It is evaluated on a set of 370 proteins comprising 125 known Ankyrin proteins and remaining non-solenoid proteins, and the predictions are compared with UniProt annotation, a sequence-based approach (RADAR), and a structure-based approach (ConSole). To show the efficacy of the approach, we analyzed the complete PDB structural database and identified 641 previously unrecognized Ankyrin repeat proteins. We observe a unique eigen spectra profile for different repeat types and show that the method can be easily extended to detect other repeat types. It is implemented as a web server, AnkPred. It is freely available at 'bioinf.iiit.ac.in/AnkPred'.
CONCLUSIONS AnkPred provides an elegant and computationally efficient graph-based approach for detecting Ankyrin structural repeats in proteins. By analyzing the eigen spectra of the protein structure graph and secondary structure information, characteristic features of a known repeat family are identified. This method is especially useful in correctly identifying new members of a repeat family.
Affiliation(s)
- Broto Chakrabarty
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
- Nita Parekh
  - Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India
7
Myers-Turnbull D, Bliven SE, Rose PW, Aziz ZK, Youkharibache P, Bourne PE, Prlić A. Systematic detection of internal symmetry in proteins using CE-Symm. J Mol Biol 2014; 426:2255-68. [PMID: 24681267 DOI: 10.1016/j.jmb.2014.03.010] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 10/30/2013] [Revised: 03/17/2014] [Accepted: 03/18/2014] [Indexed: 11/26/2022]
Abstract
Symmetry is an important feature of protein tertiary and quaternary structures that has been associated with protein folding, function, evolution, and stability. Its emergence and ensuing prevalence has been attributed to gene duplications, fusion events, and subsequent evolutionary drift in sequence. This process maintains structural similarity and is further supported by this study. To further investigate the question of how internal symmetry evolved, how symmetry and function are related, and the overall frequency of internal symmetry, we developed an algorithm, CE-Symm, to detect pseudo-symmetry within the tertiary structure of protein chains. Using a large manually curated benchmark of 1007 protein domains, we show that CE-Symm performs significantly better than previous approaches. We use CE-Symm to build a census of symmetry among domain superfamilies in SCOP and note that 18% of all superfamilies are pseudo-symmetric. Our results indicate that more domains are pseudo-symmetric than previously estimated. We establish a number of recurring types of symmetry-function relationships and describe several characteristic cases in detail. With the use of the Enzyme Commission classification, symmetry was found to be enriched in some enzyme classes but depleted in others. CE-Symm thus provides a methodology for a more complete and detailed study of the role of symmetry in tertiary protein structure [availability: CE-Symm can be run from the Web at http://source.rcsb.org/jfatcatserver/symmetry.jsp. Source code and software binaries are also available under the GNU Lesser General Public License (version 2.1) at https://github.com/rcsb/symmetry. An interactive census of domains identified as symmetric by CE-Symm is available from http://source.rcsb.org/jfatcatserver/scopResults.jsp].
Affiliation(s)
- Douglas Myers-Turnbull
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Spencer E Bliven
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Peter W Rose
  - San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Zaid K Aziz
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093, USA
- Philip E Bourne
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
- Andreas Prlić
  - San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
8
Minami S, Sawada K, Chikenji G. MICAN: a protein structure alignment algorithm that can handle Multiple-chains, Inverse alignments, C(α) only models, Alternative alignments, and Non-sequential alignments. BMC Bioinformatics 2013; 14:24. [PMID: 23331634 PMCID: PMC3637537 DOI: 10.1186/1471-2105-14-24] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 09/11/2012] [Accepted: 01/08/2013] [Indexed: 11/10/2022] Open
Abstract
Background Protein pairs that have the same secondary structure packing arrangement but have different topologies have attracted much attention in terms of both evolution and physical chemistry of protein structures. Further investigation of such protein relationships would give us a hint as to how proteins can change their fold in the course of evolution, as well as an insight into physico-chemical properties of secondary structure packing. For this purpose, highly accurate sequence order independent structure comparison methods are needed. Results We have developed a novel protein structure alignment algorithm, MICAN (a structure alignment algorithm that can handle Multiple-chain complexes, Inverse direction of secondary structures, Cα only models, Alternative alignments, and Non-sequential alignments). The algorithm was designed so as to identify the best structural alignment between protein pairs by disregarding the connectivity between secondary structure elements (SSE). One of the key features of the algorithm is utilizing the multiple vector representation for each SSE, which enables us to correctly treat the bent or twisted nature of long SSEs. We compared MICAN with nine other publicly available structure alignment programs, using both reference-dependent and reference-independent evaluation methods on a variety of benchmark test sets which include both sequential and non-sequential alignments. We show that MICAN outperforms the other existing methods for reproducing reference alignments of non-sequential test sets. Further, although MICAN does not specialize in sequential structure alignment, it showed top-level performance on the sequential test sets. We also show that the MICAN program is the fastest non-sequential structure alignment program among all the programs we examined here. Conclusions MICAN is the fastest and the most accurate program among the non-sequential alignment programs we examined here. These results suggest that MICAN is a highly effective tool for automatically detecting non-trivial structural relationships of proteins, such as circular permutations and segment-swapping, many of which have been identified manually by human experts so far. The source code of MICAN is freely downloadable at http://www.tbp.cse.nagoya-u.ac.jp/MICAN.
Affiliation(s)
- Shintaro Minami
  - Department of Computational Science and Engineering, Nagoya University, Nagoya 464-8603, Japan
9
Detecting internally symmetric protein structures. BMC Bioinformatics 2010; 11:303. [PMID: 20525292 PMCID: PMC2894822 DOI: 10.1186/1471-2105-11-303] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 02/23/2010] [Accepted: 06/03/2010] [Indexed: 11/30/2022] Open
Abstract
Background Many functional proteins have a symmetric structure. Most of these are multimeric complexes, which are made of non-symmetric monomers arranged in a symmetric manner. However, there are also a large number of proteins that have a symmetric structure in the monomeric state. These internally symmetric proteins are interesting objects from the point of view of their folding, function, and evolution. Most algorithms that detect the internally symmetric proteins depend on finding repeating units of similar structure and do not use the symmetry information. Results We describe a new method, called SymD, for detecting symmetric protein structures. The SymD procedure works by comparing the structure to its own copy after the copy is circularly permuted by all possible number of residues. The procedure is relatively insensitive to symmetry-breaking insertions and deletions and amplifies positive signals from symmetry. It finds 70% to 80% of the TIM barrel fold domains in the ASTRAL 40 domain database and 100% of the beta-propellers as symmetric. More globally, 10% to 15% of the proteins in the ASTRAL 40 domain database may be considered symmetric according to this procedure depending on the precise cutoff value used to measure the degree of perfection of the symmetry. Symmetrical proteins occur in all structural classes and can have a closed, circular structure, a cylindrical barrel-like structure, or an open, helical structure. Conclusions SymD is a sensitive procedure for detecting internally symmetric protein structures. Using this procedure, we estimate that 10% to 15% of the known protein domains may be considered symmetric. We also report an initial, overall view of the types of symmetries and symmetric folds that occur in the protein domain structure universe.
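The core idea of comparing a structure with circularly permuted copies of itself can be sketched on a toy one-dimensional representation. This is only an illustration under stated assumptions: SymD aligns three-dimensional structures, whereas the hypothetical function `best_cyclic_shift` below compares a per-residue label string (for example, secondary-structure letters) with each cyclic shift of itself and counts identical positions.

```python
def best_cyclic_shift(features):
    """Return (shift, matches) for the best non-trivial cyclic shift.

    `features` is any sequence of per-residue labels. Each shift from 1 to
    n - 1 is scored by the number of positions where the sequence matches
    its circularly shifted copy; the smallest shift with the highest score
    is returned.
    """
    n = len(features)
    best_shift, best_score = 0, -1
    for shift in range(1, n):
        score = sum(
            1 for i in range(n) if features[i] == features[(i + shift) % n]
        )
        if score > best_score:  # strict comparison keeps the smallest shift
            best_shift, best_score = shift, score
    return best_shift, best_score


# Toy input: a 9-residue unit repeated four times (a hypothetical example).
toy = "HHHHEEELL" * 4
shift, matches = best_cyclic_shift(toy)
```

For this toy string the best shift equals the repeat length, since shifting by one unit maps every position onto an identical label. A structure-based method would replace the equality test with a structural alignment score, which tolerates the insertions and deletions mentioned in the abstract.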
10
Abstract
BACKGROUND Protein structure comparison is a fundamental task in structural biology. While the number of known protein structures has grown rapidly over the last decade, searching a large database of protein structures is still relatively slow using existing methods. There is a need for new techniques which can rapidly compare protein structures, whilst maintaining high matching accuracy. RESULTS We have developed IR Tableau, a fast protein comparison algorithm, which leverages the tableau representation to compare protein tertiary structures. IR tableau compares tableaux using information retrieval style feature indexing techniques. Experimental analysis on the ASTRAL SCOP protein structural domain database demonstrates that IR Tableau achieves two orders of magnitude speedup over the search times of existing methods, while producing search results of comparable accuracy. CONCLUSION We show that it is possible to obtain very significant speedups for the protein structure comparison problem, by employing an information retrieval style approach for indexing proteins. The comparison accuracy achieved is also strong, thus opening the way for large scale processing of very large protein structure databases.
11
Schmidt-Goenner T, Guerler A, Kolbeck B, Knapp EW. Circular permuted proteins in the universe of protein folds. Proteins 2009; 78:1618-30. [DOI: 10.1002/prot.22678] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
12
Chu CH, Tang CY, Tang CY, Pai TW. Angle-distance image matching techniques for protein structure comparison. J Mol Recognit 2008; 21:442-52. [DOI: 10.1002/jmr.914] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022]
13
Role of electrostatics on membrane binding, aggregation and destabilization induced by NAD(P)H dehydrogenases. Implication in membrane fusion. Biophys Chem 2008; 137:126-32. [DOI: 10.1016/j.bpc.2008.08.003] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 07/12/2008] [Revised: 08/08/2008] [Accepted: 08/08/2008] [Indexed: 11/17/2022]
14
Abyzov A, Ilyin VA. A comprehensive analysis of non-sequential alignments between all protein structures. BMC Structural Biology 2007; 7:78. [PMID: 18005453 PMCID: PMC2213659 DOI: 10.1186/1472-6807-7-78] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 05/14/2007] [Accepted: 11/16/2007] [Indexed: 05/02/2023]
Abstract
Background The majority of relations between proteins can be represented as a conventional sequential alignment. Nevertheless, unusual non-sequential alignments with different connectivity of the aligned fragments in compared proteins have been reported by many researchers. It is interesting to understand those non-sequential alignments: are they unique, sporadic cases or do they occur frequently; do they belong to a few specific folds or are they spread among many different folds, as a common feature of protein structure? We present here a comprehensive large-scale study of non-sequential alignments between available protein structures in the Protein Data Bank. Results The study has been conducted on a non-redundant set of 8,865 protein structures aligned with the aid of the TOPOFIT method. It has been estimated that between 17.4% and 35.2% of all alignments are non-sequential depending on variations in the parameters. Analysis of the data revealed that non-sequential relations between proteins do occur systematically and in large quantities. Various sizes and numbers of non-sequential fragments have been observed, with all possible complexities of fragment rearrangements found for alignments consisting of up to 12 fragments. It has been found that non-sequential alignments are not limited to proteins of any particular fold and are present in more than two hundred of them. Moreover, many of them are found between proteins with different fold assignments. It has been shown that protein structure symmetry does not explain non-sequential alignments. Therefore, compelling evidence has been provided that non-sequential alignments between proteins are systematic and widespread across the protein universe. Conclusion The phenomenon of the widespread occurrence of non-sequential alignments between proteins might represent a missing rule of protein structure organization. More detailed study of this phenomenon will enhance our understanding of protein stability, folding, and evolution.
Affiliation(s)
- Alexej Abyzov
  - Department of Biology, Northeastern University, 360 Huntington Avenue, Boston, MA 02115, USA