1
Aleksandrova AA, Sarti E, Forrest LR. EncoMPASS: An encyclopedia of membrane proteins analyzed by structure and symmetry. Structure 2024; 32:492-504.e4. [PMID: 38367624 DOI: 10.1016/j.str.2024.01.011] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/24/2018] [Revised: 01/09/2024] [Accepted: 01/10/2024] [Indexed: 02/19/2024]
Abstract
Protein structure determination and prediction, active site detection, and protein sequence alignment techniques all exploit information about protein structure and structural relationships. For membrane proteins, however, there is limited agreement among available online tools for highlighting and mapping such structural similarities. Moreover, no available resource provides a systematic overview of quaternary and internal symmetries, and their orientation relative to the membrane, despite the fact that these properties can provide key insights into membrane protein function and evolution. Here, we describe the Encyclopedia of Membrane Proteins Analyzed by Structure and Symmetry (EncoMPASS), a database for relating integral membrane proteins of known structure from the points of view of sequence, structure, and symmetry. EncoMPASS is accessible through a web interface, and its contents can be easily downloaded. This allows the user not only to focus on specific proteins, but also to study general properties of the structure and evolution of membrane proteins.
Affiliation(s)
- Antoniya A Aleksandrova
- Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
- Edoardo Sarti
- Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
- Lucy R Forrest
- Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD 20892, USA
2
Fallaize CJ, Green PJ, Mardia KV, Barber S. Bayesian protein sequence and structure alignment. J R Stat Soc Ser C Appl Stat 2020. [DOI: 10.1111/rssc.12394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Affiliation(s)
- Peter J. Green
- University of Bristol, UK
- University of Technology Sydney, Australia
3
Pagès G, Grudinin S. DeepSymmetry: using 3D convolutional networks for identification of tandem repeats and internal symmetries in protein structures. Bioinformatics 2019; 35:5113-5120. [PMID: 31161198 DOI: 10.1093/bioinformatics/btz454] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/26/2018] [Revised: 04/16/2019] [Accepted: 05/29/2019] [Indexed: 01/31/2023]
Abstract
MOTIVATION Thanks to the recent advances in structural biology, nowadays 3D structures of various proteins are solved on a routine basis. A large portion of these structures contain structural repetitions or internal symmetries. To understand the evolution mechanisms of these proteins and how structural repetitions affect the protein function, we need to be able to detect such proteins very robustly. As deep learning is particularly suited to deal with spatially organized data, we applied it to the detection of proteins with structural repetitions. RESULTS We present DeepSymmetry, a versatile method based on 3D convolutional networks that detects structural repetitions in proteins and their density maps. Our method is designed to identify tandem repeat proteins, proteins with internal symmetries, symmetries in the raw density maps, their symmetry order and also the corresponding symmetry axes. Detection of symmetry axes is based on learning 6D Veronese mappings of 3D vectors, and the median angular error of axis determination is less than one degree. We demonstrate the capabilities of our method on benchmarks with tandem-repeated proteins and also with symmetrical assemblies. For example, we have discovered about 7800 putative tandem repeat proteins in the PDB. AVAILABILITY AND IMPLEMENTATION The method is available at https://team.inria.fr/nano-d/software/deepsymmetry. It consists of a C++ executable that transforms molecular structures into volumetric density maps, and a Python code based on the TensorFlow framework for applying the DeepSymmetry model to these maps. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
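DeepSymmetry is described as learning 6D Veronese mappings of 3D axis vectors. As a hedged illustration, the sketch below uses the standard degree-2 Veronese map with √2 weights on the cross terms; that normalization is our assumption, not a detail given in the abstract, and the function names are hypothetical. Because the map sends v and -v to the same 6D point, a symmetry axis gets one regression target regardless of its sign, and the axis can be recovered, up to sign, as the leading eigenvector of the rebuilt matrix vvᵀ.

```python
import numpy as np

# Sketch only: a degree-2 Veronese embedding of axis directions. The sqrt(2)
# weights are an assumed normalization that keeps unit vectors at unit norm.

def veronese(v):
    """Map a 3D vector to 6D; v and -v give the same image."""
    x, y, z = np.asarray(v, dtype=float) / np.linalg.norm(v)
    s = np.sqrt(2.0)
    return np.array([x * x, y * y, z * z, s * x * y, s * x * z, s * y * z])

def inverse_veronese(w):
    """Recover the unit axis (up to sign) from a possibly noisy 6D vector."""
    s = np.sqrt(2.0)
    # Rebuild the symmetric 3x3 matrix v v^T from the six components.
    m = np.array([
        [w[0], w[3] / s, w[4] / s],
        [w[3] / s, w[1], w[5] / s],
        [w[4] / s, w[5] / s, w[2]],
    ])
    eigvals, eigvecs = np.linalg.eigh(m)
    # The axis is the eigenvector belonging to the largest eigenvalue.
    return eigvecs[:, np.argmax(eigvals)]

def axis_angle_deg(a, b):
    """Sign-invariant angle between two axes, in degrees."""
    c = abs(np.dot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))

axis = np.array([1.0, 2.0, 2.0])
print(axis_angle_deg(axis, inverse_veronese(veronese(axis))))  # a value close to 0
```

A sign-invariant angle such as `axis_angle_deg` is the natural way to score predicted axes, since an axis and its negation describe the same symmetry.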
Affiliation(s)
- Guillaume Pagès
- Inria, Université Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
- Sergei Grudinin
- Inria, Université Grenoble Alpes, CNRS, Grenoble INP, LJK, 38000 Grenoble, France
4
Aleksandrova AA, Sarti E, Forrest LR. MemSTATS: A Benchmark Set of Membrane Protein Symmetries and Pseudosymmetries. J Mol Biol 2019; 432:597-604. [PMID: 31628944 DOI: 10.1016/j.jmb.2019.09.020] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 05/29/2019] [Revised: 08/30/2019] [Accepted: 09/23/2019] [Indexed: 02/06/2023]
Abstract
In membrane proteins, symmetry and pseudosymmetry often have functional or evolutionary implications. However, available symmetry detection methods have not been tested systematically on this class of proteins because of the lack of an appropriate benchmark set. Here we present MemSTATS, a publicly available benchmark set of both quaternary- and internal-symmetries in membrane protein structures. The symmetries are described in terms of order, repeated elements, and orientation of the axis with respect to the membrane plane. Moreover, using MemSTATS, we compare the performance of four widely used symmetry detection algorithms and highlight specific challenges and areas for improvement in the future.
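MemSTATS describes each symmetry partly by the orientation of its axis with respect to the membrane plane. A minimal sketch of that one descriptor follows, assuming the coordinates are already oriented so that the membrane normal lies along z; the 15° tolerance and the three labels are hypothetical choices for illustration, not the benchmark's definitions.

```python
import numpy as np

# Sketch only: assumes membrane-oriented coordinates (membrane normal along z).

def axis_membrane_angle(axis, membrane_normal=(0.0, 0.0, 1.0)):
    """Angle in degrees (0 to 90) between a symmetry axis and the membrane normal."""
    a = np.asarray(axis, dtype=float)
    n = np.asarray(membrane_normal, dtype=float)
    # abs() makes the result independent of the sign of the axis vector.
    c = abs(a @ n) / (np.linalg.norm(a) * np.linalg.norm(n))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))

def classify_axis(axis, tol_deg=15.0):
    """Label an axis relative to the membrane plane (hypothetical tolerance)."""
    angle = axis_membrane_angle(axis)
    if angle <= tol_deg:
        return "perpendicular to membrane"  # axis runs along the normal
    if angle >= 90.0 - tol_deg:
        return "parallel to membrane"  # axis lies in the membrane plane
    return "tilted"

print(classify_axis([0.0, 0.0, 1.0]))
print(classify_axis([1.0, 1.0, 0.0]))
```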
Affiliation(s)
- Antoniya A Aleksandrova
- Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
- Edoardo Sarti
- Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA
- Lucy R Forrest
- Computational Structural Biology Section, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, 20892, USA.
5
Pagès G, Kinzina E, Grudinin S. Analytical symmetry detection in protein assemblies. I. Cyclic symmetries. J Struct Biol 2018; 203:142-148. [PMID: 29705493 DOI: 10.1016/j.jsb.2018.04.004] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 02/16/2018] [Revised: 04/18/2018] [Accepted: 04/19/2018] [Indexed: 12/30/2022]
Abstract
Symmetry in protein and, more generally, macromolecular assemblies is key to understanding their structure, stability and function. Many symmetrical assemblies are currently present in the Protein Data Bank (PDB), and some of them are among the largest solved structures; an efficient computational method is therefore needed for their exhaustive analysis. The cyclic symmetry groups represent the most common assemblies in the PDB. These are also the building blocks for higher-order symmetries. This paper presents a mathematical formulation to find the position and the orientation of the symmetry axis in a cyclic symmetrical protein assembly, and also to assess the quality of this symmetry. Our method can also detect symmetries in partial assemblies. We provide an efficient C++ implementation of the method and demonstrate its efficiency on several examples including partial assemblies and pseudo symmetries. We also compare the method with two other published techniques and show that it is significantly faster on all the tested examples. Our method produces results to machine precision, its cost function is solely based on 3D Euclidean geometry, and most of the operations are performed analytically. The method is available at http://team.inria.fr/nano-d/software/ananas. The graphical user interface of the method built for the SAMSON platform is available at http://samson-connect.net.
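The abstract does not spell out its analytical formulation, so the sketch below is not that method. It shows a generic textbook step instead: once the rotation relating two adjacent subunits of a cyclic assembly is known (for example from a superposition), the axis direction is the eigenvector of the rotation matrix with eigenvalue 1, the rotation angle follows from tr(R) = 1 + 2cos θ, and the cyclic order n is the integer with 2π/n closest to θ. Function names and the `max_order` cap are hypothetical.

```python
import numpy as np

# Sketch only: generic axis/angle extraction, assuming R maps one subunit onto
# its immediate neighbour (so the angle is 2*pi/n, not a multiple of it).

def rotation_axis_angle(R):
    """Return (unit axis, angle in radians) of a proper 3x3 rotation matrix."""
    R = np.asarray(R, dtype=float)
    # Angle from the trace: tr(R) = 1 + 2 cos(theta).
    cos_t = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = float(np.arccos(cos_t))
    # Axis: eigenvector of R with eigenvalue 1 (the direction R leaves unchanged).
    eigvals, eigvecs = np.linalg.eig(R)
    k = np.argmin(np.abs(eigvals - 1.0))
    axis = np.real(eigvecs[:, k])
    return axis / np.linalg.norm(axis), theta

def cyclic_order(theta, max_order=20):
    """Integer n (2..max_order) whose angle 2*pi/n best matches theta."""
    return min(range(2, max_order + 1), key=lambda n: abs(2.0 * np.pi / n - theta))

t = 2.0 * np.pi / 5.0
Rz = np.array([[np.cos(t), -np.sin(t), 0.0],
               [np.sin(t), np.cos(t), 0.0],
               [0.0, 0.0, 1.0]])
axis, theta = rotation_axis_angle(Rz)
print(axis, cyclic_order(theta))
```

The axis position (a point the axis passes through) needs the translation part of the superposition as well; it is omitted here to keep the sketch short.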
Affiliation(s)
- Guillaume Pagès
- Inria, Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, Grenoble 38000, France
- Elvira Kinzina
- Moscow Institute of Physics and Technology, Dolgoprudny 141701, Russia
- Sergei Grudinin
- Inria, Univ. Grenoble Alpes, CNRS, Grenoble INP, LJK, Grenoble 38000, France.
6
Parallel-SymD: A Parallel Approach to Detect Internal Symmetry in Protein Domains. Biomed Res Int 2016; 2016:4628592. [PMID: 27747230 PMCID: PMC5056246 DOI: 10.1155/2016/4628592] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 06/24/2016] [Accepted: 08/25/2016] [Indexed: 11/24/2022]
Abstract
Internally symmetric proteins are proteins that have a symmetrical structure in their monomeric single-chain form. Around 10–15% of the protein domains can be regarded as having some sort of internal symmetry. In this regard, we previously published SymD (symmetry detection), an algorithm that determines whether a given protein structure has internal symmetry by attempting to align the protein to its own copy after the copy is circularly permuted by all possible numbers of residues. SymD has proven to be a useful algorithm to detect symmetry. In this paper, we present a new parallelized algorithm called Parallel-SymD for detecting symmetry of proteins on clusters of computers. The achieved speedup of the new Parallel-SymD algorithm scales well with the number of computing processors. Scaling is better for proteins with a larger number of residues. For a protein of 509 residues, a speedup of 63 was achieved on a parallel system with 100 processors.
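The SymD idea of aligning a structure to circularly permuted copies of itself can be sketched as a scan over offsets k, pairing residue i with residue (i + k) mod N. This sketch deliberately simplifies: it replaces SymD's structural alignment (which allows gaps) with a gapless Kabsch superposition and scores each offset by RMSD, and all function names are hypothetical. Each offset is scored independently, so the scan is naturally parallel; the abstract's figures correspond to a parallel efficiency of 63/100 = 0.63 on 100 processors, although how Parallel-SymD actually partitions its work is not described here.

```python
import numpy as np

# Sketch only: gapless self-superposition per circular-permutation offset.

def kabsch_rmsd(P, Q):
    """RMSD between point sets P and Q (N x 3) after optimal rigid superposition."""
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Flip the last singular direction if needed so R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def circular_permutation_scan(coords):
    """Map each offset k (1..N-1) to the RMSD of coords against its permuted copy."""
    n = len(coords)
    # np.roll(coords, -k) puts residue (i + k) mod n at position i.
    return {k: kabsch_rmsd(coords, np.roll(coords, -k, axis=0)) for k in range(1, n)}

# Toy chain with exact two-fold internal symmetry: the second half is the first
# half rotated 180 degrees about z, so offset N/2 should superpose perfectly.
rng = np.random.default_rng(0)
half = rng.normal(size=(6, 3)) * 5.0
coords = np.vstack([half, half @ np.diag([-1.0, -1.0, 1.0]).T])
scores = circular_permutation_scan(coords)
print(min(scores, key=scores.get))
```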
7
Abstract
Structural domains are believed to be modules within proteins that can fold and function independently. Some proteins show tandem repetitions of apparent modular structure that do not fold independently, but rather co-operate in stabilizing structural forms that comprise several repeat-units. For many natural repeat-proteins, it has been shown that weak energetic links between repeats lead to the breakdown of co-operativity and the appearance of folding sub-domains within an apparently regular repeat array. The quasi-1D architecture of repeat-proteins is crucial in detailing how the local energetic balances can modulate the folding dynamics of these proteins, which can be related to the physiological behaviour of these ubiquitous biological systems.
8
Pellegrini M. Tandem Repeats in Proteins: Prediction Algorithms and Biological Role. Front Bioeng Biotechnol 2015; 3:143. [PMID: 26442257 PMCID: PMC4585158 DOI: 10.3389/fbioe.2015.00143] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 06/12/2015] [Accepted: 09/07/2015] [Indexed: 12/30/2022]
Abstract
Tandem repetitions in protein sequence and structure are a fascinating subject of research that has been a focus of study since the late 1990s. In this survey, we give an overview of the multi-faceted aspects of research on protein tandem repeats (PTR for short), including prediction algorithms, databases, early classification efforts, mechanisms of PTR formation and evolution, and synthetic PTR design. We also touch on the rather open issue of the relationship between PTR and flexibility (or disorder) in proteins. Detection of PTR from either protein sequence or structure data is challenging due to the inherently low (biological) signal-to-noise ratio that is a key feature of this problem. As early in silico analytic tools were key enablers for starting this field of study, we expect that current and future algorithmic and statistical breakthroughs will have a high impact on investigations of the biological role of PTR.
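The simplest sequence-level signal for a tandem repeat is self-similarity at a fixed lag: if a segment of length p is repeated, residue i tends to match residue i + p. The toy sketch below scores each lag by exact-match identity and returns the best one as a period guess. It is not any published predictor; it only finds near-exact repeats, and real repeats, with diverged sequences and insertions/deletions, are exactly why the dedicated algorithms surveyed here exist. Function names and the example sequence are hypothetical.

```python
# Toy sketch: exact-match self-identity at each lag as a crude repeat signal.

def lag_identity(seq, lag):
    """Fraction of positions i with seq[i] == seq[i + lag]."""
    pairs = len(seq) - lag
    if pairs <= 0:
        return 0.0
    return sum(seq[i] == seq[i + lag] for i in range(pairs)) / pairs

def best_period(seq, min_lag=2, max_lag=None):
    """Lag with the highest self-identity; ties go to the shortest lag."""
    max_lag = max_lag or len(seq) // 2
    # max() returns the first maximal element, i.e. the smallest tied lag.
    return max(range(min_lag, max_lag + 1), key=lambda k: lag_identity(seq, k))

repeat_protein = "LKDGTEVNA" * 4  # hypothetical perfect 9-residue repeat
print(best_period(repeat_protein))
```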
Affiliation(s)
- Marco Pellegrini
- Laboratory for Integrative Systems Medicine (LISM), Istituto di Informatica e Telematica, and Istituto di Fisiologia Clinica, Consiglio Nazionale delle Ricerche, Pisa, Italy
9
Chakrabarty B, Parekh N. Identifying tandem Ankyrin repeats in protein structures. BMC Bioinformatics 2014; 15:440. [PMID: 25547411 PMCID: PMC4307672 DOI: 10.1186/s12859-014-0440-9] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 07/09/2014] [Accepted: 12/18/2014] [Indexed: 11/10/2022]
Abstract
BACKGROUND Tandem repetition of structural motifs in proteins is frequently observed across all forms of life. The topology of a repeating unit and its frequency of occurrence are associated with a wide range of structural and functional roles in diverse proteins, and defects in repeat proteins have been associated with a number of diseases. It is thus desirable to accurately identify the specific repeat type and its copy number. Weak evolutionary constraints on repeat units and insertions/deletions between them make their identification difficult at the sequence level, so structure-based approaches are desirable. The proposed graph spectral approach represents a protein structure as a graph in order to detect one of the most frequently observed structural repeats, the Ankyrin repeat. RESULTS It has been shown in a large number of studies that the 3-dimensional topology of a protein structure is well captured by a graph, making it possible to analyze a complex protein structure as a mathematical entity. In this study we show that the eigen spectra profile of a protein structure graph exhibits a unique repetitive profile for contiguous repeating units, enabling the detection of the repeat region and the repeat type. The proposed approach uses a non-redundant set of 58 Ankyrin proteins to define rules for the detection of Ankyrin repeat motifs. It is evaluated on a set of 370 proteins comprising 125 known Ankyrin proteins and 245 non-solenoid proteins, and its predictions are compared with UniProt annotation, the sequence-based approach RADAR, and the structure-based approach ConSole. To show the efficacy of the approach, we analyzed the complete PDB structural database and identified 641 previously unrecognized Ankyrin repeat proteins. We observe a unique eigen spectra profile for different repeat types and show that the method can be easily extended to detect other repeat types. It is implemented as a web server, AnkPred, which is freely available at 'bioinf.iiit.ac.in/AnkPred'.
CONCLUSIONS AnkPred provides an elegant and computationally efficient graph-based approach for detecting Ankyrin structural repeats in proteins. By analyzing the eigen spectra of the protein structure graph and secondary structure information, characteristic features of a known repeat family are identified. This method is especially useful in correctly identifying new members of a repeat family.
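To make "eigen spectra of the protein structure graph" concrete, the sketch below builds one common kind of structure graph and computes its spectrum. The graph construction is our assumption, since the abstract does not give AnkPred's rules: nodes are residues, an unweighted edge joins two residues whose Cα atoms are closer than a hypothetical 6.5 Å cutoff, and the spectrum is that of the graph Laplacian L = D - A. AnkPred's actual profile also uses secondary structure information, which is not modelled here.

```python
import numpy as np

# Sketch only: residue contact graph (hypothetical C-alpha cutoff) and the
# eigenvalues of its Laplacian L = D - A.

def contact_laplacian_spectrum(ca_coords, cutoff=6.5):
    """Ascending Laplacian eigenvalues of a C-alpha contact graph."""
    X = np.asarray(ca_coords, dtype=float)
    # Pairwise distance matrix via broadcasting.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Adjacency: contacts within the cutoff, excluding self-loops.
    A = ((dist < cutoff) & ~np.eye(len(X), dtype=bool)).astype(float)
    L = np.diag(A.sum(axis=1)) - A
    # L is symmetric, so eigvalsh applies and returns eigenvalues in ascending order.
    return np.linalg.eigvalsh(L)

# Three residues 3.8 angstroms apart on a line form a path graph.
print(contact_laplacian_spectrum([[0, 0, 0], [3.8, 0, 0], [7.6, 0, 0]], cutoff=4.0))
```

A useful property for interpreting such spectra is that the number of zero eigenvalues equals the number of connected components of the graph.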
Affiliation(s)
- Broto Chakrabarty
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India.
- Nita Parekh
- Centre for Computational Natural Sciences and Bioinformatics, International Institute of Information Technology, Hyderabad, India.
10
Tai CH, Paul R, Dukka KC, Shilling JD, Lee B. SymD webserver: a platform for detecting internally symmetric protein structures. Nucleic Acids Res 2014; 42:W296-300. [PMID: 24799435 PMCID: PMC4086132 DOI: 10.1093/nar/gku364] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Indexed: 01/15/2023]
Abstract
Internal symmetry of a protein structure is the pseudo-symmetry that a single protein chain sometimes exhibits. This is in contrast to the symmetry with which monomers are arranged in many multimeric protein complexes. SymD is a program that detects proteins with internal symmetry. It proved to be useful for analyzing protein structure, function and modeling. This web-based interactive tool was developed by implementing the SymD algorithm. To the best of our knowledge, SymD webserver is the first tool of its kind with which users can easily study the symmetry of the protein they are interested in by uploading the structure or retrieving it from databases. It uses the Galaxy platform to take advantage of its extensibility and displays the symmetry properties, the symmetry axis and the sequence alignment of the structures before and after the symmetry transformation via an interactive graphical visualization environment in any modern web browser. An Example Run video displays the workflow to help users navigate. SymD webserver is publicly available at http://symd.nci.nih.gov.
Affiliation(s)
- Chin-Hsien Tai
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Rohit Paul
- Office of Information Technology, National Cancer Institute, National Institutes of Health, Rockville, MD 20850, USA
- Jeffery D Shilling
- Office of Information Technology, National Cancer Institute, National Institutes of Health, Rockville, MD 20850, USA
- Byungkook Lee
- Laboratory of Molecular Biology, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
11
Myers-Turnbull D, Bliven SE, Rose PW, Aziz ZK, Youkharibache P, Bourne PE, Prlić A. Systematic detection of internal symmetry in proteins using CE-Symm. J Mol Biol 2014; 426:2255-68. [PMID: 24681267 DOI: 10.1016/j.jmb.2014.03.010] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 10/30/2013] [Revised: 03/17/2014] [Accepted: 03/18/2014] [Indexed: 11/26/2022]
Abstract
Symmetry is an important feature of protein tertiary and quaternary structures that has been associated with protein folding, function, evolution, and stability. Its emergence and ensuing prevalence have been attributed to gene duplications, fusion events, and subsequent evolutionary drift in sequence. This process maintains structural similarity and is further supported by this study. To further investigate the question of how internal symmetry evolved, how symmetry and function are related, and the overall frequency of internal symmetry, we developed an algorithm, CE-Symm, to detect pseudo-symmetry within the tertiary structure of protein chains. Using a large manually curated benchmark of 1007 protein domains, we show that CE-Symm performs significantly better than previous approaches. We use CE-Symm to build a census of symmetry among domain superfamilies in SCOP and note that 18% of all superfamilies are pseudo-symmetric. Our results indicate that more domains are pseudo-symmetric than previously estimated. We establish a number of recurring types of symmetry-function relationships and describe several characteristic cases in detail. With the use of the Enzyme Commission classification, symmetry was found to be enriched in some enzyme classes but depleted in others. CE-Symm thus provides a methodology for a more complete and detailed study of the role of symmetry in tertiary protein structure [availability: CE-Symm can be run from the Web at http://source.rcsb.org/jfatcatserver/symmetry.jsp. Source code and software binaries are also available under the GNU Lesser General Public License (version 2.1) at https://github.com/rcsb/symmetry. An interactive census of domains identified as symmetric by CE-Symm is available from http://source.rcsb.org/jfatcatserver/scopResults.jsp].
Affiliation(s)
- Douglas Myers-Turnbull
- Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Spencer E Bliven
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Peter W Rose
- San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Zaid K Aziz
- Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093, USA
- Philip E Bourne
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA.
- Andreas Prlić
- San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA.
12
Rueda M, Orozco M, Totrov M, Abagyan R. BioSuper: a web tool for the superimposition of biomolecules and assemblies with rotational symmetry. BMC Struct Biol 2013; 13:32. [PMID: 24330655 PMCID: PMC3924234 DOI: 10.1186/1472-6807-13-32] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 06/21/2013] [Accepted: 12/03/2013] [Indexed: 12/02/2022]
Abstract
Background Most of the proteins in the Protein Data Bank (PDB) are oligomeric complexes consisting of two or more subunits that associate by rotational or helical symmetries. Despite the myriad superimposition tools in the literature, we could not find any able to account for rotational symmetry and display the graphical results in the web browser. Results BioSuper is a free web server that superimposes and calculates the root mean square deviation (RMSD) of protein complexes displaying rotational symmetry. To the best of our knowledge, BioSuper is the first tool of its kind that provides immediate interactive visualization of the graphical results in the browser, biomolecule generator capabilities, different levels of atom selection, sequence-dependent and structure-based superimposition types, and is the only web tool that takes into account the equivalence of atoms in side chains displaying symmetry ambiguity. BioSuper uses ICM program functionality as a core for the superimpositions and displays the results as text, HTML tables and 3D interactive molecular objects that can be visualized in the browser or on Android and iOS platforms with a free plugin. Conclusions BioSuper is a fast and functional tool that allows for pairwise superimposition of proteins and assemblies displaying rotational symmetry. The web server was created after our own frustration when attempting to superimpose flexible oligomers. We strongly believe that its user-friendly and functional design will be of great interest for structural and computational biologists who need to superimpose oligomeric proteins (or any protein). BioSuper web server is freely available to all users at http://ablab.ucsd.edu/BioSuper.
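"Equivalence of atoms in side chains displaying symmetry ambiguity" refers to chemically identical atoms whose names can be swapped without changing the structure, such as the two carboxylate oxygens of Asp or the ring atoms of Phe related by a ring flip. The sketch below shows the basic idea: for each residue, compute the squared deviation with both the original and the swapped naming and keep the smaller one. BioSuper's own handling through ICM is not described in the abstract; the swap table, data layout and function names are assumptions, and both structures are assumed to be superimposed already.

```python
import numpy as np

# Sketch only: equivalent-atom swaps (standard PDB atom names). Ring pairs in
# Phe/Tyr are swapped together, as a ring flip exchanges both at once.
EQUIVALENT_SWAPS = {
    "ASP": [("OD1", "OD2")],
    "GLU": [("OE1", "OE2")],
    "PHE": [("CD1", "CD2"), ("CE1", "CE2")],
    "TYR": [("CD1", "CD2"), ("CE1", "CE2")],
    "ARG": [("NH1", "NH2")],
}

def residue_sq_dev(resname, ref, mob):
    """(summed squared deviation, atom count) using the better atom naming.

    `ref` and `mob` map atom name -> xyz array in a common frame.
    """
    def sq(mapping):
        return sum(float(np.sum((ref[a] - mob[mapping.get(a, a)]) ** 2)) for a in ref)
    swapped = {}
    for a, b in EQUIVALENT_SWAPS.get(resname, []):
        swapped[a], swapped[b] = b, a
    # An empty mapping means "use the original names".
    return min(sq({}), sq(swapped)), len(ref)

def symmetry_aware_rmsd(residues):
    """RMSD over a list of (resname, ref_atoms, mob_atoms) triples."""
    total, count = 0.0, 0
    for resname, ref, mob in residues:
        s, n = residue_sq_dev(resname, ref, mob)
        total += s
        count += n
    return float(np.sqrt(total / count))

ref = {"CB": np.array([0.0, 0.0, 0.0]), "CG": np.array([1.5, 0.0, 0.0]),
       "OD1": np.array([2.2, 1.1, 0.0]), "OD2": np.array([2.2, -1.1, 0.0])}
mob = dict(ref, OD1=ref["OD2"], OD2=ref["OD1"])  # same Asp, oxygen names swapped
print(symmetry_aware_rmsd([("ASP", ref, mob)]))
```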
Affiliation(s)
- Ruben Abagyan
- Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093, USA.
13
Parra RG, Espada R, Sánchez IE, Sippl MJ, Ferreiro DU. Detecting repetitions and periodicities in proteins by tiling the structural space. J Phys Chem B 2013; 117:12887-97. [PMID: 23758291 PMCID: PMC3807821 DOI: 10.1021/jp402105j] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Indexed: 01/12/2023]
Abstract
The notion of energy landscapes provides conceptual tools for understanding the complexities of protein folding and function. Energy landscape theory indicates that it is much easier to find sequences that satisfy the “Principle of Minimal Frustration” when the folded structure is symmetric (Wolynes, P. G. Symmetry and the Energy Landscapes of Biomolecules. Proc. Natl. Acad. Sci. U.S.A. 1996, 93, 14249–14255). Similarly, repeats and structural mosaics may be fundamentally related to landscapes with multiple embedded funnels. Here we present analytical tools to detect and compare structural repetitions in protein molecules. By an exhaustive analysis of the distribution of structural repeats using a robust metric, we define those portions of a protein molecule that best describe the overall structure as a tessellation of basic units. The patterns produced by such tessellations provide intuitive representations of the repeating regions and their association toward higher order arrangements. We find that some protein architectures can be described as nearly periodic, while in others clear separations between repetitions exist. Since the method is independent of amino acid sequence information, we can identify structural units that can be encoded by a variety of distinct amino acid sequences.
Affiliation(s)
- R Gonzalo Parra
- Protein Physiology Lab, Departamento de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
14
Walsh I, Sirocco FG, Minervini G, Di Domenico T, Ferrari C, Tosatto SCE. RAPHAEL: recognition, periodicity and insertion assignment of solenoid protein structures. Bioinformatics 2012; 28:3257-64. [PMID: 22962341 DOI: 10.1093/bioinformatics/bts550] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022]
Abstract
MOTIVATION Repeat proteins form a distinct class of structures where folding is greatly simplified. Several classes have been defined, with solenoid repeats of periodicity between ca. 5 and 40 being the most challenging to detect. Such proteins evolve quickly and their periodicity may be rapidly hidden at sequence level. From a structural point of view, finding solenoids may be complicated by the presence of insertions or multiple domains. To the best of our knowledge, no automated methods are available to characterize solenoid repeats from structure. RESULTS Here we introduce RAPHAEL, a novel method for the detection of solenoids in protein structures. It reliably solves three problems of increasing difficulty: (1) recognition of solenoid domains, (2) determination of their periodicity and (3) assignment of insertions. RAPHAEL uses a geometric approach mimicking manual classification, producing several numeric parameters that are optimized for maximum performance. The resulting method is very accurate, with 89.5% of solenoid proteins and 97.2% of non-solenoid proteins correctly classified. RAPHAEL periodicities have a Spearman correlation coefficient of 0.877 against the manually established ones. A baseline algorithm for insertion detection in identified solenoids has a Q(2) value of 79.8%, suggesting room for further improvement. RAPHAEL finds 1931 highly confident repeat structures not previously annotated as solenoids in the Protein Data Bank records.
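RAPHAEL's own geometric parameters are not given in the abstract, so the sketch below is a simplified stand-in for the periodicity step only. In a solenoid, residue i sits roughly on top of residue i + p in the next coil, so the lag that minimizes the mean Cα distance between residues i and i + k is a natural period estimate. The lag range follows the 5 to 40 residue periodicities quoted above; function names and the ideal-helix test geometry are hypothetical.

```python
import numpy as np

# Sketch only: period = lag k minimizing the mean |CA_i - CA_{i+k}| distance.

def estimate_period(ca_coords, min_lag=5, max_lag=40):
    """Guess a solenoid period (in residues) from C-alpha coordinates."""
    X = np.asarray(ca_coords, dtype=float)
    max_lag = min(max_lag, len(X) - 1)

    def mean_dist(k):
        # Distances between every residue and the residue k positions later.
        return float(np.mean(np.linalg.norm(X[k:] - X[:-k], axis=1)))

    return min(range(min_lag, max_lag + 1), key=mean_dist)

# Ideal solenoid-like helix: 20 residues per coil, radius 15, rise 4.8 per coil.
p, r, rise = 20, 15.0, 4.8
i = np.arange(100)
t = 2.0 * np.pi * i / p
helix = np.column_stack([r * np.cos(t), r * np.sin(t), rise * i / p])
print(estimate_period(helix))
```

Real solenoids have insertions that shift the register, which this fixed-lag average ignores; that is the harder insertion-assignment problem the abstract lists separately.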
Affiliation(s)
- Ian Walsh
- Department of Biology, University of Padua, Viale G. Colombo 3, 35131 Padova, Italy
15
Abstract
Motivation: Structural alignment methods are widely used to generate gold standard alignments for improving multiple sequence alignments and transferring functional annotations, as well as for assigning structural distances between proteins. However, the correctness of the alignments generated by these methods is difficult to assess objectively since little is known about the exact evolutionary history of most proteins. Since homology is an equivalence relation, an upper bound on alignment quality can be found by assessing the consistency of alignments. Measuring the consistency of current methods of structure alignment and determining the causes of inconsistencies can, therefore, provide information on the quality of current methods and suggest possibilities for further improvement. Results: We analyze the self-consistency of seven widely used structural alignment methods (SAP, TM-align, Fr-TM-align, MAMMOTH, DALI, CE and FATCAT) on a diverse, non-redundant set of 1863 domains from the SCOP database and demonstrate that even for relatively similar proteins the degree of inconsistency of the alignments on a residue level is high (30%). We further show that levels of consistency vary substantially between methods, with two methods (SAP and Fr-TM-align) producing more consistent alignments than the rest. Inconsistency is found to be higher near gaps and for proteins of low structural complexity, as well as for helices. The ability of the methods to identify good structural alignments is also assessed using geometric measures, for which FATCAT (flexible mode) is found to be the best performer despite being highly inconsistent. We conclude that there is substantial scope for improving the consistency of structural alignment methods. Contact: msadows@nimr.mrc.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
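The consistency argument rests on transitivity: if residue a of protein A is homologous to residue b of B, and b to residue c of C, then a direct A to C alignment should also pair a with c. A minimal residue-level check over one triple of proteins is sketched below; the exact consistency score used in the study may be defined differently, and the dict-based alignment format and function name are assumptions.

```python
# Sketch only: residue-level transitive consistency for one protein triple.
# Each alignment maps a residue index in the first protein to its aligned
# residue index in the second; residues aligned to gaps are simply absent.

def transitive_consistency(ab, bc, ac):
    """Fraction of testable residues whose A->B->C path matches the direct A->C pair."""
    checked = agree = 0
    for a, b in ab.items():
        # A residue is testable only if it is aligned in all three alignments.
        if b in bc and a in ac:
            checked += 1
            agree += bc[b] == ac[a]
    return agree / checked if checked else float("nan")

ab = {0: 0, 1: 1, 2: 2, 3: 3}
bc = {0: 1, 1: 2, 2: 3, 3: 4}
ac = {0: 1, 1: 2, 2: 3, 3: 5}  # residue 3 disagrees with the A->B->C path
print(transitive_consistency(ab, bc, ac))
```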
Affiliation(s)
- M I Sadowski
- Division of Mathematical Biology, MRC National Institute for Medical Research, The Ridgeway, Mill Hill, London, UK
16
Shen X. Conformation and sequence evidence for two-fold symmetry in left-handed beta-helix fold. J Theor Biol 2011; 285:77-83. [PMID: 21708176 DOI: 10.1016/j.jtbi.2011.06.011] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 09/21/2010] [Revised: 05/13/2011] [Accepted: 06/11/2011] [Indexed: 11/28/2022]
Abstract
The left-handed beta-helix (LβH) has recently received interest as a possible structural solution for misfolded proteins associated with prion and Huntington's diseases. Through a combination of sequence and structure analysis, we uncover a novel feature that is common to this unique fold: a two-fold symmetry in both sequence and structure, which is always coupled with extended loops in the middle of the helix. Since the results reveal a two-fold symmetric pattern in both the sequence and the structure, they may indicate that the symmetry in tertiary structure is coded by the symmetry in primary sequence, which agrees with Anfinsen's proposal that a protein's amino-acid sequence specifies its three-dimensional structure. They may also indicate that LβH adopted a two-fold repeat pattern during evolution and that symmetry helps maintain the stability of the helix structure. The two-fold symmetric pattern and extended loops might be important in maintaining the stability of helix proteins. This discovery can be useful in understanding the folding mechanisms of this protein fold and provide insights into the relation between sequences and structures.
Affiliation(s)
- Xiaojuan Shen
- Neural Engineering Center, Institute of Biomedical and Health Engineering, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Avenue, Shenzhen University Town, Nanshan District, Shenzhen 518055, China.
17
Kajava AV. Tandem repeats in proteins: from sequence to structure. J Struct Biol 2011; 179:279-88. [PMID: 21884799 DOI: 10.1016/j.jsb.2011.08.009] [Citation(s) in RCA: 152] [Impact Index Per Article: 11.7] [Received: 06/20/2011] [Revised: 08/15/2011] [Accepted: 08/17/2011] [Indexed: 10/17/2022]
Abstract
The bioinformatics analysis of proteins containing tandem repeats requires special computer programs and databases, since the conventional approaches, developed predominantly for globular domains, have limited success. Here, I survey bioinformatics tools that have been developed recently for the identification and proteome-wide analysis of protein repeats. The last few years have also been marked by the emergence of new 3D structures of these proteins. Appraisal of the known structures and their classification uncovers a straightforward relationship between their architecture and the length of the repetitive units. This relationship and the repetitive character of the structural folds suggest rules for better prediction of the 3D structures of such proteins. Furthermore, bioinformatics approaches combined with low-resolution structural data from biophysical techniques, especially the recently emerged cryo-electron microscopy, lead to reliable prediction of protein repeat structures and their mode of binding with partners within molecular complexes. This hybrid approach can be actively used for the structural and functional annotation of proteomes.
Affiliation(s)
- Andrey V Kajava
- Centre de Recherches de Biochimie Macromoléculaire, CNRS, Université Montpellier 1 et 2, 1919 Route de Mende, 34293 Montpellier, Cedex 5, France.
18
Daniluk P, Lesyng B. A novel method to compare protein structures using local descriptors. BMC Bioinformatics 2011; 12:344. [PMID: 21849047 PMCID: PMC3179968 DOI: 10.1186/1471-2105-12-344] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Received: 04/02/2011] [Accepted: 08/17/2011] [Indexed: 11/15/2022]
Abstract
Background Protein structure comparison is one of the most widely performed tasks in bioinformatics. However, currently used methods have problems with the so-called "difficult similarities", including considerable shifts and distortions of structure, sequential swaps and circular permutations. There is a demand for efficient and automated systems capable of overcoming these difficulties, which may lead to the discovery of previously unknown structural relationships. Results We present a novel method for protein structure comparison based on the formalism of local descriptors of protein structure: DEscriptor Defined Alignment (DEDAL). Local similarities identified by pairs of similar descriptors are extended into global structural alignments. We demonstrate the method's capability by aligning structures in difficult benchmark sets: curated alignments in the SISYPHUS database, as well as the SISY and RIPC sets, including non-sequential and non-rigid-body alignments. On the most difficult RIPC set of sequence alignment pairs, the method achieves an accuracy of 77% (the second-best method tested achieves 60%). Conclusions DEDAL is fast enough to be used in whole-proteome applications, and by lowering the threshold of detectable structure similarity it may shed additional light on molecular evolution processes. It is well suited to improving the automatic classification of structural domains, helping to analyze protein fold space, and improving protein classification schemes. DEDAL is available online at http://bioexploratorium.pl/EP/DEDAL.
Affiliation(s)
- Paweł Daniluk
- Faculty of Physics, Department of Biophysics and CoE BioExploratorium, University of Warsaw, Żwirki i Wigury 93, Warsaw, Poland
19
Petrella RJ. A versatile method for systematic conformational searches: application to CheY. J Comput Chem 2011; 32:2369-85. [PMID: 21557263 PMCID: PMC3298744 DOI: 10.1002/jcc.21817] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/15/2010] [Revised: 03/01/2011] [Accepted: 03/20/2011] [Indexed: 12/27/2022]
Abstract
A novel molecular structure prediction method, the Z Method, is described. It provides a versatile platform for the development and use of systematic, grid-based conformational search protocols, in which statistical information (i.e., rotamers) can also be included. The Z Method generates trial structures by applying many changes of the same type to a single starting structure, thereby sampling the conformation space in an unbiased way. The method, implemented in the CHARMM program as the Z Module, is applied here to an illustrative model problem in which rigid, systematic searches are performed in a 36-dimensional conformational space that describes the relative positions of the 10 secondary structural elements of the protein CheY. A polar hydrogen representation with an implicit solvation term (EEF1) is used to evaluate successively larger fragments of the protein generated in a hierarchical build-up procedure. After a final refinement stage, and a total computational time of about two-and-a-half CPU days on AMD Opteron processors, the prediction is within 1.56 Å of the native structure. The errors in the predicted backbone dihedral angles are found to approximately cancel. Monte Carlo and simulated annealing trials on the same or smaller versions of the problem, using the same atomic model and energy terms, are shown to result in less accurate predictions. Although the problem solved here is a limited one, the findings illustrate the utility of systematic searches with atom-based models for macromolecular structure prediction and the importance of unbiased sampling in structure prediction methods.
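A systematic, grid-based search enumerates every combination of discretized degrees of freedom, scores each trial structure, and keeps the best one. The Python sketch below illustrates only that idea; it is not the CHARMM Z Module, and the toy energy function, the 30° grid spacing and the helper names (`systematic_search`, `angular_difference`, `toy_energy`) are hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def systematic_search(
    n_dofs: int,
    grid: Sequence[float],
    energy: Callable[[Point], float],
) -> Tuple[Optional[Point], float]:
    """Score every point of an n_dofs-dimensional grid; keep the lowest-energy one.

    Every grid point is visited exactly once, so within the chosen
    discretization the sampling is not biased by a random walk.
    """
    best_point: Optional[Point] = None
    best_energy = float("inf")
    for point in product(grid, repeat=n_dofs):
        e = energy(point)
        if e < best_energy:
            best_point, best_energy = point, e
    return best_point, best_energy


def angular_difference(a: float, b: float) -> float:
    """Smallest absolute difference between two angles in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def toy_energy(angles: Point) -> float:
    # Hypothetical energy surface with its minimum at (-60, 180, 60) degrees.
    targets = (-60.0, 180.0, 60.0)
    return sum(angular_difference(a, t) ** 2 for a, t in zip(angles, targets))


grid = [float(a) for a in range(-180, 180, 30)]  # 12 values per degree of freedom
best, e = systematic_search(3, grid, toy_energy)
print(best, e)  # (-60.0, -180.0, 60.0) 0.0  (-180 and 180 are the same angle)
```

The cost grows as (grid size)^(degrees of freedom), so exhaustive enumeration becomes impractical quickly; that is presumably why the abstract describes a hierarchical build-up of successively larger fragments, which this sketch does not model.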
Affiliation(s)
- Robert J Petrella
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, USA.
20
Feng J, Li M, Huang Y, Xiao Y. Symmetric key structural residues in symmetric proteins with beta-trefoil fold. PLoS One 2010; 5:e14138. [PMID: 21152439 PMCID: PMC2994741 DOI: 10.1371/journal.pone.0014138] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 07/24/2010] [Accepted: 11/04/2010] [Indexed: 11/18/2022]
Abstract
To understand how the symmetric structures of many proteins are formed from asymmetric sequences, the proteins with two repeated beta-trefoil domains in the Plant Cytotoxin B-chain family and all presently known beta-trefoil proteins are analyzed by structure-based multiple sequence alignments. The results show that all these proteins have similar key structural residues that are distributed symmetrically in their structures. These symmetric key structural residues are further analyzed in terms of inter-residue interaction numbers and B-factors. They can be distinguished from other residues and have significant propensities for the structural framework. This indicates that these key structural residues may direct the formation of symmetric structures even though the sequences are asymmetric.
Affiliation(s)
- Jianhui Feng
- Biophysics and Molecular Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan, China
- Mingfeng Li
- Biophysics and Molecular Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan, China
- Department of Neurobiology and Kavli Institute for Neuroscience, Yale University School of Medicine, New Haven, Connecticut, United States of America
- Yanzhao Huang
- Biophysics and Molecular Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan, China
- Yi Xiao
- Biophysics and Molecular Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan, China
21
Chu CH, Lo WC, Wang HW, Hsu YC, Hwang JK, Lyu PC, Pai TW, Tang CY. Detection and alignment of 3D domain swapping proteins using angle-distance image-based secondary structural matching techniques. PLoS One 2010; 5:e13361. [PMID: 20976204 PMCID: PMC2955075 DOI: 10.1371/journal.pone.0013361] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 07/10/2010] [Accepted: 09/13/2010] [Indexed: 11/18/2022]
Abstract
This work presents a novel detection method for three-dimensional domain swapping (DS), a mechanism for forming protein quaternary structures that can be visualized as if monomers had “opened” their “closed” structures and exchanged the opened portion to form intertwined oligomers. Since the first report of DS in the mid-1990s, an increasing number of identified cases has led to the postulation that DS might occur in a protein with an unconstrained terminus under appropriate conditions. DS may play important roles in the molecular evolution and functional regulation of proteins and in the formation of deposits in Alzheimer's and prion diseases. Moreover, it is promising for designing self-assembling biomaterials. Despite the increasing interest in DS, related bioinformatics methods are rarely available. Owing to the dramatic conformational difference between the monomeric/closed and oligomeric/open forms, conventional structural comparison methods are inadequate for detecting DS, and comprehensive datasets for studying DS are consequently lacking. Based on angle-distance (A-D) image transformations of secondary structural elements (SSEs), specific patterns within A-D images can be recognized and classified for structural similarities. In this work, a matching algorithm to extract corresponding SSE pairs from A-D images and a novel DS score have been designed and shown to be applicable to the detection of DS relationships. The Matthews correlation coefficient (MCC) and sensitivity of the proposed DS-detecting method were higher than 0.81 even when the sequence identities of the proteins examined were lower than 10%. On average, the alignment percentage and root-mean-square deviation (RMSD) computed by the proposed method were 90% and 1.8 Å for a set of 1,211 DS-related pairs of proteins. The performance of the structural alignments remains high and stable for DS-related homologs with less than 10% sequence identity.
In addition, the quality of its hinge loop determination is comparable to that of manual inspection. This method has been implemented as a web-based tool that takes two protein structures as input and determines whether a DS relationship exists between them, and of what type, from the A-D image-based structural alignments and the DS score. The proposed method is expected to trigger large-scale studies of this interesting structural phenomenon and facilitate related applications.
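The abstract reports detection quality as sensitivity and the Matthews correlation coefficient (MCC). Both follow from the confusion-matrix counts of true/false positives and negatives; the minimal Python sketch below computes them, with made-up counts purely for illustration. Returning 0.0 when a marginal is empty is a common convention, assumed here.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true DS pairs that the detector flags: TP / (TP + FN)."""
    return tp / (tp + fn)


def matthews_cc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion-matrix counts for a DS detector.
tp, tn, fp, fn = 90, 85, 15, 10
print(round(sensitivity(tp, fn), 3))          # 0.9
print(round(matthews_cc(tp, tn, fp, fn), 3))  # 0.751
```

MCC ranges from -1 to 1 and, unlike sensitivity alone, also penalizes false positives, which matters when non-DS pairs greatly outnumber DS pairs.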
Affiliation(s)
- Chia-Han Chu
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, Republic of China
- Wei-Cheng Lo
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, Taiwan, Republic of China
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan, Republic of China
- Hsin-Wei Wang
- Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan, Republic of China
- Yen-Chu Hsu
- Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan, Republic of China
- Jenn-Kang Hwang
- Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan, Republic of China
- Ping-Chiang Lyu
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu, Taiwan, Republic of China
- Tun-Wen Pai
- Department of Computer Science and Engineering, National Taiwan Ocean University, Keelung, Taiwan, Republic of China
- Chuan Yi Tang
- Department of Computer Science, National Tsing Hua University, Hsinchu, Taiwan, Republic of China
- Department of Computer Science and Information Engineering, Providence University, Taichung, Taiwan, Republic of China
22
Detecting internally symmetric protein structures. BMC Bioinformatics 2010; 11:303. [PMID: 20525292 PMCID: PMC2894822 DOI: 10.1186/1471-2105-11-303] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Received: 02/23/2010] [Accepted: 06/03/2010] [Indexed: 11/30/2022]
Abstract
Background Many functional proteins have a symmetric structure. Most of these are multimeric complexes, which are made of non-symmetric monomers arranged in a symmetric manner. However, there are also a large number of proteins that have a symmetric structure in the monomeric state. These internally symmetric proteins are interesting objects from the point of view of their folding, function, and evolution. Most algorithms that detect internally symmetric proteins depend on finding repeating units of similar structure and do not use the symmetry information. Results We describe a new method, called SymD, for detecting symmetric protein structures. The SymD procedure works by comparing the structure to its own copy after the copy has been circularly permuted by every possible number of residues. The procedure is relatively insensitive to symmetry-breaking insertions and deletions and amplifies positive signals from symmetry. It finds 70% to 80% of the TIM barrel fold domains and 100% of the beta-propellers in the ASTRAL 40 domain database to be symmetric. More globally, 10% to 15% of the proteins in the ASTRAL 40 domain database may be considered symmetric according to this procedure, depending on the precise cutoff value used to measure the degree of perfection of the symmetry. Symmetrical proteins occur in all structural classes and can have a closed, circular structure, a cylindrical barrel-like structure, or an open, helical structure. Conclusions SymD is a sensitive procedure for detecting internally symmetric protein structures. Using this procedure, we estimate that 10% to 15% of the known protein domains may be considered symmetric. We also report an initial, overall view of the types of symmetries and symmetric folds that occur in the protein domain structure universe.
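The circular-permutation idea can be illustrated with a toy score. As the abstract describes, SymD compares a structure with copies of itself permuted by every possible number of residues; the sketch below replaces SymD's structural alignment with a simpler, superposition-free comparison of residue-residue distance matrices. That scoring, the function names and the synthetic coordinates are assumptions for illustration, not SymD itself.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def distance_matrix(coords: Sequence[Point]) -> List[List[float]]:
    return [[math.dist(a, b) for b in coords] for a in coords]


def permutation_scores(coords: Sequence[Point]) -> List[float]:
    """RMS difference between the residue-residue distance matrix and that of
    the copy circularly permuted by k residues, for every k in 0..n-1.

    scores[0] is 0 by construction; a near-zero score at some k > 0 means the
    structure maps onto itself when every residue i is replaced by i + k (mod n).
    """
    n = len(coords)
    d = distance_matrix(coords)
    scores = []
    for k in range(n):
        sq = 0.0
        for i in range(n):
            for j in range(n):
                diff = d[i][j] - d[(i + k) % n][(j + k) % n]
                sq += diff * diff
        scores.append(math.sqrt(sq / (n * n)))
    return scores


# Toy closed "protein": three repeats of two pseudo-residues (radius 1 and 2)
# arranged with exact three-fold symmetry about the z axis.
coords: List[Point] = []
for u in range(3):
    base = 2 * math.pi * u / 3
    coords.append((math.cos(base), math.sin(base), 0.0))
    coords.append((2 * math.cos(base + 0.7), 2 * math.sin(base + 0.7), 0.0))

scores = permutation_scores(coords)
print([round(s, 3) for s in scores])  # near zero at k = 0, 2 and 4
```

Unlike SymD, this toy score tolerates no insertions or deletions: a single indel breaks the residue-to-residue correspondence for every shift.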
23
Abstract
A web service was developed for the analysis of protein structures that are sequentially or non-sequentially similar. Recently, the non-sequential structure alignment algorithm GANGSTA+ was introduced. GANGSTA+ can detect non-sequential structural analogs of proteins stated to possess novel folds. Since GANGSTA+ ignores the polypeptide chain connectivity of secondary structure elements (i.e. α-helices and β-strands), it is able to detect structural similarities also between proteins whose sequences were reshuffled during evolution. GANGSTA+ was applied in an all-against-all comparison on the ASTRAL40 database (SCOP version 1.75), which consists of >10 000 protein domains, yielding about 55 × 10⁶ possible protein structure alignments. Here, we provide the resulting protein structure alignments as a public web-based service, named GANGSTA+ Internet Services (GIS). Users can also browse the ASTRAL40 database of protein structures with GANGSTA+ relative to an externally given protein structure, using different constraints to select specific results. GIS allows protein structure families to be analyzed according to the SCOP classification scheme. Additionally, users can upload their own protein structures for pairwise protein structure comparison, alignment against all protein structures of the ASTRAL40 database (SCOP version 1.75) or symmetry analysis. GIS is publicly available at http://agknapp.chemie.fu-berlin.de/gplus.
Affiliation(s)
- Aysam Guerler
- Freie Universität Berlin, Institute of Chemistry and Biochemistry, Fabeckstrasse 36a, 14195 Berlin, Germany
24
Abstract
BACKGROUND Protein structure comparison is a fundamental task in structural biology. While the number of known protein structures has grown rapidly over the last decade, searching a large database of protein structures is still relatively slow using existing methods. There is a need for new techniques that can rapidly compare protein structures while maintaining high matching accuracy. RESULTS We have developed IR Tableau, a fast protein comparison algorithm that leverages the tableau representation to compare protein tertiary structures. IR Tableau compares tableaux using information-retrieval-style feature indexing techniques. Experimental analysis on the ASTRAL SCOP protein structural domain database demonstrates that IR Tableau achieves a speedup of two orders of magnitude over the search times of existing methods, while producing search results of comparable accuracy. CONCLUSION We show that it is possible to obtain very significant speedups for the protein structure comparison problem by employing an information-retrieval-style approach for indexing proteins. The comparison accuracy achieved is also strong, opening the way for large-scale processing of very large protein structure databases.
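Information-retrieval-style indexing can be sketched as follows: each structure is reduced to a bag of feature tokens, an inverted index maps each token to the structures containing it, and a query is ranked by cosine similarity. The tokens here stand in for features derived from a tableau (a matrix of orientation codes between pairs of secondary structure elements); the feature choice, the token values and the function names are assumptions for illustration, not the IR Tableau implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_index(db: Dict[str, List[str]]):
    """Inverted index: feature -> {structure name: count}, plus vector norms."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    norms: Dict[str, float] = {}
    for name, feats in db.items():
        counts = Counter(feats)
        for f, c in counts.items():
            index[f][name] = c
        norms[name] = math.sqrt(sum(c * c for c in counts.values()))
    return index, norms


def search(query: List[str], index, norms) -> List[Tuple[str, float]]:
    """Rank database entries by cosine similarity to the query's feature counts.

    Only entries sharing at least one feature with the query are touched; in
    this sketch, that is what avoids an exhaustive pairwise comparison.
    """
    q = Counter(query)
    qnorm = math.sqrt(sum(c * c for c in q.values()))
    dots: Dict[str, float] = defaultdict(float)
    for f, qc in q.items():
        for name, c in index.get(f, {}).items():
            dots[name] += qc * c
    ranked = [(name, dot / (qnorm * norms[name])) for name, dot in dots.items()]
    return sorted(ranked, key=lambda t: t[1], reverse=True)


# Hypothetical database: each structure reduced to tableau-derived feature tokens.
db = {
    "dom_a": ["PE", "PE", "OS"],
    "dom_b": ["OS", "LE"],
    "dom_c": ["RT"],
}
index, norms = build_index(db)
results = search(["PE", "OS"], index, norms)
print(results)  # dom_a ranks first; dom_c shares no feature and is never scored
```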
25
Chen H, Huang Y, Xiao Y. A simple method of identifying symmetric substructures of proteins. Comput Biol Chem 2008; 33:100-7. [PMID: 18782681 DOI: 10.1016/j.compbiolchem.2008.07.026] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 11/06/2007] [Revised: 07/10/2008] [Accepted: 07/15/2008] [Indexed: 10/21/2022]
Abstract
Accurate identification of internal symmetric substructures of proteins is needed in protein evolution studies and protein design. To overcome the difficulties met by previous methods, we propose a simple quantitative method that uses a similarity matrix together with Pearson's correlation analysis. The distance root-mean-square deviation (dRMSD) is used to measure the similarity of two substructures in a protein. We applied this method to proteins of the beta-propeller, jelly roll, and beta-trefoil families, and the results show that the method can not only detect the internal repetitive structures in proteins effectively but also identify their locations easily.
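The dRMSD used above compares two substructures through their internal distances, so no superposition is required. A minimal Python sketch, assuming fragments of equal length with one coordinate per residue (for example Cα); the helper names are hypothetical, and the Pearson correlation step applied to the resulting similarity matrix is not shown.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def drmsd(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Distance RMSD between two equally long fragments.

    dRMSD = sqrt( mean over i < j of (d_ij(a) - d_ij(b))^2 ),
    where d_ij is the distance between residues i and j of a fragment.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("fragments must have equal length >= 2")
    pairs = list(combinations(range(len(a)), 2))
    sq = sum((math.dist(a[i], a[j]) - math.dist(b[i], b[j])) ** 2 for i, j in pairs)
    return math.sqrt(sq / len(pairs))


def fragment_matrix(coords: Sequence[Point], frag_len: int) -> List[List[float]]:
    """dRMSD between every pair of overlapping fragments of length frag_len."""
    frags = [coords[s:s + frag_len] for s in range(len(coords) - frag_len + 1)]
    return [[drmsd(f, g) for g in frags] for f in frags]


a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
moved = [(x + 5.0, y - 2.0, z + 1.0) for x, y, z in a]  # rigid translation
scaled = [(2 * x, 2 * y, 2 * z) for x, y, z in a]        # not a rigid motion
print(drmsd(a, moved))   # 0.0
print(drmsd(a, scaled))  # sqrt(4/3), about 1.155
```

Because only internal distances enter, dRMSD is unchanged by rotations and translations; low off-diagonal values in the fragment matrix mark candidate repeated substructures.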
Affiliation(s)
- Hanlin Chen
- Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
26
Guerler A, Knapp EW. Novel protein folds and their nonsequential structural analogs. Protein Sci 2008; 17:1374-82. [PMID: 18583523 DOI: 10.1110/ps.035469.108] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Indexed: 01/25/2023]
Abstract
Newly determined protein structures are classified as belonging to a new fold if they are sufficiently dissimilar from all other protein structures known so far. To analyze structural similarities of proteins, structure alignment tools are used. We demonstrate that nonsequential structure alignment tools, which neglect polypeptide chain connectivity, can yield structure alignments with significant similarities between proteins of known three-dimensional structure and newly determined protein structures that possess a new fold. The recently introduced protein structure alignment tool GANGSTA is specialized to perform nonsequential alignments with proper assignment of secondary structure types by focusing on helices and strands only. In the new version, GANGSTA+, the underlying algorithms were completely redesigned, yielding structure alignments of enhanced quality, offering alignment against a larger database of protein structures, and being more efficient. We applied DaliLite, TM-align, and GANGSTA+ to three protein crystal structures considered to be novel folds. Applying GANGSTA+ to these novel folds, we find proteins in the ASTRAL40 database that possess significant structural similarities, although the alignments are nonsequential and in some cases involve secondary structure elements aligned in reverse orientation. A web server is available at http://agknapp.chemie.fu-berlin.de/gplus for pairwise alignment, visualization, and database comparison.
Affiliation(s)
- Aysam Guerler
- Department of Chemistry and Biochemistry, Freie Universität Berlin, 14195 Berlin, Germany
27
Huang Y, Xiao Y. Detection of gene duplication signals of Ig folds from their amino acid sequences. Proteins 2007; 68:267-72. [PMID: 17427227 DOI: 10.1002/prot.21330] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
Abstract
Protein folds may evolve from short peptide ancestors via gene duplication and fusion. For proteins with internal structural symmetry, this means that their sequences should be made up of identical repeats. However, many of these repeat signals can so far be seen only at the structural level. Motivated by the observation that proteins whose sequences share more than 25% identical amino acids tend to have similar structures, we suggest a method to detect the sequence repeats of proteins directly from their sequences. Using this method, we show that the internal repetitions of the immunoglobulin folds can be identified directly at the sequence level.
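One simple way to look for internal repeats directly in a sequence is to compare the sequence with itself at every offset and flag offsets with high identity. The sketch below does this without gaps; it is an illustrative stand-in rather than the authors' method, and applying the 25% threshold to ungapped self-comparisons (plus the `min_overlap` cutoff and function names) is an assumption.

```python
from typing import List, Tuple


def shifted_identity(seq: str, shift: int) -> float:
    """Fraction of identical residues between seq and itself offset by `shift`
    (an ungapped self-comparison along one diagonal of a dot plot)."""
    overlap = len(seq) - shift
    if shift <= 0 or overlap <= 0:
        raise ValueError("shift must be between 1 and len(seq) - 1")
    same = sum(1 for i in range(overlap) if seq[i] == seq[i + shift])
    return same / overlap


def repeat_candidates(seq: str, min_identity: float = 0.25,
                      min_overlap: int = 10) -> List[Tuple[int, float]]:
    """Shifts whose self-identity exceeds min_identity over >= min_overlap residues."""
    hits = []
    for shift in range(1, len(seq) - min_overlap + 1):
        ident = shifted_identity(seq, shift)
        if ident > min_identity:
            hits.append((shift, ident))
    return hits


# Hypothetical sequence built from three exact copies of a 10-residue unit.
seq = "ACDEFGHIKL" * 3
print(repeat_candidates(seq))  # [(10, 1.0), (20, 1.0)]
```

Real diverged repeats carry insertions and deletions, so a practical version would align the sequence against itself with gaps instead of scanning fixed diagonals.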
Affiliation(s)
- Yanzhao Huang
- Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
28
Abyzov A, Ilyin VA. A comprehensive analysis of non-sequential alignments between all protein structures. BMC Struct Biol 2007; 7:78. [PMID: 18005453 PMCID: PMC2213659 DOI: 10.1186/1472-6807-7-78] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 05/14/2007] [Accepted: 11/16/2007] [Indexed: 05/02/2023]
Abstract
Background The majority of relations between proteins can be represented as a conventional sequential alignment. Nevertheless, unusual non-sequential alignments, with different connectivity of the aligned fragments in the compared proteins, have been reported by many researchers. It is of interest to understand these non-sequential alignments: are they unique, sporadic cases, or do they occur frequently; do they belong to a few specific folds, or are they spread among many different folds as a common feature of protein structure? We present here a comprehensive large-scale study of non-sequential alignments between the protein structures available in the Protein Data Bank. Results The study was conducted on a non-redundant set of 8,865 protein structures aligned with the TOPOFIT method. Depending on variations in the parameters, an estimated 17.4% to 35.2% of all alignments are non-sequential. Analysis of the data revealed that non-sequential relations between proteins occur systematically and in large quantities. Non-sequential fragments of various sizes and numbers were observed, with all possible complexities of fragment rearrangement found for alignments consisting of up to 12 fragments. Non-sequential alignments are not limited to proteins of any particular fold and are present in more than two hundred folds. Moreover, many of them are found between proteins with different fold assignments. Protein structure symmetry was shown not to explain non-sequential alignments. Therefore, compelling evidence has been provided that non-sequential alignments between proteins are systematic and widespread across the protein universe. Conclusion The widespread occurrence of non-sequential alignments between proteins might represent a missing rule of protein structure organization. More detailed study of this phenomenon will enhance our understanding of protein stability, folding, and evolution.
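Whether an alignment is sequential can be decided from its aligned residue pairs alone: sort by position in the first protein and check that positions in the second protein also increase. The sketch below uses hypothetical function names and a simple definition of "fragment" (a run in which both residue indices advance by exactly one); it is not TOPOFIT.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]  # (residue index in protein A, residue index in protein B)


def is_sequential(pairs: Iterable[Pair]) -> bool:
    """True if the alignment preserves chain order in both proteins:
    after sorting by A's residue index, B's indices are strictly increasing."""
    ordered = sorted(pairs)
    return all(b1 < b2 for (_, b1), (_, b2) in zip(ordered, ordered[1:]))


def count_fragments(pairs: Iterable[Pair]) -> int:
    """Number of runs in which both indices advance by exactly one residue."""
    ordered: List[Pair] = sorted(pairs)
    if not ordered:
        return 0
    runs = 1
    for (a1, b1), (a2, b2) in zip(ordered, ordered[1:]):
        if not (a2 == a1 + 1 and b2 == b1 + 1):
            runs += 1
    return runs


# A circular permutation: A's first half matches B's second half and vice versa.
perm = [(1, 5), (2, 6), (3, 7), (4, 8), (5, 1), (6, 2), (7, 3), (8, 4)]
print(is_sequential(perm), count_fragments(perm))  # False 2
```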
Affiliation(s)
- Alexej Abyzov
- Department of Biology, Northeastern University 360 Huntington Avenue, Boston, MA 02115, USA.
29
Dundas J, Binkowski TA, DasGupta B, Liang J. Topology independent protein structural alignment. BMC Bioinformatics 2007; 8:388. [PMID: 17937816 PMCID: PMC2096629 DOI: 10.1186/1471-2105-8-388] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.5] [Received: 07/02/2007] [Accepted: 10/15/2007] [Indexed: 11/10/2022]
Abstract
BACKGROUND Identifying structurally similar proteins with different chain topologies can aid studies in homology modeling, protein folding, protein design, and protein evolution. These include circularly permuted protein structures and the more general cases of non-cyclic permutations between similar structures, which are related by non-topological rearrangements beyond circular permutation. We present a method based on an approximation algorithm that finds sequence-order-independent structural alignments that are close to optimal. We formulate the structural alignment problem as a special case of the maximum-weight independent set problem and solve this computationally intensive problem approximately by iteratively solving relaxations of a corresponding integer programming problem. The resulting structural alignment is sequence-order independent. Our method is also insensitive to insertions, deletions, and gaps. RESULTS Using a novel similarity score and a statistical model for significance p-values, we are able to discover previously unknown circularly permuted proteins between nucleoplasmin-core protein and auxin binding protein, between aspartate racemase and 3-dehydroquinate dehydratase, and between migration inhibition factor and arginine repressor, which involves an additional strand swap. We also report the finding of naturally occurring non-cyclically permuted protein structures between AML1/core binding factor and riboflavin synthase. Our method can be used for large-scale alignment of protein structures regardless of topology. CONCLUSION The approximation algorithm introduced in this work can find good solutions for the problem of protein structure alignment. Furthermore, this algorithm can detect topological differences between two spatially similar protein structures.
The alignment between MIF and the arginine repressor demonstrates our algorithm's ability to detect structural similarities even when spatial rearrangement of structural units has occurred. The effectiveness of our method is also demonstrated by the discovery of previously unknown circular permutations. In addition, we report the finding of a naturally occurring non-cyclically permuted protein pair between AML1/Core Binding Factor chain F and riboflavin synthase chain A.
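To make the independent-set formulation concrete: treat each candidate residue correspondence as a weighted vertex and join two vertices when they cannot both be chosen. The sketch below swaps the authors' iterative relaxation approach for plain greedy selection and uses only the simplest conflict rule (no residue may be used twice); both choices, the weights and the function name are assumptions. Because chain order is never checked, a circular permutation is recovered.

```python
from typing import List, Tuple

Candidate = Tuple[int, int, float]  # (residue in A, residue in B, similarity weight)


def greedy_order_independent_alignment(cands: List[Candidate]) -> List[Candidate]:
    """Pick high-weight residue correspondences, skipping any that reuse a residue.

    Candidates conflict only if they share a residue of A or of B, so chain
    order is never enforced and circular permutations can be recovered.
    """
    used_a, used_b = set(), set()
    chosen: List[Candidate] = []
    for a, b, w in sorted(cands, key=lambda c: c[2], reverse=True):
        if w <= 0:
            break  # remaining candidates cannot add weight
        if a in used_a or b in used_b:
            continue  # conflicts with an already chosen correspondence
        chosen.append((a, b, w))
        used_a.add(a)
        used_b.add(b)
    return sorted(chosen)


cands = [(0, 2, 0.9), (1, 3, 0.8), (2, 0, 0.85), (3, 1, 0.7), (0, 0, 0.5), (1, 1, 0.4)]
print(greedy_order_independent_alignment(cands))
# [(0, 2, 0.9), (1, 3, 0.8), (2, 0, 0.85), (3, 1, 0.7)]
```

With only residue-reuse conflicts the problem reduces to bipartite maximum-weight matching, for which greedy selection is guaranteed to reach at least half of the optimal weight; adding geometric compatibility constraints between correspondences turns it into a general independent-set instance, which is NP-hard.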
Affiliation(s)
- Joe Dundas
- Department of Bioengineering, University of Illinois at Chicago, Chicago, IL 60607-7053, USA.
30
Comparative analysis of protein structure alignments. BMC Struct Biol 2007; 7:50. [PMID: 17672887 PMCID: PMC1959231 DOI: 10.1186/1472-6807-7-50] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Received: 01/30/2007] [Accepted: 07/26/2007] [Indexed: 11/25/2022]
Abstract
Background Several methods are currently available for the comparison of protein structures. These methods have been analysed with regard to their performance in identifying structurally or evolutionarily related proteins, but so far there has been less focus on objectively comparing the alignments produced by different methods. Results We analysed and compared the structural alignments obtained by different methods using three sets of pairs of structurally related proteins. The first set corresponds to 355 pairs of remote homologous proteins according to the SCOP database (ASTRAL40 set). The second set was derived from the SISYPHUS database and includes 69 protein pairs (SISY set). The third set consists of 40 pairs that are challenging to align (RIPC set). Aligning the pairs in this set requires indels of considerable number and size, and some of the proteins are related by circular permutations, show extensive conformational variability or include repetitions. Two standard methods (CE and DALI) were applied to align the proteins in the ASTRAL40 set. The extent of structural similarity identified by both methods is highly correlated, and the alignments from the two methods agree on average in more than half of the aligned positions. CE, DALI, as well as four additional methods (FATCAT, MATRAS, Cα-match and SHEBA) were then compared using the SISY and RIPC sets. The accuracy of the alignments was assessed by comparison to reference alignments. The alignments generated by the different methods on average match more than half of the reference alignments in the SISY set. The alignments obtained in the more challenging RIPC set tend to differ considerably and match reference alignments less successfully than the SISY set alignments. Conclusion The alignments produced by different methods tend to agree to a considerable extent, but the agreement is lower for the more challenging pairs.
The results for the comparison to reference alignments are encouraging, but also indicate that there is still room for improvement.
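Agreement with a reference alignment, as used in this comparison, can be measured as the fraction of reference residue pairs that a method reproduces. The sketch below is one plausible definition with an optional positional tolerance; the function name and the `tolerance` parameter are assumptions, not the paper's exact protocol.

```python
from typing import Iterable, Tuple

Pair = Tuple[int, int]  # (residue in protein A, residue in protein B)


def reference_agreement(test: Iterable[Pair], reference: Iterable[Pair],
                        tolerance: int = 0) -> float:
    """Fraction of reference pairs (i, j) reproduced by the test alignment.

    A reference pair counts as matched if the test alignment aligns residue i
    of protein A to a residue of protein B within `tolerance` positions of j.
    """
    test_map = dict(test)  # residue in A -> residue in B (one partner per residue)
    ref = list(reference)
    if not ref:
        raise ValueError("reference alignment is empty")
    matched = sum(1 for i, j in ref if i in test_map and abs(test_map[i] - j) <= tolerance)
    return matched / len(ref)


reference = [(1, 1), (2, 2), (3, 3), (4, 4)]
test = [(1, 1), (2, 2), (3, 5), (5, 5)]
print(reference_agreement(test, reference))               # 0.5
print(reference_agreement(test, reference, tolerance=2))  # 0.75
```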
31
Ji X, Chen H, Xiao Y. Hidden symmetries in the primary sequences of beta-barrel family. Comput Biol Chem 2007; 31:61-3. [PMID: 17270497 DOI: 10.1016/j.compbiolchem.2007.01.002] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 06/13/2006] [Revised: 12/08/2006] [Accepted: 01/02/2007] [Indexed: 10/23/2022]
Abstract
In this paper, we analyze the symmetries of beta-barrel proteins at both the structure and the sequence level using a modified recurrence quantification analysis. The analysis shows that the structures and sequences have the same two-fold symmetry, although the latter have diverged considerably. This result may help in understanding the mechanism of protein evolution.
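Recurrence quantification analysis starts from a recurrence matrix of a numeric series. The sketch below builds that matrix for a per-residue property profile and measures recurrence along one diagonal, where a two-fold repeat shows up as a dense diagonal at half the length. The profile values, the threshold `eps` and the function names are hypothetical, and the paper's modification of the analysis is not reproduced.

```python
from typing import List, Sequence


def recurrence_matrix(x: Sequence[float], eps: float) -> List[List[int]]:
    """R[i][j] = 1 if |x_i - x_j| <= eps, else 0."""
    n = len(x)
    return [[1 if abs(x[i] - x[j]) <= eps else 0 for j in range(n)] for i in range(n)]


def diagonal_recurrence_rate(x: Sequence[float], offset: int, eps: float) -> float:
    """Fraction of recurrent points on the diagonal j = i + offset.

    A value near 1 at offset = len(x) // 2 suggests a two-fold repeat.
    """
    n = len(x)
    if not 0 < offset < n:
        raise ValueError("offset must be between 1 and len(x) - 1")
    r = recurrence_matrix(x, eps)
    return sum(r[i][i + offset] for i in range(n - offset)) / (n - offset)


# Hypothetical per-residue property profile made of two identical halves.
profile = [0.1, 0.5, 0.9, 0.3] * 2
print(diagonal_recurrence_rate(profile, 4, eps=0.05))  # 1.0
print(diagonal_recurrence_rate(profile, 1, eps=0.05))  # 0.0
```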
Affiliation(s)
- Xiaofeng Ji
- Biomolecular Physics and Modeling Group, Department of Physics, Huazhong University of Science and Technology, Wuhan 430074, Hubei, China
32
Abstract
BACKGROUND In recent times, there has been an exponential rise in the number of protein structures in databases such as the PDB, so the design of fast algorithms capable of querying such databases is becoming an increasingly important research issue. This paper reports an algorithm, motivated by spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of its residues. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. RESULTS Experimental results show that the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on the SCOP 95% non-redundant database, for fold similarity with five proteins from different SCOP classes, show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins, which is not possible with most state-of-the-art tools such as Dali.
Affiliation(s)
- Sourangshu Bhattacharya
- Dept. of Computer Science and Automation, Indian Institute of Science, Bangalore – 560012, India
- Chiranjib Bhattacharyya
- Dept. of Computer Science and Automation, Indian Institute of Science, Bangalore – 560012, India
- Bioinformatics Center, Indian Institute of Science, Bangalore – 560012, India
- Nagasuma R Chandra
- Bioinformatics Center, Indian Institute of Science, Bangalore – 560012, India
33
Connectivity independent protein-structure alignment: a hierarchical approach. BMC Bioinformatics 2006; 7:510. [PMID: 17118190 PMCID: PMC1683948 DOI: 10.1186/1471-2105-7-510] [Citation(s) in RCA: 42] [Impact Index Per Article: 2.3] [Received: 06/06/2006] [Accepted: 11/21/2006] [Indexed: 11/13/2022]
Abstract
Background Protein-structure alignment is a fundamental tool for studying protein function and evolution and for model building. In the last decade several methods for structure alignment were introduced, but most of them ignore the fact that structurally similar proteins can share the same spatial arrangement of secondary structure elements (SSEs) but differ in the underlying polypeptide chain connectivity (non-sequential SSE connectivity). Results We perform protein-structure alignment using a two-level hierarchical approach implemented in the program GANGSTA. On the first level, pair contacts and relative orientations between SSEs (i.e. α-helices and β-strands) are maximized with a genetic algorithm (GA). On the second level, residue pair contacts from the best SSE alignments are optimized. We have tested the method on visually optimized structure alignments of protein pairs (pairwise mode) and in database scans. For a given protein structure, our method is able to detect significant structural similarity of functionally important folds with non-sequential SSE connectivity. The performance for structure alignments with strictly sequential SSE connectivity is comparable to that of other structure alignment methods. Conclusion As demonstrated for several applications, GANGSTA finds meaningful protein-structure alignments independent of the SSE connectivity. GANGSTA is able to detect structural similarity of protein folds that are assigned to different superfamilies but nevertheless possess similar structures and perform related functions, even if these proteins differ in SSE connectivity.
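The first, SSE-level step can be illustrated with a heavily simplified sketch. SSEs are reduced to a type letter and a list of contact pairs (relative orientations are ignored), and GANGSTA's genetic algorithm is swapped for plain exhaustive enumeration, which is only feasible for a handful of SSEs. Because candidate mappings are not required to preserve chain order, a circularly permuted toy protein is still matched completely. All names are hypothetical.

```python
from itertools import permutations


def contact_score(mapping, contacts_a, contacts_b, types_a, types_b):
    """Number of SSE-SSE contacts of A preserved in B under `mapping`
    (SSE index in A -> SSE index in B); -1 if any SSE type is mismatched."""
    if any(types_a[i] != types_b[j] for i, j in mapping.items()):
        return -1
    return sum(1 for i, j in contacts_a
               if frozenset((mapping[i], mapping[j])) in contacts_b)


def best_sse_mapping(types_a, contacts_a, types_b, contacts_b):
    """Exhaustively search all injective SSE mappings, ignoring chain order.

    Stand-in for a genetic algorithm; only feasible for a handful of SSEs.
    """
    contacts_b = {frozenset(c) for c in contacts_b}
    n = len(types_a)
    best, best_score = None, -1
    for image in permutations(range(len(types_b)), n):
        mapping = dict(enumerate(image))
        score = contact_score(mapping, contacts_a, contacts_b, types_a, types_b)
        if score > best_score:
            best, best_score = mapping, score
    return best, best_score


# Protein B has the same SSEs as A, but its chain starts at A's third SSE
# (a circular permutation), so A's SSE i is B's SSE (i + 2) % 4.
types_a = "HHEE"
contacts_a = [(0, 1), (1, 2), (2, 3), (3, 0)]
types_b = "".join(types_a[(k + 2) % 4] for k in range(4))   # "EEHH"
contacts_b = [((i + 2) % 4, (j + 2) % 4) for i, j in contacts_a]
mapping, score = best_sse_mapping(types_a, contacts_a, types_b, contacts_b)
print(mapping, score)  # {0: 2, 1: 3, 2: 0, 3: 1} 4
```

A sequential mapping (SSE i to SSE i) fails here because it pairs helices with strands; the order-independent search finds the mapping that preserves all four contacts.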
34
Shih ESC, Gan RCR, Hwang MJ. OPAAS: a web server for optimal, permuted, and other alternative alignments of protein structures. Nucleic Acids Res 2006; 34:W95-8. [PMID: 16845117 PMCID: PMC1538888 DOI: 10.1093/nar/gkl264] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022]
Abstract
The large number of experimentally determined protein 3D structures is a rich resource for studying protein function and evolution, and protein structure comparison (PSC) is a key method for such studies. When comparing two protein structures, almost all currently available PSC servers report a single, sequential (i.e. topological) alignment, whereas the existence of good alternative alignments, including those involving permutations (i.e. non-sequential or non-topological alignments), is well known. We have recently developed a novel PSC method that can detect statistically significant alternative alignments (alignment similarity P-value < 10⁻⁵), including structural permutations at all levels of complexity. OPAAS, the server for this PSC method, freely accessible at our website (), provides an easy-to-read hierarchical output layout that displays detailed information on all of the significant alternative alignments detected. Because these alternative alignments can offer a more complete picture of the structural, evolutionary and functional relationships between two proteins, OPAAS can be used in structural bioinformatics research to gain additional insight that existing PSC servers do not readily provide.
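The distinction between sequential (topological) and permuted alignments can be made concrete with a small sketch. It only classifies an alignment that has already been found, given as residue pairs; it does not reproduce OPAAS's search or its statistics. An alignment is split into maximal blocks in which the B index increases with the A index: one block means a sequential alignment, and two blocks with swapped order indicate a circular permutation. The function names are hypothetical.

```python
def sequential_blocks(pairs):
    """Split aligned residue pairs (i in A, j in B) into maximal blocks in which
    j increases as i increases. One block means a sequential (topological)
    alignment; more than one means the alignment is non-sequential."""
    blocks = []
    for i, j in sorted(pairs):
        if blocks and j > blocks[-1][-1][1]:
            blocks[-1].append((i, j))
        else:
            blocks.append([(i, j)])
    return blocks


def is_circular_permutation(pairs):
    """Two blocks, where every B residue of the second block precedes every
    B residue of the first block (N- and C-terminal segments swapped)."""
    blocks = sequential_blocks(pairs)
    if len(blocks) != 2:
        return False
    first, second = blocks
    return max(j for _, j in second) < min(j for _, j in first)


# Toy alignment: residue i of A aligns with residue (i + 5) % 10 of B.
circular = [(i, (i + 5) % 10) for i in range(10)]
print(len(sequential_blocks(circular)), is_circular_permutation(circular))  # 2 True
```

More complex permutations produce more than two blocks, which this simple check reports as non-sequential without classifying further.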
Affiliation(s)
- Ming-Jing Hwang
- To whom correspondence should be addressed. Tel: +886 2 2789 9033; Fax: +886 2 2788 7641;