1
Bliven SE, Lafita A, Rose PW, Capitani G, Prlić A, Bourne PE. Analyzing the symmetrical arrangement of structural repeats in proteins with CE-Symm. PLoS Comput Biol 2019; 15:e1006842. [PMID: 31009453 PMCID: PMC6504099 DOI: 10.1371/journal.pcbi.1006842] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 05/18/2018] [Revised: 05/07/2019] [Accepted: 01/29/2019] [Indexed: 01/04/2023]
Abstract
Many proteins fold into highly regular and repetitive three-dimensional structures. The analysis of structural patterns and repeated elements is fundamental to understanding protein function and evolution. We present recent improvements to the CE-Symm tool for systematically detecting and analyzing the internal symmetry and structural repeats in proteins. In addition to the accurate detection of internal symmetry, the tool is now capable of i) reporting the type of symmetry, ii) identifying the smallest repeating unit, iii) describing the arrangement of repeats with transformation operations and symmetry axes, and iv) comparing the similarity of all the internal repeats at the residue level. CE-Symm 2.0 helps the user investigate proteins with a robust and intuitive sequence-to-structure analysis, with many applications in protein classification, functional annotation and evolutionary studies. We describe the algorithmic extensions of the method and demonstrate its applications to the study of interesting cases of protein evolution.
Many protein structures show a great deal of regularity. Even within single polypeptide chains, about 25% of proteins contain self-similar repeating structures, which can be organized in ring-like symmetric arrangements or linear open repeats. The repeats are often related, and thus comparing the sequence and structure of repeats can give an idea as to the early evolutionary history of a protein family. Additionally, the conservation and divergence of repeats can lead to insights about the function of the proteins. This work describes CE-Symm 2.0, a tool for the analysis of protein symmetry. The method automatically detects internal symmetry in protein structures and produces a multiple alignment of structural repeats. The algorithm is able to detect the geometric relationships between the repeats, including cyclic, dihedral, and polyhedral symmetries, translational repeats, and cases where multiple symmetry operators are applicable in a hierarchical manner. These complex relationships can then be visualized in a graphical interface as a complete structure, as a superposition of repeats, or as a multiple alignment of the protein sequence. CE-Symm 2.0 can be systematically used for the automatic detection of internal symmetry in protein structures, or as an interactive tool for the analysis of structural repeats.
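To make the notion of cyclic symmetry between repeats concrete, here is a minimal toy sketch. It is not the CE-Symm algorithm and uses no CE-Symm or BioJava code. It assumes the symmetry axis is already known and aligned with the z-axis, and that repeats are given as equal-length lists of 3D points; the function names `rotate_z` and `is_cyclic_symmetric` are hypothetical. It only checks that a rotation of 360/n degrees maps each repeat onto the next one, which is what an order-n cyclic arrangement means geometrically.

```python
import math


def rotate_z(point, angle_deg):
    """Rotate a 3D point about the z-axis by angle_deg degrees."""
    x, y, z = point
    a = math.radians(angle_deg)
    return (x * math.cos(a) - y * math.sin(a),
            x * math.sin(a) + y * math.cos(a),
            z)


def is_cyclic_symmetric(repeats, order, tol=1e-6):
    """Check that rotating repeat i by 360/order lands on repeat i+1 (cyclically).

    Assumption for this toy example: the axis is the z-axis and the points
    of consecutive repeats are listed in corresponding order.
    """
    if len(repeats) != order:
        return False
    angle = 360.0 / order
    for i, repeat in enumerate(repeats):
        target = repeats[(i + 1) % order]
        if len(repeat) != len(target):
            return False
        for p, q in zip(repeat, target):
            if math.dist(rotate_z(p, angle), q) > tol:
                return False
    return True


# Toy data: one "repeat" of two points, copied around the z-axis three times.
base = [(1.0, 0.0, 0.0), (1.5, 0.5, 1.0)]
c3 = [[rotate_z(p, 120.0 * k) for p in base] for k in range(3)]
```

Calling `is_cyclic_symmetric(c3, 3)` on the toy data succeeds, while passing only the first two repeats with `order=2` fails because a 180 degree rotation does not relate them. Real structures are only approximately symmetric, so a practical tool would need a structural alignment and a tolerance far looser than the one used here.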
Affiliation(s)
- Spencer E. Bliven
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
  - Institute of Applied Simulation, Zurich University of Applied Science, Wädenswil, Switzerland
  - * E-mail: (SEB), (AL)
- Aleix Lafita
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
  - Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland
  - European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridgeshire, United Kingdom
  - * E-mail: (SEB), (AL)
- Peter W. Rose
  - RCSB Protein Data Bank, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
  - Structural Bioinformatics Laboratory, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
- Guido Capitani
  - Laboratory of Biomolecular Research, Paul Scherrer Institute, Villigen, Switzerland
  - Department of Biology, ETH Zurich, Zurich, Switzerland
- Andreas Prlić
  - RCSB Protein Data Bank, San Diego Supercomputing Center, University of California San Diego, La Jolla, California, United States of America
- Philip E. Bourne
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America
  - Department of Biomedical Engineering, University of Virginia, Charlottesville, Virginia, United States of America
2
Brunk E, Mih N, Monk J, Zhang Z, O’Brien EJ, Bliven SE, Chen K, Chang RL, Bourne PE, Palsson BO. Systems biology of the structural proteome. BMC Syst Biol 2016; 10:26. [PMID: 26969117 PMCID: PMC4787049 DOI: 10.1186/s12918-016-0271-6] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 10/17/2015] [Accepted: 02/16/2016] [Indexed: 12/19/2022]
Abstract
BACKGROUND: The success of genome-scale models (GEMs) can be attributed to the high-quality, bottom-up reconstructions of metabolic, protein synthesis, and transcriptional regulatory networks on an organism-specific basis. Such reconstructions are biochemically, genetically, and genomically structured knowledge bases that can be converted into a mathematical format to enable a myriad of computational biological studies. In recent years, genome-scale reconstructions have been extended to include protein structural information, which has opened up new vistas in systems biology research and empowered applications in structural systems biology and systems pharmacology.
RESULTS: Here, we present the generation, application, and dissemination of genome-scale models with protein structures (GEM-PRO) for Escherichia coli and Thermotoga maritima. We show the utility of integrating molecular scale analyses with systems biology approaches by discussing several comparative analyses on the temperature dependence of growth, the distribution of protein fold families, substrate specificity, and characteristic features of whole cell proteomes. Finally, to aid in the grand challenge of big data to knowledge, we provide several explicit tutorials of how protein-related information can be linked to genome-scale models in a public GitHub repository (https://github.com/SBRG/GEMPro/tree/master/GEMPro_recon/).
CONCLUSIONS: Translating genome-scale, protein-related information to structured data in the format of a GEM provides a direct mapping of gene to gene-product to protein structure to biochemical reaction to network states to phenotypic function. Integration of molecular-level details of individual proteins, such as their physical, chemical, and structural properties, further expands the description of biochemical network-level properties, and can ultimately influence how to model and predict whole cell phenotypes as well as perform comparative systems biology approaches to study differences between organisms. GEM-PRO offers insight into the physical embodiment of an organism's genotype, and its use in this comparative framework enables exploration of adaptive strategies for these organisms, opening the door to many new lines of research. With these provided tools, tutorials, and background, the reader will be in a position to run GEM-PRO for their own purposes.
Affiliation(s)
- Elizabeth Brunk
  - Department of Bioengineering, University of California, La Jolla, San Diego, CA 92093 USA
  - Joint BioEnergy Institute, Emeryville, CA 94608 USA
- Nathan Mih
  - Bioinformatics and Systems Biology Program, University of California, La Jolla, San Diego, CA 92093 USA
- Jonathan Monk
  - Department of Bioengineering, University of California, La Jolla, San Diego, CA 92093 USA
- Zhen Zhang
  - Department of Bioengineering, University of California, La Jolla, San Diego, CA 92093 USA
- Edward J. O’Brien
  - Department of Bioengineering, University of California, La Jolla, San Diego, CA 92093 USA
- Spencer E. Bliven
  - Bioinformatics and Systems Biology Program, University of California, La Jolla, San Diego, CA 92093 USA
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894 USA
- Ke Chen
  - Department of Bioengineering, University of California, La Jolla, San Diego, CA 92093 USA
- Roger L. Chang
  - Department of Systems Biology, Harvard Medical School, Boston, MA 02115 USA
- Philip E. Bourne
  - Office of the Director, National Institutes of Health, Bethesda, MD 20894 USA
- Bernhard O. Palsson
  - Department of Bioengineering, University of California, La Jolla, San Diego, CA 92093 USA
3
Bliven SE, Bourne PE, Prlić A. Detection of circular permutations within protein structures using CE-CP. Bioinformatics 2014; 31:1316-8. [PMID: 25505094 DOI: 10.1093/bioinformatics/btu823] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/14/2014] [Accepted: 12/08/2014] [Indexed: 12/19/2022]
Abstract
MOTIVATION: Circular permutation is an important type of protein rearrangement. Natural circular permutations have implications for protein function, stability and evolution. Artificial circular permutations have also been used for protein studies. However, such relationships are difficult to detect for many sequence and structure comparison algorithms and require special consideration. RESULTS: We developed a new algorithm, called Combinatorial Extension for Circular Permutations (CE-CP), which allows the structural comparison of circularly permuted proteins. CE-CP was designed to be user friendly and is integrated into the RCSB Protein Data Bank. It was tested on two collections of circularly permuted proteins. Pairwise alignments can be visualized either in a desktop application or on the web using Jmol and exported to other programs in a variety of formats. AVAILABILITY AND IMPLEMENTATION: The CE-CP algorithm can be accessed through the RCSB website at http://www.rcsb.org/pdb/workbench/workbench.do. Source code is available under the LGPL 2.1 as part of BioJava 3 (http://biojava.org; http://github.com/biojava/biojava). CONTACT: sbliven@ucsd.edu or info@rcsb.org.
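As a toy illustration of what a circular permutation is, the sketch below tests whether one sequence string is a rotation of another. It is not CE-CP and is not taken from the paper: it works on exact string matches rather than structural alignments, and the function name `circular_permutation_offset` is hypothetical. It relies on a common trick for rotations, namely that every rotation of a string appears as a substring of that string concatenated with itself.

```python
def circular_permutation_offset(a, b):
    """Return k such that b == a[k:] + a[:k], or None if b is not a rotation of a.

    Toy assumption: exact character matches only. Real circularly permuted
    proteins diverge in sequence, which is why structure-based methods exist.
    """
    if len(a) != len(b):
        return None
    # Every rotation of `a` occurs as a substring of `a + a`.
    idx = (a + a).find(b)
    return idx if idx != -1 else None


# The second sequence starts at position 3 of the first and wraps around.
offset = circular_permutation_offset("ABCDEFG", "DEFGABC")
```

Here `offset` is 3, meaning the new N-terminus of the permuted sequence corresponds to position 3 (zero-based) of the original. A sequence with residues swapped rather than rotated, such as "DEFGACB", yields `None`.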
Affiliation(s)
- Spencer E Bliven
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA 92093, USA, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA and RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Philip E Bourne
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA 92093, USA, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA and RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Andreas Prlić
  - Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA 92093, USA, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA and RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
4
Myers-Turnbull D, Bliven SE, Rose PW, Aziz ZK, Youkharibache P, Bourne PE, Prlić A. Systematic detection of internal symmetry in proteins using CE-Symm. J Mol Biol 2014; 426:2255-68. [PMID: 24681267 DOI: 10.1016/j.jmb.2014.03.010] [Citation(s) in RCA: 41] [Impact Index Per Article: 4.1] [Received: 10/30/2013] [Revised: 03/17/2014] [Accepted: 03/18/2014] [Indexed: 11/26/2022]
Abstract
Symmetry is an important feature of protein tertiary and quaternary structures that has been associated with protein folding, function, evolution, and stability. Its emergence and ensuing prevalence have been attributed to gene duplications, fusion events, and subsequent evolutionary drift in sequence. This process maintains structural similarity and is further supported by this study. To further investigate the question of how internal symmetry evolved, how symmetry and function are related, and the overall frequency of internal symmetry, we developed an algorithm, CE-Symm, to detect pseudo-symmetry within the tertiary structure of protein chains. Using a large manually curated benchmark of 1007 protein domains, we show that CE-Symm performs significantly better than previous approaches. We use CE-Symm to build a census of symmetry among domain superfamilies in SCOP and note that 18% of all superfamilies are pseudo-symmetric. Our results indicate that more domains are pseudo-symmetric than previously estimated. We establish a number of recurring types of symmetry-function relationships and describe several characteristic cases in detail. With the use of the Enzyme Commission classification, symmetry was found to be enriched in some enzyme classes but depleted in others. CE-Symm thus provides a methodology for a more complete and detailed study of the role of symmetry in tertiary protein structure [availability: CE-Symm can be run from the Web at http://source.rcsb.org/jfatcatserver/symmetry.jsp. Source code and software binaries are also available under the GNU Lesser General Public License (version 2.1) at https://github.com/rcsb/symmetry. An interactive census of domains identified as symmetric by CE-Symm is available from http://source.rcsb.org/jfatcatserver/scopResults.jsp].
Affiliation(s)
- Douglas Myers-Turnbull
  - Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA
- Spencer E Bliven
  - Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla, CA 92093, USA
- Peter W Rose
  - San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
- Zaid K Aziz
  - Department of Chemistry and Biochemistry, University of California San Diego, La Jolla, CA 92093, USA
- Philip E Bourne
  - Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, CA 92093, USA
- Andreas Prlić
  - San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA
5
Prlić A, Yates A, Bliven SE, Rose PW, Jacobsen J, Troshin PV, Chapman M, Gao J, Koh CH, Foisy S, Holland R, Rimsa G, Heuer ML, Brandstätter-Müller H, Bourne PE, Willis S. BioJava: an open-source framework for bioinformatics in 2012. Bioinformatics 2012; 28:2693-5. [PMID: 22877863 PMCID: PMC3467744 DOI: 10.1093/bioinformatics/bts494] [Citation(s) in RCA: 149] [Impact Index Per Article: 12.4] [Indexed: 11/17/2022]
Abstract
Motivation: BioJava is an open-source project for processing of biological data in the Java programming language. We have recently released a new version (3.0.5), which is a major update to the code base that greatly extends its functionality. Results: BioJava now consists of several independent modules that provide state-of-the-art tools for protein structure comparison, pairwise and multiple sequence alignments, working with DNA and protein sequences, analysis of amino acid properties, detection of protein modifications and prediction of disordered regions in proteins as well as parsers for common file formats using a biologically meaningful data model. Availability: BioJava is an open-source project distributed under the Lesser GPL (LGPL). BioJava can be downloaded from the BioJava website (http://www.biojava.org). BioJava requires Java 1.6 or higher. All inquiries should be directed to the BioJava mailing lists. Details are available at http://biojava.org/wiki/BioJava:MailingLists Contact: andreas.prlic@gmail.com
Affiliation(s)
- Andreas Prlić
  - San Diego Supercomputer Center, University of California San Diego, La Jolla, CA 92093, USA