1
Gaschignard G, Millet M, Bruley A, Benzerara K, Dezi M, Skouri-Panet F, Duprat E, Callebaut I. AlphaFold2-guided description of CoBaHMA, a novel family of bacterial domains within the heavy-metal-associated superfamily. Proteins 2024; 92:776-794. [PMID: 38258321 DOI: 10.1002/prot.26668] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/28/2023] [Revised: 12/22/2023] [Accepted: 01/01/2024] [Indexed: 01/24/2024]
Abstract
Three-dimensional (3D) structure information, now available at the proteome scale, may facilitate the detection of remote evolutionary relationships in protein superfamilies. Here, we illustrate this with the identification of a novel family of protein domains related to the ferredoxin-like superfold, by combining (i) transitive sequence similarity searches, (ii) clustering approaches, and (iii) the use of AlphaFold2 3D structure models. Domains of this family were initially identified in relation with the intracellular biomineralization of calcium carbonates by Cyanobacteria. They are part of the large heavy-metal-associated (HMA) superfamily, departing from the latter by specific sequence and structural features. In particular, most of them share conserved basic amino acids (hence their name CoBaHMA for Conserved Basic residues HMA), forming a positively charged surface, which is likely to interact with anionic partners. CoBaHMA domains are found in diverse modular organizations in bacteria, existing in the form of monodomain proteins or as part of larger proteins, some of which are membrane proteins involved in transport or lipid metabolism. This suggests that the CoBaHMA domains may exert a regulatory function, involving interactions with anionic lipids. This hypothesis might have a particular resonance in the context of the compartmentalization observed for cyanobacterial intracellular calcium carbonates.
Affiliation(s)
- Geoffroy Gaschignard
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Maxime Millet
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Apolline Bruley
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Karim Benzerara
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Manuela Dezi
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Feriel Skouri-Panet
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Elodie Duprat
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
- Isabelle Callebaut
- Sorbonne Université, Muséum National d'Histoire Naturelle, UMR CNRS 7590, Institut de Minéralogie, de Physique des Matériaux et de Cosmochimie, IMPMC, Paris, France
2
SeqCP: A sequence-based algorithm for searching circularly permuted proteins. Comput Struct Biotechnol J 2022; 21:185-201. [PMID: 36582435 PMCID: PMC9763678 DOI: 10.1016/j.csbj.2022.11.024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2022] [Revised: 11/10/2022] [Accepted: 11/10/2022] [Indexed: 11/16/2022]
Abstract
Circular permutation (CP) is a protein sequence rearrangement in which the amino- and carboxyl-termini of a protein can be created in different positions along the imaginary circularized sequence. Circularly permutated proteins usually exhibit conserved three-dimensional structures and functions. By comparing the structures of circular permutants (CPMs), protein research and bioengineering applications can be approached in ways that are difficult to achieve by traditional mutagenesis. Most current CP detection algorithms depend on structural information. Because there is a vast number of proteins with unknown structures, many CP pairs may remain unidentified. An efficient sequence-based CP detector will help identify more CP pairs and advance many protein studies. For instance, some hypothetical proteins may have CPMs with known functions and structures that are informative for functional annotation, but existing structure-based CP search methods cannot be applied when those hypothetical proteins lack structural information. Despite the considerable potential for applications, sequence-based CP search methods have not been well developed. We present a sequence-based method, SeqCP, which analyzes normal and duplicated sequence alignments to identify CPMs and determine candidate CP sites for proteins. SeqCP was trained by data obtained from the Circular Permutation Database and tested with nonredundant datasets from the Protein Data Bank. It shows high reliability in CP identification and achieves an AUC of 0.9. SeqCP has been implemented into a web server available at: http://pcnas.life.nthu.edu.tw/SeqCP/.
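The abstract describes comparing normal and duplicated sequence alignments to detect CPMs and candidate CP sites. The sketch below illustrates only the general idea behind sequence duplication: every rotation of a sequence appears as a window of the sequence concatenated with itself. It is not the SeqCP algorithm. The function name `find_cp_site`, the `max_mismatches` parameter, and the toy sequences are assumptions made for illustration, and the comparison is gap-free and restricted to equal-length sequences, which real permutants rarely satisfy.

```python
from typing import Optional


def find_cp_site(a: str, b: str, max_mismatches: int = 0) -> Optional[int]:
    """Return the 0-based index in `a` at which `b` begins on the
    circularized sequence, or None if no rotation of `a` matches `b`
    within `max_mismatches` substitutions.

    Illustrative simplification: equal lengths, no gaps.
    """
    if len(a) != len(b) or not a:
        return None
    n = len(a)
    doubled = a + a  # each rotation of `a` is a length-n window of a + a
    best = None  # (start index, mismatch count) of the best rotation so far
    for start in range(n):
        window = doubled[start:start + n]
        mismatches = sum(x != y for x, y in zip(window, b))
        if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
            best = (start, mismatches)
    return None if best is None else best[0]


# Toy example: the second sequence starts at residue index 6 of the first.
print(find_cp_site("MKTAYIAKQR", "AKQRMKTAYI"))  # prints 6
```

A practical detector would replace the mismatch count with a scored, gapped alignment against the duplicated sequence; the window scan here is only meant to show why duplication exposes the permutation site.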
Key Words
- AUC, area under the ROC curve
- CE, combinatorial extension
- CE-CP, CE with Circular Permutations
- CP, circular permutation
- CPDB, Circular Permutation Database
- CPMs, circular permutants
- CPSARST, Circular Permutation Search Aided by Ramachandran Sequential Transformation
- Circular permutants
- Circular permutation
- MCC, Matthews correlation coefficient
- Protein sequence analysis
- Protein structure modeling
- RMSD, root-mean-square distance
- ROC, receiver operating characteristic
3
Surette MD, Spanogiannopoulos P, Wright GD. The Enzymes of the Rifamycin Antibiotic Resistome. Acc Chem Res 2021; 54:2065-2075. [PMID: 33877820 DOI: 10.1021/acs.accounts.1c00048] [Citation(s) in RCA: 16] [Impact Index Per Article: 5.3] [Indexed: 12/30/2022]
Abstract
Rifamycin antibiotics include the WHO essential medicines rifampin, rifabutin, and rifapentine. These are semisynthetic derivatives of the natural product rifamycins, originally isolated from the soil bacterium Amycolatopsis rifamycinica. These antibiotics are primarily used to treat mycobacterial infections, including tuberculosis. Rifamycins act by binding to the β-subunit of bacterial RNA polymerase, inhibiting transcription, which results in cell death. These antibiotics consist of a naphthalene core spanned by a polyketide ansa bridge. This structure presents a unique 3D configuration that engages RNA polymerase through a series of hydrogen bonds between hydroxyl groups linked to the naphthalene core and C21 and C23 of the ansa bridge. This binding occurs not in the enzyme active site where template-directed RNA synthesis occurs but instead in the RNA exit tunnel, thereby blocking productive formation of full-length RNA. In their clinical use to treat tuberculosis, resistance to rifamycin antibiotics arises principally from point mutations in RNA polymerase that decrease the antibiotic's affinity for the binding site in the RNA exit tunnel. In contrast, the rifamycin resistome of environmental mycobacteria and actinomycetes is much richer and more diverse. In these organisms, rifamycin resistance includes many different enzymatic mechanisms that modify and alter the antibiotic directly, thereby inactivating it. These enzymes include ADP ribosyltransferases, glycosyltransferases, phosphotransferases, and monooxygenases. ADP ribosyltransferases catalyze group transfer of ADP ribose from the cofactor NAD+, which is more commonly deployed for metabolic redox reactions. ADP ribose is transferred to the hydroxyl linked to C23 of the antibiotic, thereby sterically blocking productive interaction with RNA polymerase.
Like ADP ribosyltransferases, rifamycin glycosyltransferases also modify the hydroxyl of position C23 of rifamycins, transferring a glucose moiety from the donor molecule UDP-glucose. Unlike other antibiotic resistance kinases that transfer the γ-phosphate of ATP to inactivate antibiotics such as aminoglycosides or macrolides, rifamycin phosphotransferases are ATP-dependent dikinases. These enzymes transfer the β-phosphate of ATP to the C21 hydroxyl of the rifamycin ansa bridge. The result is modification of a critical RNA polymerase binding group that blocks productive complex formation. On the other hand, rifamycin monooxygenases are FAD-dependent enzymes that hydroxylate the naphthoquinone core. The result of this modification is untethering of the ansa chain from the naphthyl moiety, disrupting the essential 3D shape necessary for productive RNA polymerase binding and inhibition that leads to cell death. All of these enzymes have homologues in bacterial metabolism that either are their direct precursors or share common ancestors with the resistance enzyme. The diversity of these resistance mechanisms, often redundant in individual bacterial isolates, speaks to the importance of protecting RNA polymerase from these compounds and validates this enzyme as a critical antibiotic target.
Affiliation(s)
- Matthew D. Surette
- M.G. DeGroote Institute for Infectious Disease Research, David Braley Center for Antibiotic Discovery, Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario L8S 3Z5, Canada
- Peter Spanogiannopoulos
- M.G. DeGroote Institute for Infectious Disease Research, David Braley Center for Antibiotic Discovery, Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario L8S 3Z5, Canada
- Gerard D. Wright
- M.G. DeGroote Institute for Infectious Disease Research, David Braley Center for Antibiotic Discovery, Department of Biochemistry and Biomedical Sciences, McMaster University, Hamilton, Ontario L8S 3Z5, Canada
4
Tenorio CA, Longo LM, Parker JB, Lee J, Blaber M. Ab initio folding of a trefoil-fold motif reveals structural similarity with a β-propeller blade motif. Protein Sci 2020; 29:1172-1185. [PMID: 32142181 DOI: 10.1002/pro.3850] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 01/04/2020] [Revised: 03/01/2020] [Accepted: 03/03/2020] [Indexed: 01/05/2023]
Abstract
Many protein architectures exhibit evidence of internal rotational symmetry postulated to be the result of gene duplication/fusion events involving a primordial polypeptide motif. A common feature of such structures is a domain-swapped arrangement at the interface of the N- and C-termini motifs and postulated to provide cooperative interactions that promote folding and stability. De novo designed symmetric protein architectures have demonstrated an ability to accommodate circular permutation of the N- and C-termini in the overall architecture; however, the folding requirement of the primordial motif is poorly understood, and tolerance to circular permutation is essentially unknown. The β-trefoil protein fold is a threefold-symmetric architecture where the repeating ~42-mer "trefoil-fold" motif assembles via a domain-swapped arrangement. The trefoil-fold structure in isolation exposes considerable hydrophobic area that is otherwise buried in the intact β-trefoil trimeric assembly. The trefoil-fold sequence is not predicted to adopt the trefoil-fold architecture in ab initio folding studies; rather, the predicted fold is closely related to a compact "blade" motif from the β-propeller architecture. Expression of a trefoil-fold sequence and circular permutants shows that only the wild-type N-terminal motif definition yields an intact β-trefoil trimeric assembly, while permutants yield monomers. The results elucidate the folding requirements of the primordial trefoil-fold motif, and also suggest that this motif may sample a compact conformation that limits hydrophobic residue exposure, contains key trefoil-fold structural features, but is more structurally homologous to a β-propeller blade motif.
Affiliation(s)
- Connie A Tenorio
- Department of Biomedical Sciences, Florida State University, Tallahassee, Florida, USA
- Liam M Longo
- Department of Biomedical Sciences, Florida State University, Tallahassee, Florida, USA
- Joseph B Parker
- Department of Biomedical Sciences, Florida State University, Tallahassee, Florida, USA
- Jihun Lee
- Department of Biomedical Sciences, Florida State University, Tallahassee, Florida, USA
5
Alvarez-Carreño C, Coello G, Arciniega M. FiRES: A computational method for the de novo identification of internal structure similarity in proteins. Proteins 2020; 88:1169-1179. [PMID: 32112578 DOI: 10.1002/prot.25886] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2019] [Revised: 11/12/2019] [Accepted: 02/24/2020] [Indexed: 11/08/2022]
Abstract
Internal structure similarity in proteins can be observed at the domain and subdomain levels. From an evolutionary perspective, structurally similar elements may arise divergently by gene duplication and fusion events but may also be the product of convergent evolution under physicochemical constraints. The characterization of proteins that contain repeated structural elements has implications for many fields of protein science including protein domain evolution, structure classification, structure prediction, and protein engineering. FiRES (Find Repeated Elements in Structure) is an algorithm that relies on a topology-independent structure alignment method to identify repeating elements in protein structure. FiRES was tested against two hand curated databases of protein repeats: MALIDUP, for very divergent duplicated domains; and RepeatsDB for short tandem repeats. The performance of FiRES was compared to that of lalign, RADAR, HHrepID, CE-symm, ReUPred, and Swelfe. FiRES was the method that most accurately detected proteins either with duplicated domains (accuracy = 0.86) or with multiple repeated units (accuracy = 0.92). FiRES is a new methodology for the discovery of proteins containing structurally similar elements. The FiRES web server is publicly available at http://fires.ifc.unam.mx. The scripts, results, and benchmarks from this study can be downloaded from https://github.com/Claualvarez/fires.
Affiliation(s)
- Claudia Alvarez-Carreño
- Department of Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico; School of Chemistry and Biochemistry, Georgia Institute of Technology, Atlanta, Georgia, USA
- Gerardo Coello
- Unidad de Cómputo, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
- Marcelino Arciniega
- Department of Bioquímica y Biología Estructural, Instituto de Fisiología Celular, Universidad Nacional Autónoma de México, Mexico City, Mexico
6
Yurkova MS, Zenin VA, Nagibina GS, Melnik BS, Fedorov AN. Physico-Chemical Characterization of Permutated Variants of Chaperone GroEL Apical Domain. APPL BIOCHEM MICRO+ 2019. [DOI: 10.1134/s0003683819130027] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
7
Yurkova MS, Sharapova OA, Zenin VA, Fedorov AN. Versatile format of minichaperone-based protein fusion system. Sci Rep 2019; 9:15063. [PMID: 31636289 PMCID: PMC6803692 DOI: 10.1038/s41598-019-51015-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 07/22/2019] [Accepted: 09/17/2019] [Indexed: 11/23/2022]
Abstract
Hydrophobic recombinant proteins often tend to aggregate upon expression into inclusion bodies and are difficult to refold. Producing them in soluble forms constitutes a common bottleneck problem. A fusion system for production of insoluble hydrophobic proteins in soluble stable forms with thermophilic minichaperone, GroEL apical domain (GrAD) as a carrier, has recently been developed. To provide the utmost flexibility of the system for interactions between the carrier and various target protein moieties a strategy of making permutated protein variants by gene engineering has been applied: the original N- and C-termini of the minichaperone were linked together by a polypeptide linker and new N- and C-termini were made at desired parts of the protein surface. Two permutated GrAD forms were created and analyzed. Constructs of GrAD and both of its permutated forms fused with the initially insoluble N-terminal fragment of hepatitis C virus' E2 protein were tested. Expressed fusions formed inclusion bodies. After denaturation, all fusions were completely renatured in stable soluble forms. A variety of permutated GrAD variants can be created. The versatile format of the system provides opportunities for choosing an optimal pair between particular target protein moiety and the best-suited original or specific permutated carrier.
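The permutation strategy described above (join the original N- and C-termini with a polypeptide linker, then open new termini elsewhere) can be written as a simple string operation on the amino acid sequence. This is a minimal sketch under stated assumptions: the function name `circular_permute`, the toy sequence, and the `GG` linker are illustrative only and are not taken from the paper, which does not give its linker or cut positions in the abstract.

```python
def circular_permute(sequence: str, new_start: int, linker: str = "") -> str:
    """Build a circularly permuted sequence.

    The original C-terminus is joined to the original N-terminus through
    `linker`, and the chain is reopened so that residue `new_start`
    (0-based) becomes the new N-terminus.
    """
    if not 0 <= new_start < len(sequence):
        raise ValueError("new_start must index a residue of the sequence")
    if new_start == 0:
        # No cut is moved, so the termini are never joined and no linker is added.
        return sequence
    return sequence[new_start:] + linker + sequence[:new_start]


# Hypothetical 8-residue sequence, reopened at position 3 with a 2-residue linker.
print(circular_permute("ABCDEFGH", 3, linker="GG"))  # prints DEFGHGGABC
```

The permuted chain is longer than the parent by the linker length, which is why linker choice matters when the original termini are far apart in the folded structure.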
Affiliation(s)
- Maria S Yurkova
- Bach Institute of Biochemistry, Research Center of Biotechnology of the Russian Academy of Sciences, 119071, Moscow, Russian Federation
- Tropogen Inc, Moscow, Russia
- Olga A Sharapova
- Alder BioPharmaceuticals, Inc., 11804 N Creek Pkwy S, Bothell, WA, 98011, USA
- Vladimir A Zenin
- Bach Institute of Biochemistry, Research Center of Biotechnology of the Russian Academy of Sciences, 119071, Moscow, Russian Federation
- Alexey N Fedorov
- Bach Institute of Biochemistry, Research Center of Biotechnology of the Russian Academy of Sciences, 119071, Moscow, Russian Federation
- Tropogen Inc, Moscow, Russia
8
Atomic insights into the genesis of cellular filaments by globular proteins. Nat Struct Mol Biol 2018; 25:705-714. [PMID: 30076408 PMCID: PMC6185745 DOI: 10.1038/s41594-018-0096-7] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/08/2017] [Accepted: 06/21/2018] [Indexed: 02/04/2023]
Abstract
Self-assembly of proteins into filaments, such as actin and tubulin filaments, underlies essential cellular processes in all three domains of life. The early emergence of filaments in evolutionary history suggests that filament genesis might be a robust process. Here we describe the fortuitous construction of GFP fusion proteins that self-assemble as fluorescent polar filaments in Escherichia coli. Filament formation is achieved by appending as few as 12 residues. Crystal structures reveal that the protomers each donate an appendage to fill a groove between two following protomers along the filament. This exchange of appendages resembles runaway domain swapping but is distinguished by higher efficiency because monomers cannot competitively bind their own appendages. Ample evidence of this “runaway domain coupling” mechanism in nature suggests it could facilitate the evolutionary pathway from globular protein to polar filament, requiring a minimal extension of protein sequence and no significant refolding.
9
Jones AM, Mehta MM, Thomas EE, Atkinson JT, Segall-Shapiro TH, Liu S, Silberg JJ. The Structure of a Thermophilic Kinase Shapes Fitness upon Random Circular Permutation. ACS Synth Biol 2016; 5:415-25. [PMID: 26976658 DOI: 10.1021/acssynbio.5b00305] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Indexed: 11/28/2022]
Abstract
Proteins can be engineered for synthetic biology through circular permutation, a sequence rearrangement in which native protein termini become linked and new termini are created elsewhere through backbone fission. However, it remains challenging to anticipate a protein's functional tolerance to circular permutation. Here, we describe new transposons for creating libraries of randomly circularly permuted proteins that minimize peptide additions at their termini, and we use transposase mutagenesis to study the tolerance of a thermophilic adenylate kinase (AK) to circular permutation. We find that libraries expressing permuted AKs with either short or long peptides amended to their N-terminus yield distinct sets of active variants and present evidence that this trend arises because permuted protein expression varies across libraries. Mapping all sites that tolerate backbone cleavage onto AK structure reveals that the largest contiguous regions of sequence that lack cleavage sites are proximal to the phosphotransfer site. A comparison of our results with a range of structure-derived parameters further showed that retention of function correlates to the strongest extent with the distance to the phosphotransfer site, amino acid variability in an AK family sequence alignment, and residue-level deviations in superimposed AK structures. Our work illustrates how permuted protein libraries can be created with minimal peptide additions using transposase mutagenesis, and it reveals a challenge of maintaining consistent expression across permuted variants in a library that minimizes peptide additions. Furthermore, these findings provide a basis for interpreting responses of thermophilic phosphotransferases to circular permutation by calibrating how different structure-derived parameters relate to retention of function in a cellular selection.
Affiliation(s)
- Alicia M. Jones
- Department of Biosciences, Rice University, MS-140, 6100 Main Street, Houston, Texas 77005, United States
- Manan M. Mehta
- Medical Scientist Training Program, Northwestern University, 303 East Chicago Avenue, Morton 1-670, Chicago, Illinois 60611, United States
- Emily E. Thomas
- Department of Biosciences, Rice University, MS-140, 6100 Main Street, Houston, Texas 77005, United States
- Joshua T. Atkinson
- Systems, Synthetic, and Physical Biology Graduate Program, Rice University, 6100 Main MS-180, Houston, Texas 77005, United States
- Thomas H. Segall-Shapiro
- Department of Biological Engineering, Synthetic Biology Center, Massachusetts Institute of Technology, 500 Technology Square, NE47-257, Cambridge, Massachusetts 02139, United States
- Shirley Liu
- Department of Biosciences, Rice University, MS-140, 6100 Main Street, Houston, Texas 77005, United States
- Jonathan J. Silberg
- Department of Biosciences, Rice University, MS-140, 6100 Main Street, Houston, Texas 77005, United States
10
Tian P, Best RB. Structural Determinants of Misfolding in Multidomain Proteins. PLoS Comput Biol 2016; 12:e1004933. [PMID: 27163669 PMCID: PMC4862688 DOI: 10.1371/journal.pcbi.1004933] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.9] [Received: 02/04/2016] [Accepted: 04/21/2016] [Indexed: 12/02/2022]
Abstract
Recent single molecule experiments, using either atomic force microscopy (AFM) or Förster resonance energy transfer (FRET) have shown that multidomain proteins containing tandem repeats may form stable misfolded structures. Topology-based simulation models have been used successfully to generate models for these structures with domain-swapped features, fully consistent with the available data. However, it is also known that some multidomain protein folds exhibit no evidence for misfolding, even when adjacent domains have identical sequences. Here we pose the question: what factors influence the propensity of a given fold to undergo domain-swapped misfolding? Using a coarse-grained simulation model, we can reproduce the known propensities of multidomain proteins to form domain-swapped misfolds, where data is available. Contrary to what might be naively expected based on the previously described misfolding mechanism, we find that the extent of misfolding is not determined by the relative folding rates or barrier heights for forming the domains present in the initial intermediates leading to folded or misfolded structures. Instead, it appears that the propensity is more closely related to the relative stability of the domains present in folded and misfolded intermediates. We show that these findings can be rationalized if the folded and misfolded domains are part of the same folding funnel, with commitment to one structure or the other occurring only at a relatively late stage of folding. Nonetheless, the results are still fully consistent with the kinetic models previously proposed to explain misfolding, with a specific interpretation of the observed rate coefficients. Finally, we investigate the relation between interdomain linker length and misfolding, and propose a simple alchemical model to predict the propensity for domain-swapped misfolding of multidomain proteins.
Affiliation(s)
- Pengfei Tian
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
- Robert B. Best
- Laboratory of Chemical Physics, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
11
Adjeroh D, Jiang Y, Jiang BH, Lin J. Network analysis of circular permutations in multidomain proteins reveals functional linkages for uncharacterized proteins. Cancer Inform 2015; 13:109-24. [PMID: 25741177 PMCID: PMC4338801 DOI: 10.4137/cin.s14059] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/08/2014] [Revised: 09/23/2014] [Accepted: 09/24/2014] [Indexed: 01/19/2023]
Abstract
Various studies have implicated different multidomain proteins in cancer. However, there has been little or no detailed study on the role of circular multidomain proteins in the general problem of cancer or on specific cancer types. This work represents an initial attempt at investigating the potential for predicting linkages between known cancer-associated proteins and uncharacterized or hypothetical multidomain proteins, based primarily on circular permutation (CP) relationships. First, we propose an efficient algorithm for rapid identification of both exact and approximate CPs in multidomain proteins. Using the circular relations identified, we construct networks between multidomain proteins, based on which we perform functional annotation of multidomain proteins. We then extend the method to construct subnetworks for selected cancer subtypes, and perform prediction of potential linkages between uncharacterized multidomain proteins and the selected cancer types. We include practical results showing the performance of the proposed methods.
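For exact CPs at the domain level, one way to picture the network construction is to treat each protein as an ordered list of domain labels and link proteins whose lists are rotations of one another. The sketch below uses a canonical (lexicographically smallest) rotation as the grouping key. It is an assumption-laden illustration, not the authors' algorithm: the function names, the protein identifiers, and the domain labels are hypothetical, and approximate CPs are not handled.

```python
from collections import defaultdict
from itertools import combinations


def canonical_rotation(domains):
    """Return the lexicographically smallest rotation of a domain list.

    Two architectures are exact circular permutations of each other
    exactly when their canonical rotations are equal.
    """
    domains = tuple(domains)
    if not domains:
        return domains
    return min(domains[i:] + domains[:i] for i in range(len(domains)))


def cp_network(architectures):
    """Return the set of edges (protein, protein) linking architectures
    that are rotations of one another, identical architectures included."""
    groups = defaultdict(list)
    for protein, domains in architectures.items():
        groups[canonical_rotation(domains)].append(protein)
    edges = set()
    for members in groups.values():
        for p, q in combinations(sorted(members), 2):
            edges.add((p, q))
    return edges


# Hypothetical proteins and domain labels, for illustration only.
architectures = {
    "P1": ["SH3", "SH2", "Kinase"],
    "P2": ["SH2", "Kinase", "SH3"],  # rotation of P1
    "P3": ["Kinase", "SH2", "SH3"],  # same domains, different cyclic order
    "P4": ["SH3"],
}
print(cp_network(architectures))  # prints {('P1', 'P2')}
```

Grouping by a canonical key avoids comparing every pair of proteins directly; functional labels could then be propagated along the resulting edges from characterized to uncharacterized proteins.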
Affiliation(s)
- Donald Adjeroh
- Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA
- Yue Jiang
- Faculty of Software, Fujian Normal University, Fuzhou, Fujian, China
- Bing-Hua Jiang
- Pathology, Anatomy and Cell Biology, Thomas Jefferson University, Philadelphia, PA, USA
- Jie Lin
- Faculty of Software, Fujian Normal University, Fuzhou, Fujian, China
12
Bliven SE, Bourne PE, Prlić A. Detection of circular permutations within protein structures using CE-CP. Bioinformatics 2014; 31:1316-8. [PMID: 25505094 DOI: 10.1093/bioinformatics/btu823] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 08/14/2014] [Accepted: 12/08/2014] [Indexed: 12/19/2022]
Abstract
MOTIVATION: Circular permutation is an important type of protein rearrangement. Natural circular permutations have implications for protein function, stability and evolution. Artificial circular permutations have also been used for protein studies. However, such relationships are difficult to detect for many sequence and structure comparison algorithms and require special consideration. RESULTS: We developed a new algorithm, called Combinatorial Extension for Circular Permutations (CE-CP), which allows the structural comparison of circularly permuted proteins. CE-CP was designed to be user friendly and is integrated into the RCSB Protein Data Bank. It was tested on two collections of circularly permuted proteins. Pairwise alignments can be visualized either in a desktop application or on the web using Jmol and exported to other programs in a variety of formats. AVAILABILITY AND IMPLEMENTATION: The CE-CP algorithm can be accessed through the RCSB website at http://www.rcsb.org/pdb/workbench/workbench.do. Source code is available under the LGPL 2.1 as part of BioJava 3 (http://biojava.org; http://github.com/biojava/biojava). CONTACT: sbliven@ucsd.edu or info@rcsb.org.
Affiliation(s)
- Spencer E Bliven
- Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA 92093, USA, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA and RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Philip E Bourne
- Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA 92093, USA, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA and RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
- Andreas Prlić
- Bioinformatics and Systems Biology Program, University of California, San Diego, La Jolla, CA 92093, USA, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA and RCSB Protein Data Bank, San Diego Supercomputer Center, University of California, San Diego, La Jolla, CA 92093, USA
13
Longo LM, Blaber M. Symmetric protein architecture in protein design: top-down symmetric deconstruction. Methods Mol Biol 2014; 1216:161-182. [PMID: 25213415 DOI: 10.1007/978-1-4939-1486-9_8] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/03/2023]
Abstract
Top-down symmetric deconstruction (TDSD) is a joint experimental and computational approach to generate a highly stable, functionally benign protein scaffold for intended application in subsequent functional design studies. By focusing on symmetric protein folds, TDSD can leverage the dramatic reduction in sequence space achieved by applying a primary structure symmetric constraint to the design process. Fundamentally, TDSD is an iterative symmetrization process, in which the goal is to maintain or improve properties of thermodynamic stability and folding cooperativity inherent to a starting sequence (the "proxy"). As such, TDSD does not attempt to solve the inverse protein folding problem directly, which is computationally intractable. The present chapter will take the reader through all of the primary steps of TDSD (selecting a proxy, identifying potential mutations, and establishing a stability/folding cooperativity screen), relying heavily on a successful TDSD solution for the common β-trefoil fold.
Affiliation(s)
- Liam M Longo
- Department of Biomedical Sciences, College of Medicine, Florida State University, 1115 West Call Street, Tallahassee, FL, 32306-4300, USA
|
14
|
Longo L, Lee J, Tenorio C, Blaber M. Alternative Folding Nuclei Definitions Facilitate the Evolution of a Symmetric Protein Fold from a Smaller Peptide Motif. Structure 2013; 21:2042-50. [DOI: 10.1016/j.str.2013.09.003] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 06/10/2013] [Revised: 09/09/2013] [Accepted: 09/11/2013] [Indexed: 11/25/2022]
|
15
|
Minami S, Sawada K, Chikenji G. MICAN: a protein structure alignment algorithm that can handle Multiple-chains, Inverse alignments, Cα only models, Alternative alignments, and Non-sequential alignments. BMC Bioinformatics 2013; 14:24. [PMID: 23331634 PMCID: PMC3637537 DOI: 10.1186/1471-2105-14-24] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 09/11/2012] [Accepted: 01/08/2013] [Indexed: 11/10/2022]
Abstract
Background Protein pairs that have the same secondary structure packing arrangement but have different topologies have attracted much attention in terms of both evolution and physical chemistry of protein structures. Further investigation of such protein relationships would give us a hint as to how proteins can change their fold in the course of evolution, as well as an insight into the physico-chemical properties of secondary structure packing. For this purpose, highly accurate sequence order independent structure comparison methods are needed. Results We have developed a novel protein structure alignment algorithm, MICAN (a structure alignment algorithm that can handle Multiple-chain complexes, Inverse direction of secondary structures, Cα only models, Alternative alignments, and Non-sequential alignments). The algorithm was designed so as to identify the best structural alignment between protein pairs by disregarding the connectivity between secondary structure elements (SSE). One of the key features of the algorithm is the use of a multiple vector representation for each SSE, which enables us to correctly treat the bent or twisted nature of long SSEs. We compared MICAN with nine other publicly available structure alignment programs, using both reference-dependent and reference-independent evaluation methods on a variety of benchmark test sets which include both sequential and non-sequential alignments. We show that MICAN outperforms the other existing methods for reproducing reference alignments of non-sequential test sets. Further, although MICAN does not specialize in sequential structure alignment, it showed top-level performance on the sequential test sets. We also show that the MICAN program is the fastest non-sequential structure alignment program among all the programs we examined here. Conclusions MICAN is the fastest and the most accurate program among the non-sequential alignment programs we examined here. These results suggest that MICAN is a highly effective tool for automatically detecting non-trivial structural relationships of proteins, such as circular permutations and segment-swapping, many of which have been identified manually by human experts so far. The source code of MICAN is freely downloadable at http://www.tbp.cse.nagoya-u.ac.jp/MICAN.
Affiliation(s)
- Shintaro Minami
- Department of Computational Science and Engineering, Nagoya University, Nagoya 464-8603, Japan
|
16
|
Debès C, Wang M, Caetano-Anollés G, Gräter F. Evolutionary optimization of protein folding. PLoS Comput Biol 2013; 9:e1002861. [PMID: 23341762 PMCID: PMC3547816 DOI: 10.1371/journal.pcbi.1002861] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Received: 07/03/2012] [Accepted: 11/09/2012] [Indexed: 01/07/2023]
Abstract
Nature has shaped the makeup of proteins since their appearance, 3.8 billion years ago. However, the fundamental drivers of structural change responsible for the extraordinary diversity of proteins have yet to be elucidated. Here we explore whether protein evolution affects folding speed. We estimated folding times for the present-day catalog of protein domains directly from their size-modified contact order. These values were mapped onto an evolutionary timeline of domain appearance derived from a phylogenomic analysis of protein domains in 989 fully-sequenced genomes. Our results show a clear overall increase of folding speed during evolution, with known ultra-fast downhill folders appearing rather late in the timeline. Remarkably, folding optimization depends on secondary structure. While alpha-folds showed a tendency to fold faster throughout evolution, beta-folds exhibited a trend of folding time increase during the last 1.5 billion years that began during the “big bang” of domain combinations. As a consequence, these domain structures are on average slow folders today. Our results suggest that fast and efficient folding of domains shaped the universe of protein structure. This finding supports the hypothesis that optimization of the kinetic and thermodynamic accessibility of the native fold reduces protein aggregation propensities that hamper cellular functions.
Nature has come up with an enormous variety of protein three-dimensional structures, each of which is thought to be optimized for its specific function. A fundamental biological endeavor is to uncover the driving evolutionary forces for discovering and optimizing new folds. A long-standing hypothesis is that fold evolution obeys constraints to properly fold into native structure. We here test this hypothesis by analyzing trends of proteins to fold fast during evolution. Using phylogenomic and structural analyses, we observe an overall decrease in folding times between 3.8 and 1.5 billion years ago, which can be interpreted as an evolutionary optimization for rapid folding. This trend towards fast folding probably resulted in manifold advantages, including high protein accessibility for the cell and a reduction of protein aggregation during misfolding.
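The folding-time estimate in the abstract above is built on contact order. As a minimal sketch, the code below computes the plain relative contact order (the mean sequence separation of contacting residue pairs, divided by chain length), which is the standard starting quantity. The size-modified variant and the conversion to folding times used in the study are not reproduced here, and the function name and the precomputed contact list are illustrative assumptions.

```python
from typing import Iterable, Tuple


def relative_contact_order(contacts: Iterable[Tuple[int, int]], length: int) -> float:
    """Relative contact order: sum of |i - j| over contacts / (length * n_contacts).

    `contacts` holds residue index pairs already judged to be in contact
    (the distance cutoff used to call a contact is left to the caller).
    `length` is the number of residues in the chain.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    pairs = list(contacts)
    if not pairs:
        return 0.0  # no contacts: define the contact order as zero
    total_separation = sum(abs(i - j) for i, j in pairs)
    return total_separation / (length * len(pairs))
```

For example, `relative_contact_order([(1, 5), (2, 10), (3, 4)], 10)` sums separations of 4, 8 and 1 and divides by 10 × 3. Chains dominated by local contacts score low, chains with many long-range contacts score high.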
Affiliation(s)
- Cédric Debès
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- Minglei Wang
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- Gustavo Caetano-Anollés
- Evolutionary Bioinformatics Laboratory, Department of Crop Sciences, University of Illinois, Urbana, Illinois, United States of America
- * E-mail: (GCA); (FG)
- Frauke Gräter
- Heidelberg Institute for Theoretical Studies, Heidelberg, Germany
- CAS-MPG Partner Institute and Key Laboratory for Computational Biology, Shanghai, China
- * E-mail: (GCA); (FG)
|
17
|
Sasidharan R, Nepusz T, Swarbreck D, Huala E, Paccanaro A. GFam: a platform for automatic annotation of gene families. Nucleic Acids Res 2012; 40:e152. [PMID: 22790981 PMCID: PMC3479161 DOI: 10.1093/nar/gks631] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/12/2022]
Abstract
We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families, derive meaningful functional labels, and propagate functional annotation seamlessly across periodic genome updates. GFam is a hybrid approach that uses a greedy algorithm to chain component domains from InterPro annotation provided by its 12 member resources, followed by a sequence-based connected component analysis of un-annotated sequence regions, to derive a consensus domain architecture for each sequence and subsequently generate families based on common architectures. For the proteome of Arabidopsis, our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single-constituent database within InterPro. The true power of GFam lies in maximizing annotation provided by the different InterPro data sources that offer resource-specific coverage for different regions of a sequence. GFam’s capability to capture higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from http://www.paccanarolab.org/software/gfam/.
Affiliation(s)
- Rajkumar Sasidharan
- Department of Molecular, Cell and Developmental Biology, University of California at Los Angeles, Los Angeles, CA 90095, USA.
|
18
|
Affiliation(s)
- Spencer Bliven
- Bioinformatics Program, University of California, San Diego, La Jolla, California, United States of America
- * E-mail: (SB); (AP)
- Andreas Prlić
- San Diego Supercomputer Center, University of California San Diego, La Jolla, California, United States of America
- * E-mail: (SB); (AP)
|
19
|
Hoogewijs D, Ebner B, Germani F, Hoffmann FG, Fabrizius A, Moens L, Burmester T, Dewilde S, Storz JF, Vinogradov SN, Hankeln T. Androglobin: a chimeric globin in metazoans that is preferentially expressed in Mammalian testes. Mol Biol Evol 2011; 29:1105-14. [PMID: 22115833 DOI: 10.1093/molbev/msr246] [Citation(s) in RCA: 89] [Impact Index Per Article: 6.8] [Indexed: 02/07/2023]
Abstract
Comparative genomic studies have led to the recent identification of several novel globin types in the Metazoa. They have revealed a surprising evolutionary diversity of functions beyond the familiar O₂ supply roles of hemoglobin and myoglobin. Here we report the discovery of a hitherto unrecognized family of proteins with a unique modular architecture, possessing an N-terminal calpain-like domain, an internal, circular permuted globin domain, and an IQ calmodulin-binding motif. Putative orthologs are present in the genomes of many metazoan taxa, including vertebrates. The calpain-like region is homologous to the catalytic domain II of the large subunit of human calpain-7. The globin domain satisfies the criteria of a myoglobin-like fold but is rearranged and split into two parts. The recombinantly expressed human globin domain exhibits an absorption spectrum characteristic of hexacoordination of the heme iron atom. Molecular evolutionary analyses indicate that this chimeric globin family is phylogenetically ancient and originated in the common ancestor to animals and choanoflagellates. In humans and mice, the gene is predominantly expressed in testis tissue, and we propose the name "androglobin" (Adgb). Expression is associated with postmeiotic stages of spermatogenesis and is insensitive to experimental hypoxia. Evidence exists for increased gene expression in fertile compared with infertile males.
Affiliation(s)
- David Hoogewijs
- Institute of Physiology and Zürich Center for Integrative Human Physiology, University of Zürich, Zürich, Switzerland
|
20
|
Crystal structure of glucansucrase from the dental caries pathogen Streptococcus mutans. J Mol Biol 2011; 408:177-86. [PMID: 21354427 DOI: 10.1016/j.jmb.2011.02.028] [Citation(s) in RCA: 111] [Impact Index Per Article: 8.5] [Received: 11/08/2010] [Revised: 02/03/2011] [Accepted: 02/08/2011] [Indexed: 11/23/2022]
Abstract
Glucansucrase (GSase) from Streptococcus mutans is an essential agent in dental caries pathogenesis. Here, we report the crystal structure of S. mutans glycosyltransferase (GTF-SI), which synthesizes soluble and insoluble glucans and is a glycoside hydrolase (GH) family 70 GSase in the free enzyme form and in complex with acarbose and maltose. Resolution of the GTF-SI structure confirmed that the domain order of GTF-SI is circularly permuted as compared to that of GH family 13 α-amylases. As a result, domains A, B and IV of GTF-SI are each composed of two separate polypeptide chains. Structural comparison of GTF-SI and amylosucrase, which is closely related to GH family 13 amylases, indicated that the two enzymes share a similar transglycosylation mechanism via a glycosyl-enzyme intermediate in subsite -1. On the other hand, novel structural features were revealed in subsites +1 and +2 of GTF-SI. Trp517 provided the platform for glycosyl acceptor binding, while Tyr430, Asn481 and Ser589, which are conserved in family 70 enzymes but not in family 13 enzymes, comprised subsite +1. Based on the structure of GTF-SI and amino acid comparison of GTF-SI, GTF-I and GTF-S, Asp593 in GTF-SI appeared to be the most critical point for acceptor sugar orientation, influencing the transglycosylation specificity of GSases, that is, whether they produced insoluble glucan with α(1-3) glycosidic linkages or soluble glucan with α(1-6) linkages. The structural information derived from the current study should be extremely useful in the design of novel inhibitors that prevent the biofilm formation by GTF-SI.
|
21
|
Eisenbeis S, Höcker B. Evolutionary mechanism as a template for protein engineering. J Pept Sci 2010; 16:538-44. [DOI: 10.1002/psc.1233] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
|
22
|
Schmidt-Goenner T, Guerler A, Kolbeck B, Knapp EW. Circular permuted proteins in the universe of protein folds. Proteins 2009; 78:1618-30. [DOI: 10.1002/prot.22678] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
|
23
|
Sacan A, Toroslu IH, Ferhatosmanoglu H. Integrated search and alignment of protein structures. Bioinformatics 2008; 24:2872-9. [PMID: 18945684 DOI: 10.1093/bioinformatics/btn545] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/12/2022]
Abstract
MOTIVATION Identification and comparison of similar three-dimensional (3D) protein structures has become an even greater challenge in the face of the rapidly growing structure databases. Here, we introduce Vorometric, a new method that provides efficient search and alignment of a query protein against a database of protein structures. Voronoi contacts of the protein residues are enriched with the secondary structure information and a metric substitution matrix is developed to allow efficient indexing. The contact hits obtained from a distance-based indexing method are extended to obtain high-scoring segment pairs, which are then used to generate structural alignments. RESULTS Vorometric is the first to address both search and alignment problems in the protein structure databases. The experimental results show that Vorometric is simultaneously effective in retrieving similar protein structures, producing high-quality structure alignments, and identifying cross-fold similarities. Vorometric outperforms current structure retrieval methods in search accuracy, while requiring comparable running times. Furthermore, the structural superpositions produced are shown to have better quality and coverage, when compared with those of the popular structure alignment tools. AVAILABILITY Vorometric is available as a web service at http://bio.cse.ohio-state.edu/Vorometric
Affiliation(s)
- Ahmet Sacan
- Department of Computer Engineering, Middle East Technical University, Ankara, Turkey.
|
24
|
Abstract
Circular permutation (CP) in a protein can be considered as if its sequence were circularized, followed by the creation of termini at a new location. Since the first observation of CP in 1979, a substantial number of studies have concluded that circular permutants (CPs) usually retain native structures and functions, sometimes with increased stability or functional diversity. Although this interesting property has made CP useful in much protein engineering and folding research, large-scale collections of CP-related information were not available until this study. Here we describe CPDB, the first CP DataBase. The organizational principle of CPDB is a hierarchical categorization in which pairs of circular permutants are grouped into CP clusters, which are further grouped into folds and in turn classes. Additions to CPDB include a useful set of tools and resources for the identification, characterization, comparison and visualization of CP. In addition, several viable CP site prediction methods are implemented and assessed in CPDB. This database can be useful in protein folding and evolution studies, the discovery of novel protein structural and functional relationships, and facilitating the production of new CPs with unique biotechnical or industrial interests. The CPDB database can be accessed at http://sarst.life.nthu.edu.tw/cpdb
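The definition of circular permutation given in the abstract above (circularize the sequence, then open new termini elsewhere) can be illustrated with a toy string check. This is only a sketch under a strong assumption, namely that the two sequences are identical apart from the rotation. Real circular permutants have diverged, so this is not how CPDB or any structure-based tool detects them, and the function name is hypothetical.

```python
from typing import Optional


def circular_permutation_offset(a: str, b: str) -> Optional[int]:
    """Return k such that b == a[k:] + a[:k], or None if b is not a rotation of a.

    Every rotation of `a` appears as a substring of `a + a`, so one substring
    search is enough to find where the new N-terminus sits in the original.
    """
    if len(a) != len(b):
        return None
    k = (a + a).find(b)
    return k if k != -1 else None
```

For example, `circular_permutation_offset("ABCDEFG", "DEFGABC")` returns 3: the permuted chain starts at the fourth residue of the original and wraps around through its former termini.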
Affiliation(s)
- Wei-Cheng Lo
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu 30013, Taiwan
|
25
|
Wang S, Zheng WM. CLePAPS: fast pair alignment of protein structures based on conformational letters. J Bioinform Comput Biol 2008; 6:347-66. [PMID: 18464327 DOI: 10.1142/s0219720008003461] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Received: 10/20/2007] [Revised: 11/22/2007] [Accepted: 12/05/2007] [Indexed: 11/18/2022]
Abstract
Fast, efficient, and reliable algorithms for pairwise alignment of protein structures are in ever-increasing demand for analyzing the rapidly growing data on protein structures. CLePAPS is a tool developed for this purpose. It distinguishes itself from other existing algorithms by the use of conformational letters, which are discretized 3D segmental structural states. A letter corresponds to a cluster of combinations of the three angles formed by Cα pseudobonds of four contiguous residues. A substitution matrix called CLESUM is available to measure the similarity between any two such letters. CLePAPS regards an aligned fragment pair (AFP) as an ungapped string pair with a high sum of pairwise CLESUM scores. Using CLESUM scores as the similarity measure, CLePAPS searches for AFPs by simple string comparison. The transformation which best superimposes a highly similar AFP can be used to superimpose the structure pairs under comparison. A highly scored AFP which is consistent with several other AFPs determines an initial alignment. CLePAPS then joins consistent AFPs guided by their similarity scores to extend the alignment by several "zoom-in" iteration steps. A follow-up refinement produces the final alignment. CLePAPS does not implement dynamic programming. The utility of CLePAPS is tested on various protein structure pairs.
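The three angles behind a conformational letter, as described in the abstract above, can be sketched for a single window of four consecutive Cα atoms. The sketch assumes the three angles are the two pseudobond bend angles and the one pseudobond torsion angle, which is the usual reading but is not spelled out in the abstract. Clustering angle triples into letters and scoring them with CLESUM are not reproduced, and all function names are illustrative.

```python
import math
from typing import Sequence, Tuple

Vec = Tuple[float, float, float]


def _sub(p: Sequence[float], q: Sequence[float]) -> Vec:
    return (p[0] - q[0], p[1] - q[1], p[2] - q[2])


def _dot(u: Vec, v: Vec) -> float:
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u: Vec, v: Vec) -> Vec:
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def bend_angle(a: Vec, b: Vec, c: Vec) -> float:
    """Angle in degrees at point b between the pseudobonds b-a and b-c."""
    u, v = _sub(a, b), _sub(c, b)
    cos_angle = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def torsion_angle(a: Vec, b: Vec, c: Vec, d: Vec) -> float:
    """Signed dihedral angle in degrees about the b-c pseudobond."""
    b1, b2, b3 = _sub(b, a), _sub(c, b), _sub(d, c)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_len = math.sqrt(_dot(b2, b2))
    y = _dot(b2, _cross(n1, n2)) / b2_len
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def conformation_angles(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> Tuple[float, float, float]:
    """Two bend angles and one torsion for four consecutive C-alpha positions."""
    return (bend_angle(p0, p1, p2), bend_angle(p1, p2, p3), torsion_angle(p0, p1, p2, p3))
```

Sliding this window along a chain of N residues gives N − 3 angle triples. Assigning each triple to its nearest cluster would turn the structure into a string, which is what makes the simple string comparison mentioned in the abstract possible. The sketch assumes the four points are distinct and not collinear.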
Affiliation(s)
- Sheng Wang
- Institute of Theoretical Physics, Academia Sinica, Beijing 100080, China
|
26
|
Lo WC, Lyu PC. CPSARST: an efficient circular permutation search tool applied to the detection of novel protein structural relationships. Genome Biol 2008; 9:R11. [PMID: 18201387 PMCID: PMC2395249 DOI: 10.1186/gb-2008-9-1-r11] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Received: 09/11/2007] [Revised: 11/19/2007] [Accepted: 01/18/2008] [Indexed: 12/04/2022]
Abstract
CPSARST (Circular Permutation Search Aided by Ramachandran Sequential Transformation) is an efficient database search tool that provides a new way for rapidly detecting novel relationships among proteins. Circular permutation of a protein can be visualized as if the original amino- and carboxyl termini were linked and new ones created elsewhere. It has been well-documented that circular permutants usually retain native structures and biological functions. Here we report CPSARST (Circular Permutation Search Aided by Ramachandran Sequential Transformation) to be an efficient database search tool. In this post-genomics era, when the amount of protein structural data is increasing exponentially, it provides a new way to rapidly detect novel relationships among proteins.
Affiliation(s)
- Wei-Cheng Lo
- Institute of Bioinformatics and Structural Biology, National Tsing Hua University, Hsinchu 30013, Taiwan
|
27
|
Abyzov A, Ilyin VA. A comprehensive analysis of non-sequential alignments between all protein structures. BMC Struct Biol 2007; 7:78. [PMID: 18005453 PMCID: PMC2213659 DOI: 10.1186/1472-6807-7-78] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.0] [Received: 05/14/2007] [Accepted: 11/16/2007] [Indexed: 05/02/2023]
Abstract
Background The majority of relations between proteins can be represented as a conventional sequential alignment. Nevertheless, unusual non-sequential alignments with different connectivity of the aligned fragments in compared proteins have been reported by many researchers. It is of interest to understand these non-sequential alignments: are they unique, sporadic cases or do they occur frequently, and do they belong to a few specific folds or are they spread among many different folds as a common feature of protein structure? We present here a comprehensive large-scale study of non-sequential alignments between available protein structures in the Protein Data Bank. Results The study has been conducted on a non-redundant set of 8,865 protein structures aligned with the aid of the TOPOFIT method. It has been estimated that between 17.4% and 35.2% of all alignments are non-sequential depending on variations in the parameters. Analysis of the data revealed that non-sequential relations between proteins do occur systematically and in large quantities. Various sizes and numbers of non-sequential fragments have been observed with all possible complexities of fragment rearrangements found for alignments consisting of up to 12 fragments. It has been found that non-sequential alignments are not limited to proteins of any particular fold and are present in more than two hundred of them. Moreover, many of them are found between proteins with different fold assignments. It has been shown that protein structure symmetry does not explain non-sequential alignments. Therefore, compelling evidence has been provided that non-sequential alignments between proteins are systematic and widespread across the protein universe. Conclusion The phenomenon of the widespread occurrence of non-sequential alignments between proteins might represent a missing rule of protein structure organization. More detailed study of this phenomenon will enhance our understanding of protein stability, folding, and evolution.
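The distinction drawn in the abstract above between sequential and non-sequential alignments can be made concrete with a small classifier. This is a minimal sketch, not the TOPOFIT procedure: it assumes the alignment is given as pairs of residue positions (one position per protein, each used at most once), and the function name and the three labels are illustrative.

```python
from typing import Iterable, Tuple


def classify_alignment(pairs: Iterable[Tuple[int, int]]) -> str:
    """Classify an alignment by the order of protein-B positions along protein A.

    "sequential": B positions increase along A (a conventional alignment).
    "circular permutation": B positions are an increasing run that wraps
        around once, i.e. a rotation of an increasing sequence.
    "non-sequential": any other rearrangement of the aligned positions.
    """
    b_order = [j for _, j in sorted(pairs)]
    # Indices at which the B position drops relative to the previous pair.
    descents = [k for k in range(1, len(b_order)) if b_order[k] < b_order[k - 1]]
    if not descents:
        return "sequential"
    if len(descents) == 1 and b_order[-1] < b_order[0]:
        return "circular permutation"
    return "non-sequential"
```

For example, `classify_alignment([(1, 50), (2, 51), (3, 52), (4, 1), (5, 2)])` has a single drop and wraps around, so it is labelled a circular permutation. A real analysis would work on aligned fragments rather than single residues and would tolerate gaps.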
Affiliation(s)
- Alexej Abyzov
- Department of Biology, Northeastern University 360 Huntington Avenue, Boston, MA 02115, USA.
|
28
|
Binkowski TA, DasGupta B, Liang J. Order independent structural alignment of circularly permuted proteins. Conference Proceedings : ... Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual Conference 2007; 2004:2781-4. [PMID: 17270854 DOI: 10.1109/iembs.2004.1403795] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/08/2022]
Abstract
Circular permutation connects the N and C termini of a protein and concurrently cleaves elsewhere in the chain, providing an important mechanism for generating novel protein folds and functions. However, the prevalence of circular permutations in genomes is unknown because current detection methods can miss many occurrences, mistaking random repeats for circular permutations. Here we develop a method for detecting circularly permuted proteins from structural comparison. Sequence order independent alignment of protein structures can be regarded as a special case of the maximum-weight independent set problem, which is known to be computationally hard. We develop an efficient approximation algorithm by repeatedly solving relaxations of an appropriate intermediate integer programming formulation. We show that the approximation ratio is much better than the theoretical worst case ratio of r=1/4. Circularly permuted proteins reported in the literature can be identified rapidly with our method, while they escape detection by publicly available servers for structural alignment.
|
29
|
Ausiello G, Peluso D, Via A, Helmer-Citterich M. Local comparison of protein structures highlights cases of convergent evolution in analogous functional sites. BMC Bioinformatics 2007; 8 Suppl 1:S24. [PMID: 17430569 PMCID: PMC1885854 DOI: 10.1186/1471-2105-8-s1-s24] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/17/2022]
Abstract
Background We performed an exhaustive search for local structural similarities in an ensemble of non-redundant protein functional sites. With the purpose of finding new examples of convergent evolution, we selected only those matching sites composed of structural regions whose residue order is inverted in the relative protein sequences. Results A novel case of local analogy was detected between members of the ABC transporter and of the HprK/P families in their ATP binding site. This case cannot be derived by events of circular permutation since the residues of one of the region pairs are located in reverse order in the sequence of the two protein families. One of the analogous binding sites, the one identified in HprK/P, is known to also bind pyrophosphate, which is used as the preferred energy source in its kinase and phosphorylase activity. Conclusion The discovery of this striking molecular similarity, which is also associated with a functional similarity, may help in suggesting new experiments aimed at a deeper understanding of members of the ABC transporter family known to be involved in many serious human diseases.
Affiliation(s)
- Gabriele Ausiello
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Daniele Peluso
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Allegra Via
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome "Tor Vergata", Rome, Italy
- Manuela Helmer-Citterich
- Centre for Molecular Bioinformatics, Department of Biology, University of Rome "Tor Vergata", Rome, Italy
|
30
|
Viksna J, Gilbert D. Assessment of the probabilities for evolutionary structural changes in protein folds. Bioinformatics 2007; 23:832-41. [PMID: 17282999 DOI: 10.1093/bioinformatics/btm022] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/15/2022]
Abstract
MOTIVATION The evolution of protein sequences can be described by a stepwise process, where each step involves changes of a few amino acids. In a similar manner, the evolution of protein folds can be at least partially described by an analogous process, where each step involves comparatively simple changes affecting few secondary structure elements. A number of such evolution steps, justified by biologically confirmed examples, have previously been proposed by other researchers. However, unlike the situation with sequences, as far as we know there have been no attempts to estimate the comparative probabilities for different kinds of such structural changes. RESULTS We have tried to assess the comparative probabilities for a number of known structural changes, and to relate the probabilities of such changes with the distance between protein sequences. We have formalized these structural changes using a topological representation of structures (TOPS), and have developed an algorithm for measuring structural distances that involve few evolutionary steps. The probabilities of structural changes then were estimated on the basis of all-against-all comparisons of the sequence and structure of protein domains from the CATH-95 representative set. The results obtained are reasonably consistent for a number of different data subsets and permit the identification of several 'most popular' types of evolutionary changes in protein structure. The results also suggest that alterations in protein structure are more likely to occur when the sequence similarity is >10% (the average similarity being approximately 6% for the data sets employed in this study), and that the distribution of probabilities of structural changes is fairly uniform within the interval of 15-50% sequence similarity. AVAILABILITY The algorithms have been implemented on the Windows operating system in C++ and using the Borland Visual Component Library. 
The source code is available on request from the first author. The data sets used for this study (representative sets of protein domains, matrices of sequence similarities and structural distances) are available on http://bioinf.mii.lu.lv/epsrc_project/struct_ev.html.
Affiliation(s)
- Juris Viksna
- Institute of Mathematics and Computer Science, University of Latvia, Rainis boulevard 29, Riga LV-1459, Latvia.
|
31
|
Abstract
BACKGROUND In recent times, there has been an exponential rise in the number of protein structures in databases e.g. PDB. So, design of fast algorithms capable of querying such databases is becoming an increasingly important research issue. This paper reports an algorithm, motivated from spectral graph matching techniques, for retrieving protein structures similar to a query structure from a large protein structure database. Each protein structure is specified by the 3D coordinates of residues of the protein. The algorithm is based on a novel characterization of the residues, called projections, leading to a similarity measure between the residues of the two proteins. This measure is exploited to efficiently compute the optimal equivalences. RESULTS Experimental results show that, the current algorithm outperforms the state of the art on benchmark datasets in terms of speed without losing accuracy. Search results on SCOP 95% nonredundant database, for fold similarity with 5 proteins from different SCOP classes show that the current method performs competitively with the standard algorithm CE. The algorithm is also capable of detecting non-topological similarities between two proteins which is not possible with most of the state of the art tools like Dali.
Affiliation(s)
- Sourangshu Bhattacharya
- Dept. of Computer Science and Automation, Indian Institute of Science, Bangalore – 560012, India
- Chiranjib Bhattacharyya
- Dept. of Computer Science and Automation, Indian Institute of Science, Bangalore – 560012, India
- Bioinformatics Center, Indian Institute of Science, Bangalore – 560012, India
- Nagasuma R Chandra
- Bioinformatics Center, Indian Institute of Science, Bangalore – 560012, India
32
Chen L, Wu LY, Wang Y, Zhang S, Zhang XS. Revealing divergent evolution, identifying circular permutations and detecting active-sites by protein structure comparison. BMC STRUCTURAL BIOLOGY 2006; 6:18. [PMID: 16948858 PMCID: PMC1574323 DOI: 10.1186/1472-6807-6-18] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/20/2006] [Accepted: 09/02/2006] [Indexed: 11/29/2022]
Abstract
Background Protein structure comparison is one of the most important problems in computational biology and plays a key role in protein structure prediction, fold family classification, motif finding, phylogenetic tree reconstruction and protein docking. Results We propose a novel method to compare the protein structures in an accurate and efficient manner. Such a method can be used to not only reveal divergent evolution, but also identify circular permutations and further detect active-sites. Specifically, we define the structure alignment as a multi-objective optimization problem, i.e., maximizing the number of aligned atoms and minimizing their root mean square distance. By controlling a single distance-related parameter, theoretically we can obtain a variety of optimal alignments corresponding to different optimal matching patterns, i.e., from a large matching portion to a small matching portion. The number of variables in our algorithm increases with the number of atoms of protein pairs in almost a linear manner. In addition to solid theoretical background, numerical experiments demonstrated significant improvement of our approach over the existing methods in terms of quality and efficiency. In particular, we show that divergent evolution, circular permutations and active-sites (or structural motifs) can be identified by our method. The software SAMO is available upon request from the authors, or from and . Conclusion A novel formulation is proposed to accurately align protein structures in the framework of multi-objective optimization, based on a sequence order-independent strategy. A fast and accurate algorithm based on the bipartite matching algorithm is developed by exploiting the special features. Convergence of computation is shown in experiments and is also theoretically proven.
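As an illustration only, the trade-off described in this abstract (more aligned atoms versus a lower root mean square distance, steered by a single distance-related parameter) can be sketched in Python. This is not the SAMO algorithm: the function name `alignment_tradeoff`, the `cutoff` parameter, and the assumption that candidate atom pairs are already matched and superposed are hypothetical simplifications.

```python
import math


def alignment_tradeoff(pair_distances, cutoff):
    """Return (number of aligned pairs, RMSD) for a given distance cutoff.

    pair_distances: distances (in angstroms) between candidate atom pairs,
        assumed here to come from an existing superposition (hypothetical input).
    cutoff: the single distance-related parameter; pairs farther apart than
        this value are dropped from the alignment.
    """
    kept = [d for d in pair_distances if d <= cutoff]
    if not kept:
        # No pair satisfies the cutoff, so the alignment is empty.
        return 0, 0.0
    rmsd = math.sqrt(sum(d * d for d in kept) / len(kept))
    return len(kept), rmsd


if __name__ == "__main__":
    distances = [0.5, 1.0, 3.0, 4.0]
    # A loose cutoff keeps more pairs at the price of a higher RMSD;
    # a tight cutoff keeps fewer pairs with a lower RMSD.
    for cutoff in (5.0, 2.0):
        print(cutoff, alignment_tradeoff(distances, cutoff))
```

Sweeping `cutoff` from large to small moves the result from a large matching portion with a loose fit to a small matching portion with a tight fit, which is the spectrum of solutions the abstract refers to.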
Affiliation(s)
- Luonan Chen
- Institute of Systems Biology, Shanghai University, Shanghai 200444, China
- Osaka Sangyo University, Nakagaito 3-1-1, Daito, Osaka 574-8530, Japan
- Ling-Yun Wu
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, CAS, Beijing 100080, China
- Yong Wang
- Osaka Sangyo University, Nakagaito 3-1-1, Daito, Osaka 574-8530, Japan
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, CAS, Beijing 100080, China
- Shihua Zhang
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, CAS, Beijing 100080, China
- Graduate School of the Chinese Academy of Sciences, Beijing 100049, China
- Xiang-Sun Zhang
- Institute of Applied Mathematics, Academy of Mathematics and Systems Science, CAS, Beijing 100080, China
33
Abstract
The rearrangement or permutation of protein substructures is an important mode of divergence. Recent work explored one possible underlying mechanism called permutation-by-duplication, which produces special forms of motif rearrangements called circular permutations. Permutation-by-duplication, involving gene duplication, fusion and truncation, can produce fully functional intermediate proteins and thus represents a feasible mechanism of protein evolution. In spite of this, circular permutations are relatively rare and we discuss possible reasons for their existence.
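The duplication, fusion and truncation steps named in this abstract can be sketched as a toy Python model. It is only a schematic under stated assumptions: a domain is represented as an ordered sequence of elements, and the function name `permute_by_duplication` and the `cut` parameter are hypothetical, not taken from the cited work.

```python
def permute_by_duplication(domain, cut):
    """Model permutation-by-duplication on an ordered sequence of elements.

    domain: a string or list of structural or sequence elements.
    cut: number of elements truncated from the N-terminus of the first copy;
        the second copy loses the complementary elements at its C-terminus.
    """
    n = len(domain)
    if not 0 <= cut <= n:
        raise ValueError("cut must lie between 0 and len(domain)")
    fused = domain + domain      # tandem duplication followed by fusion
    return fused[cut:cut + n]    # truncation at both termini


if __name__ == "__main__":
    print(permute_by_duplication("ABCD", 1))
    print(permute_by_duplication(["b1", "a1", "b2"], 2))
```

In this model the fused intermediate always contains at least one intact copy of the original order, which mirrors the abstract's point that intermediates can remain fully functional.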
Affiliation(s)
- Christine Vogel
- Institute for Cellular and Molecular Biology, University of Texas at Austin, TX, USA.
34
Sarma GN, Manning VA, Ciuffetti LM, Karplus PA. Structure of Ptr ToxA: an RGD-containing host-selective toxin from Pyrenophora tritici-repentis. THE PLANT CELL 2005; 17:3190-202. [PMID: 16214901 PMCID: PMC1276037 DOI: 10.1105/tpc.105.034918] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
Tan spot of wheat (Triticum aestivum), caused by the fungus Pyrenophora tritici-repentis, has significant agricultural and economic impact. Ptr ToxA (ToxA), the first discovered proteinaceous host-selective toxin, is produced by certain P. tritici-repentis races and is necessary and sufficient to cause cell death in sensitive wheat cultivars. We present here the high-resolution crystal structure of ToxA in two different crystal forms, providing four independent views of the protein. ToxA adopts a single-domain, beta-sandwich fold of novel topology. Mapping of the existing mutation data onto the structure supports the hypothesized importance of an Arg-Gly-Asp (RGD) and surrounding sequence. Its occurrence in a single, solvent-exposed loop in the protein suggests that it is directly involved in recognition events required for ToxA action. Furthermore, the ToxA structure reveals a surprising similarity with the classic mammalian RGD-containing domain, the fibronectin type III (FnIII) domain: the two topologies are related by circular permutation. The similar topologies and the positional conservation of the RGD-containing loop raise the possibility that ToxA is distantly related to mammalian FnIII proteins and that to gain entry it binds to an integrin-like receptor in the plant host.
Affiliation(s)
- Ganapathy N Sarma
- Department of Biochemistry and Biophysics, Oregon State University, Corvallis, Oregon 97331, USA
35
Weiner J, Thomas G, Bornberg-Bauer E. Rapid motif-based prediction of circular permutations in multi-domain proteins. Bioinformatics 2005; 21:932-7. [PMID: 15788783 DOI: 10.1093/bioinformatics/bti085] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Rearrangements of protein domains and motifs such as swaps and circular permutations (CPs) can produce erroneous results in searching sequence databases when using traditional methods based on linear sequence alignments. Circular permutations are also of biological relevance because they can help to better understand both protein evolution and functionality. RESULTS We have developed an algorithm, RASPODOM, which is based on the classical recursive alignment scheme. Sequences are represented as strings of domains taken from precompiled resources of domain (motif) databases such as ProDom. The algorithm works several orders of magnitude faster than a reimplementation of the existing CP detection algorithm working on strings of amino acids, produces virtually no false positives and allows the discrimination of true CPs from 'intermediate' CPs (iCPs). Several true CPs which have not been reported in literature so far could be identified from Swiss-Prot/TrEMBL within minutes.
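To make the idea of working on strings of domains concrete, here is a minimal Python sketch. It is not RASPODOM and does not use its recursive alignment scheme: it swaps in the much simpler doubled-sequence rotation test, which only recognizes exact circular permutations of identical domain lists. The function name `rotation_offset` and the example domain labels are hypothetical.

```python
def rotation_offset(query, target):
    """Return k such that rotating target left by k domains gives query.

    query, target: sequences of domain identifiers (for example, names from
        a domain database). Returns None if the two are not exact circular
        permutations of each other. An offset of 0 means identical order.
    """
    query, target = list(query), list(target)
    n = len(query)
    if n != len(target):
        return None
    doubled = target + target  # every rotation of target is a window of this
    for k in range(n):
        if doubled[k:k + n] == query:
            return k
    return None


if __name__ == "__main__":
    print(rotation_offset(["A", "B", "C"], ["C", "A", "B"]))
    print(rotation_offset(["A", "B", "C"], ["A", "C", "B"]))
```

Comparing short lists of domain labels instead of full amino acid strings is what keeps such a check cheap; tolerating inserted, missing or diverged domains would require an alignment-based method such as the one the abstract describes.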
Affiliation(s)
- January Weiner
- Division of Bioinformatics, School of Biology, Institute of Botany, The Westphalian Wilhelm's University of Münster Schlossplatz 4 D48149 Münster, Germany
36
Abstract
Comparison of two protein structures often results in not only a global alignment but also a number of distinct local alignments; the latter, referred to as alternative alignments, are however usually ignored in existing protein structure comparison analyses. Here, we used a novel method of protein structure comparison to extensively identify and characterize the alternative alignments obtained for structure pairs of a fold classification database. We showed that all alternative alignments can be classified into one of just a few types, and thereby illustrated the potential of using alternative alignments to identify recurring protein substructures, including the internal structural repeats of a protein. Furthermore, we showed that among the alternative alignments obtained, permuted alignments, which included both circular and scrambled permutations, are as prevalent as topological alignments. These results demonstrated that the so far largely overlooked alternative alignments of protein structures have implications and applications for research on protein classification and evolution.
Affiliation(s)
- Edward S C Shih
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
37
Bulaj G, Koehn RE, Goldenberg DP. Alteration of the disulfide-coupled folding pathway of BPTI by circular permutation. Protein Sci 2004; 13:1182-96. [PMID: 15096625 PMCID: PMC2286756 DOI: 10.1110/ps.03563704] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/26/2022]
Abstract
The kinetics of disulfide-coupled folding and unfolding of four circularly permuted forms of bovine pancreatic trypsin inhibitor (BPTI) were studied and compared with previously published results for both wild-type BPTI and a cyclized form. Each of the permuted proteins was found to be less stable than either the wild-type or circular proteins, by 3-8 kcal/mole. These stability differences were used to estimate effective concentrations of the chain termini in the native proteins, which were 1 mM for the wild-type protein and 2.5 to 4000 M for the permuted forms. The circular permutations increased the rates of unfolding and caused a variety of effects on the kinetics of refolding. For two of the proteins, the rates of a direct disulfide-formation pathway were dramatically increased, making this process as fast or faster than the competing disulfide rearrangement mechanism that predominates in the folding of the wild-type protein. These two permutations break the covalent connectivity among the beta-strands of the native protein, and removal of these constraints appears to facilitate direct formation and reduction of nearby disulfides that are buried in the folded structure. The effects on folding kinetics and mechanism do not appear to be correlated with relative contact order, a measure of overall topological complexity. These observations are consistent with the results of other recent experimental and computational studies suggesting that circular permutation may generally influence folding mechanisms by favoring or disfavoring specific interactions that promote alternative pathways, rather than through effects on the overall topology of the native protein.
Affiliation(s)
- Grzegorz Bulaj
- Department of Biology, University of Utah, Salt Lake City, Utah 84112-0840, USA
38
Yuan X, Bystroff C. Non-sequential structure-based alignments reveal topology-independent core packing arrangements in proteins. Bioinformatics 2004; 21:1010-9. [PMID: 15531601 DOI: 10.1093/bioinformatics/bti128] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION Proteins of the same class often share a secondary structure packing arrangement but differ in how the secondary structure units are ordered in the sequence. We find that proteins that share a common core also share local sequence-structure similarities, and these can be exploited to align structures with different topologies. In this study, segments from a library of local sequence-structure alignments were assembled hierarchically, enforcing the compactness and conserved inter-residue contacts but not sequential ordering. Previous structure-based alignment methods often ignore sequence similarity, local structural equivalence and compactness. RESULTS The new program, SCALI (Structural Core ALIgnment), can efficiently find conserved packing arrangements, even if they are non-sequentially ordered in space. SCALI alignments conserve remote sequence similarity and contain fewer alignment errors. Clustering of our pairwise non-sequential alignments shows that recurrent packing arrangements exist in topologically different structures. For example, the three-layer sandwich domain architecture may be divided into four structural subclasses based on internal packing arrangements. These subclasses represent an intermediate level of structure classification, more general than topology, but more specific than architecture as defined in CATH. A strategy is presented for developing a set of predictive hidden Markov models based on multiple SCALI alignments.
Affiliation(s)
- Xin Yuan
- Department of Biology, Rensselaer Polytechnic Institute Troy, NY 12180, USA
39
Biarrotte-Sorin S, Maillard AP, Delettré J, Sougakoff W, Arthur M, Mayer C. Crystal structures of Weissella viridescens FemX and its complex with UDP-MurNAc-pentapeptide: insights into FemABX family substrates recognition. Structure 2004; 12:257-67. [PMID: 14962386 DOI: 10.1016/j.str.2004.01.006] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/03/2003] [Revised: 10/28/2003] [Accepted: 10/28/2003] [Indexed: 11/16/2022]
Abstract
Members of the FemABX protein family are novel therapeutic targets, as they are involved in the synthesis of the bacterial cell wall. They catalyze the addition of amino acid(s) on the peptidoglycan precursor using aminoacylated tRNA as a substrate. We report here the high-resolution structure of Weissella viridescens L-alanine transferase FemX and its complex with the UDP-MurNAc-pentapeptide. This is the first structure example of a FemABX family member that does not possess a coiled-coil domain. FemX consists of two structurally equivalent domains, separated by a cleft containing the binding site of the UDP-MurNAc-pentapeptide and a long channel that traverses one of the two domains. Our structural studies bring new insights into the evolution of the FemABX and the related GNAT superfamilies, shed light on the recognition site of the aminoacylated tRNA in Fem proteins, and allowed manual docking of the acceptor end of the alanyl-tRNAAla.
Affiliation(s)
- Sabrina Biarrotte-Sorin
- Laboratoire de Minéralogie-Cristallographie de Paris, Université Paris 6, 4 place Jussieu, Paris Cedex 05, 75252, France
40
Shin DH, Lou Y, Jancarik J, Yokota H, Kim R, Kim SH. Crystal structure of YjeQ from Thermotoga maritima contains a circularly permuted GTPase domain. Proc Natl Acad Sci U S A 2004; 101:13198-203. [PMID: 15331784 PMCID: PMC516547 DOI: 10.1073/pnas.0405202101] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
We have determined the crystal structure of the GDP complex of the YjeQ protein from Thermotoga maritima (TmYjeQ), a member of the YjeQ GTPase subfamily. TmYjeQ, a homologue of Escherichia coli YjeQ, which is known to bind to the ribosome, is composed of three domains: an N-terminal oligonucleotide/oligosaccharide-binding fold domain, a central GTPase domain, and a C-terminal zinc-finger domain. The crystal structure of TmYjeQ reveals two interesting domains: a circularly permuted GTPase domain and an unusual zinc-finger domain. The binding mode of GDP in the GTPase domain of TmYjeQ is similar to those of GDP or GTP analogs in ras proteins, a prototype GTPase. The N-terminal oligonucleotide/oligosaccharide-binding fold domain, together with the GTPase domain, forms the extended RNA-binding site. The C-terminal domain has an unusual zinc-finger motif composed of Cys-250, Cys-255, Cys-263, and His-257, with a remote structural similarity to a portion of a DNA-repair protein, rad51 fragment. The overall structural features of TmYjeQ make it a good candidate for an RNA-binding protein, which is consistent with the biochemical data of the YjeQ subfamily in binding to the ribosome.
Affiliation(s)
- Dong Hae Shin
- Berkeley Structural Genomics Center, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA
41
Hoang C, Ferre-D'Amare AR. Crystal structure of the highly divergent pseudouridine synthase TruD reveals a circular permutation of a conserved fold. RNA (NEW YORK, N.Y.) 2004; 10:1026-1033. [PMID: 15208439 PMCID: PMC1370594 DOI: 10.1261/rna.7240504] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 02/23/2004] [Accepted: 04/15/2004] [Indexed: 05/24/2023]
Abstract
The pseudouridine (Psi) synthases Pus7p and TruD define a family of RNA-modifying enzymes with no sequence similarity to previously characterized Psi synthases. The 2.2 Å resolution structure of Escherichia coli TruD reveals a U-shaped molecule with a catalytic domain that superimposes closely on that of other Psi synthases. A domain that appears to be unique to TruD/Pus7p family enzymes hinges over the catalytic domain, possibly serving to clasp the substrate RNAs. The active site comprises residues that are conserved in other Psi synthases, although at least one comes from a structurally distinct part of the protein. Remarkably, the connectivity of the structural elements of the TruD catalytic domain is a circular permutation of that of its paralogs. Because the sequence of the permuted segment, a beta-strand that bisects the catalytic domain, is conserved among orthologs from bacteria, archaea and eukarya, the permutation likely happened early in evolution.
Affiliation(s)
- Charmaine Hoang
- Division of Basic Sciences, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109-1024, USA
42
Abendroth J, Rice AE, McLuskey K, Bagdasarian M, Hol WGJ. The crystal structure of the periplasmic domain of the type II secretion system protein EpsM from Vibrio cholerae: the simplest version of the ferredoxin fold. J Mol Biol 2004; 338:585-96. [PMID: 15081815 DOI: 10.1016/j.jmb.2004.01.064] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/07/2003] [Revised: 01/17/2004] [Accepted: 01/23/2004] [Indexed: 10/26/2022]
Abstract
The terminal branch of the general secretion pathway (Gsp or type II secretion system) is used by several pathogenic bacteria for the secretion of their virulence factors across the outer membrane. In these secretion systems, a complex of 12-15 Gsp proteins spans from the pore in the outer membrane via several associated signal or energy-transducing proteins in the inner membrane to a regulating ATPase in the cytosol. The human pathogen Vibrio cholerae uses such a system, called the Eps system, for the export of the cholera toxin and other virulence factors from its periplasm into the lumen of the gastrointestinal tract of the host. Here, we report the atomic structure of the periplasmic domain of the EpsM protein from V. cholerae, which is a part of the interface between the regulating part and the rest of the Eps system. The crystal structure was determined by Se-Met MAD phasing and the model was refined to 1.7 Å resolution. The monomer consists of two alphabetabeta-subdomains forming a sandwich of two alpha-helices and a four-stranded antiparallel beta-sheet. In the dimer, a deep cleft with a polar rim and a hydrophobic bottom made by conserved residues is located between the monomers. This cleft contains an extra electron density suggesting that this region might serve as a binding site of an unknown ligand or part of a protein partner. Unexpectedly, the fold of the periplasmic domain of EpsM is an undescribed circular permutation of the ferredoxin fold.
Affiliation(s)
- Jan Abendroth
- Department of Biochemistry, Biomolecular Structure Center, School of Medicine, University of Washington, Box 357742, Seattle, WA 98195-7242, USA
43
Abstract
Two recent reports describe the stunning crystal structures of complexes between a viral protein that suppresses RNA silencing and a 21 nucleotide small interfering (si)RNA.
Affiliation(s)
- Eric Westhof
- Institut de biologie moléculaire et cellulaire, Centre National de la Recherche Scientifique, Université Louis Pasteur, 15 rue René Descartes, F-67084 Strasbourg Cedex, France
44
Vargason JM, Szittya G, Burgyán J, Hall TMT. Size selective recognition of siRNA by an RNA silencing suppressor. Cell 2004; 115:799-811. [PMID: 14697199 DOI: 10.1016/s0092-8674(03)00984-x] [Citation(s) in RCA: 397] [Impact Index Per Article: 19.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/01/2022]
Abstract
RNA silencing in plants likely exists as a defense mechanism against molecular parasites such as RNA viruses, retrotransposons, and transgenes. As a result, many plant viruses have adapted mechanisms to evade and suppress gene silencing. Tombusviruses express a 19 kDa protein (p19), which has been shown to suppress RNA silencing in vivo and bind silencing-generated and synthetic small interfering RNAs (siRNAs) in vitro. Here we report the 2.5 Å crystal structure of p19 from the Carnation Italian ringspot virus (CIRV) bound to a 21 nt siRNA and demonstrate in biochemical and in vivo assays that CIRV p19 protein acts as a molecular caliper to specifically select siRNAs based on the length of the duplex region of the RNA.
Affiliation(s)
- Jeffrey M Vargason
- Laboratory of Structural Biology, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, NC 27709, USA
45
Guasch A, Lucas M, Moncalián G, Cabezas M, Pérez-Luque R, Gomis-Rüth FX, de la Cruz F, Coll M. Recognition and processing of the origin of transfer DNA by conjugative relaxase TrwC. Nat Struct Mol Biol 2003; 10:1002-10. [PMID: 14625590 DOI: 10.1038/nsb1017] [Citation(s) in RCA: 120] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/16/2003] [Accepted: 10/30/2003] [Indexed: 02/03/2023]
Abstract
Relaxases are DNA strand transferases that catalyze the initial and final stages of DNA processing during conjugative cell-to-cell DNA transfer. Upon binding to the origin of transfer (oriT) DNA, relaxase TrwC melts the double helix. The three-dimensional structure of the relaxase domain of TrwC in complex with its cognate DNA at oriT shows a fold built on a two-layer alpha/beta sandwich, with a deep narrow cleft that houses the active site. The DNA includes one arm of an extruded cruciform, an essential feature for specific recognition. This arm is firmly embraced by the protein through a beta-ribbon positioned in the DNA major groove and a loop occupying the minor groove. It is followed by a single-stranded DNA segment that enters the active site, after a sharp U-turn forming a hydrophobic cage that traps the N-terminal methionine. Structural analysis combined with site-directed mutagenesis defines the architecture of the active site.
Affiliation(s)
- Alicia Guasch
- Institut de Biologia Molecular de Barcelona, CSIC, Jordi Girona, 18-26, 08034 Barcelona, Spain
46
Phlippen N, Hoffmann K, Fischer R, Wolf K, Zimmermann M. The glutathione synthetase of Schizosaccharomyces pombe is synthesized as a homodimer but retains full activity when present as a heterotetramer. J Biol Chem 2003; 278:40152-61. [PMID: 12734194 DOI: 10.1074/jbc.m303102200] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/23/2023] Open
Abstract
Glutathione synthetase was overexpressed as a histidine-tagged protein in Schizosaccharomyces pombe and purified by two-step affinity chromatography. The recovered enzyme occurred in two different forms: a homodimeric protein consisting of two identical 56-kDa subunits and a heterotetrameric protein composed of two 32-kDa and two 24-kDa subfragments. Both forms are encoded by the GSH2 gene. The 56-kDa protein corresponds to the complete GSH2 open reading frame, while the subfragments are produced following the cleavage of this larger protein by a metalloprotease. A stable homodimer was obtained by site-directed mutagenesis to remove the protease cleavage site, and this showed normal activity. A structural model of the fission yeast glutathione synthetase was produced, based on the x-ray coordinates of the human enzyme. According to this model the interacting domains of the proteolytic subfragments are strongly entangled. The subfragments were therefore coexpressed as independent proteins. These subfragments assembled correctly to yield functional heterotetramers with equivalent activity to the wild type enzyme. Furthermore, a permuted version of the protein was created. This also showed normal levels of glutathione synthetase activity. These data provide novel insight into the mechanisms of protein folding and the structure and evolution of the glutathione synthetase family.
Affiliation(s)
- Nadine Phlippen
- Institute of Biology IV (Microbiology and Genetics), Aachen University, Worringer Weg, D-52056 Aachen, Germany
47
Söding J, Lupas AN. More than the sum of their parts: on the evolution of proteins from peptides. Bioessays 2003; 25:837-46. [PMID: 12938173 DOI: 10.1002/bies.10321] [Citation(s) in RCA: 178] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Despite their seemingly endless diversity, proteins adopt a limited number of structural forms. It has been estimated that 80% of proteins will be found to adopt one of only about 400 folds, most of which are already known. These folds are largely formed by a limited 'vocabulary' of recurring supersecondary structure elements, often by repetition of the same element and, increasingly, elements similar in both structure and sequence are discovered. This suggests that modern proteins evolved by fusion and recombination from a more ancient peptide world and that many of the core folds observed today may contain homologous building blocks. The peptides forming these building blocks would not in themselves have had the ability to fold, but would have emerged as cofactors supporting RNA-based replication and catalysis (the 'RNA world'). Their association into larger structures and eventual fusion into polypeptide chains would have allowed them to become independent of their RNA scaffold, leading to the evolution of a novel type of macromolecule: the folded protein.
Affiliation(s)
- Johannes Söding
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Tübingen, Germany
48
Tsai LC, Shyur LF, Lee SH, Lin SS, Yuan HS. Crystal structure of a natural circularly permuted jellyroll protein: 1,3-1,4-beta-D-glucanase from Fibrobacter succinogenes. J Mol Biol 2003; 330:607-20. [PMID: 12842475 DOI: 10.1016/s0022-2836(03)00630-2] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022]
Abstract
The 1,3-1,4-beta-D-glucanase from Fibrobacter succinogenes (Fsbeta-glucanase) is classified as one of the family 16 glycosyl hydrolases. It hydrolyzes the glycosidic bond in the mixed-linked glucans containing beta-1,3- and beta-1,4-glycosidic linkages. We constructed a truncated form of recombinant Fsbeta-glucanase containing the catalytic domain from amino acid residues 1-258, which exhibited a higher thermal stability and enzymatic activity than the full-length enzyme. The crystal structure of the truncated Fsbeta-glucanase was solved at a resolution of 1.7 Å by the multiple wavelength anomalous dispersion (MAD) method using the anomalous signals from the seleno-methionine-labeled protein. The overall topology of the truncated Fsbeta-glucanase consists mainly of two eight-stranded anti-parallel beta-sheets arranged in a jellyroll beta-sandwich, similar to the fold of many glycosyl hydrolases and carbohydrate-binding modules. Sequence comparison with other bacterial glucanases showed that Fsbeta-glucanase is the only naturally occurring circularly permuted beta-glucanase with reversed sequences. Structural comparison shows that the engineered circular-permuted Bacillus enzymes are more similar to their parent enzymes with which they share approximately 70% sequence identity, than to the naturally occurring Fsbeta-glucanase of similar topology with 30% identity. This result suggests that protein structure relies more on sequence identity than topology. The high-resolution structure of Fsbeta-glucanase provides a structural rationale for the different activities obtained from a series of mutant glucanases and a basis for the development of engineered enzymes with increased activity and structural stability.
Affiliation(s)
- Li-Chu Tsai
- Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan, ROC
49
Gorbalenya AE, Pringle FM, Zeddam JL, Luke BT, Cameron CE, Kalmakoff J, Hanzlik TN, Gordon KHJ, Ward VK. The palm subdomain-based active site is internally permuted in viral RNA-dependent RNA polymerases of an ancient lineage. J Mol Biol 2002; 324:47-62. [PMID: 12421558 PMCID: PMC7127740 DOI: 10.1016/s0022-2836(02)01033-1] [Citation(s) in RCA: 188] [Impact Index Per Article: 8.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/19/2022]
Abstract
Template-dependent polynucleotide synthesis is catalyzed by enzymes whose core component includes a ubiquitous αβ palm subdomain comprising A, B and C sequence motifs crucial for catalysis. Due to its unique, universal conservation in all RNA viruses, the palm subdomain of RNA-dependent RNA polymerases (RdRps) is widely used for evolutionary and taxonomic inferences. We report here the results of elaborated computer-assisted analysis of newly sequenced replicases from Thosea asigna virus (TaV) and the closely related Euprosterna elaeasa virus (EeV), insect-specific ssRNA+ viruses, which revise a capsid-based classification of these viruses with tetraviruses, an Alphavirus-like family. The replicases of TaV and EeV do not have characteristic methyltransferase and helicase domains, and include a putative RdRp with a unique C-A-B motif arrangement in the palm subdomain that is also found in two dsRNA birnaviruses. This circular motif rearrangement is a result of migration of approximately 22 amino acid (aa) residues encompassing motif C between two internal positions, separated by approximately 110 aa, in a conserved region of approximately 550 aa. Protein modeling shows that the canonical palm subdomain architecture of poliovirus (ssRNA+) RdRp could accommodate the identified sequence permutation through changes in backbone connectivity of the major structural elements in three loop regions underlying the active site. This permutation transforms the ferredoxin-like β1αAβ2β3αBβ4 fold of the palm subdomain into the β2β3β1αAαBβ4 structure and brings β-strands carrying two principal catalytic Asp residues into sequential proximity such that unique structural properties and, ultimately, unique functionality of the permuted RdRps may result. The permuted enzymes show unprecedented interclass sequence conservation between RdRps of true ssRNA+ and dsRNA viruses and form a minor, deeply separated cluster in the RdRp tree, implying that other, as yet unidentified, viruses may employ this type of RdRp. The structural diversification of the palm subdomain might be a major event in the evolution of template-dependent polynucleotide polymerases in the RNA-protein world.
Key Words
- RNA viruses
- RNA polymerases
- evolution
- protein permutation
- ancient palm subdomain
- aa, amino acid
- CD, conserved domain
- EeV, Euprosterna elaeasa virus
- IBDV, infectious bursal disease virus
- IPNVJ, infectious pancreatic necrosis virus strain Jasper
- PV, poliovirus
- TaV, Thosea asigna virus
- dsRNA, double-stranded RNA
- ssRNA+, positive-stranded RNA
- RdRp, RNA-dependent RNA polymerase
- HMM, hidden Markov model
- ORF, open reading frames
- nt, nucleotide
- TDPP, template-dependent polynucleotide polymerase
Affiliation(s)
- Alexander E Gorbalenya
- Advanced Biomedical Computing Center, Science Applications International Corporation/National Cancer Institute, P.O. Box B, Frederick, MD 21702-1201, USA.
|
50
|
Abstract
Linguistic metaphors have been woven into the fabric of molecular biology since its inception. The determination of the human genome sequence has brought these metaphors to the forefront of the popular imagination, with the natural extension of the notion of DNA as language to that of the genome as the 'book of life'. But do these analogies go deeper and, if so, can the methods developed for analysing languages be applied to molecular biology? In fact, many techniques used in bioinformatics, even if developed independently, may be seen to be grounded in linguistics. Further interweaving of these fields will be instrumental in extending our understanding of the language of life.
Affiliation(s)
- David B Searls
- Bioinformatics Division, Genetics Research, GlaxoSmithKline Pharmaceuticals, King of Prussia, PA 19406, USA.
|