51
Developing a high-quality scoring function for membrane protein structures based on specific inter-residue interactions. J Comput Aided Mol Des 2012; 26:301-9. [PMID: 22395902 DOI: 10.1007/s10822-012-9556-z] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 09/06/2011] [Accepted: 02/19/2012] [Indexed: 10/28/2022]
Abstract
Membrane proteins are of particular biological and pharmaceutical importance, and computational modeling and structure prediction approaches play an important role in studies of membrane proteins. Developing an accurate model quality assessment program is of significance to the structure prediction of membrane proteins. Few such programs have been proposed that can be applied to a broad range of membrane protein classes and perform with high accuracy. We developed a new model scoring function, Interaction-based Quality assessment (IQ), based on the analysis of four types of inter-residue interactions within the transmembrane domains of helical membrane proteins. This function was tested using three high-quality model sets: all 206 models of GPCR Dock 2008, all 284 models of GPCR Dock 2010, and all 92 helical membrane protein models of the HOMEP set. For all three sets, the scoring function can select the native structures among all of the models with success rates of 93, 85, and 100%, respectively. For comparison, these three model sets were also evaluated with a recently published model assessment program for membrane protein structures, ProQM, which gave success rates of 85, 79, and 92%, respectively. These results suggested that IQ outperforms ProQM when only the transmembrane regions of the models are considered. This scoring function should be useful for the computational modeling of membrane proteins.
52
Daniels NM, Kumar A, Cowen LJ, Menke M. Touring protein space with Matt. IEEE/ACM Trans Comput Biol Bioinform 2012; 9:286-93. [PMID: 21464511 PMCID: PMC3355523 DOI: 10.1109/tcbb.2011.70] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Indexed: 05/30/2023]
Abstract
Using the Matt structure alignment program, we take a tour of protein space, producing a hierarchical clustering scheme that divides protein structural domains into clusters based on geometric dissimilarity. While it was known that purely structural, geometric, distance-based measures of structural similarity, such as Dali/FSSP, could largely replicate hand-curated schemes such as SCOP at the family level, it was an open question as to whether any such scheme could approximate SCOP at the more distant superfamily and fold levels. We partially answer this question in the affirmative, by designing a clustering scheme based on Matt that approximately matches SCOP at the superfamily level, and demonstrates qualitative differences in performance between Matt and DaliLite. Implications for the debate over the organization of protein fold space are discussed. Based on our clustering of protein space, we introduce the Mattbench benchmark set, a new collection of structural alignments useful for testing sequence aligners on more distantly homologous proteins.
Affiliation(s)
- Noah M. Daniels
- Tufts University, 161 College Avenue, Halligan Hall Room 102, Medford, MA 02155
- Anoop Kumar
- Tufts University, 161 College Avenue, Halligan Hall Room 102, Medford, MA 02155
- Lenore J. Cowen
- Tufts University, 161 College Avenue, Halligan Hall Room 102, Medford, MA 02155
- Matt Menke
- Tufts University, 161 College Avenue, Halligan Hall Room 102, Medford, MA 02155
53
The evolution of protein structures and structural ensembles under functional constraint. Genes (Basel) 2011; 2:748-62. [PMID: 24710290 PMCID: PMC3927589 DOI: 10.3390/genes2040748] [Citation(s) in RCA: 44] [Impact Index Per Article: 3.1] [Received: 09/24/2011] [Revised: 10/15/2011] [Accepted: 10/19/2011] [Indexed: 02/06/2023]
Abstract
Protein sequence, structure, and function are inherently linked through evolution and population genetics. Our knowledge of protein structure comes from solved structures in the Protein Data Bank (PDB), our knowledge of sequence through sequences found in the NCBI sequence databases (http://www.ncbi.nlm.nih.gov/), and our knowledge of function through a limited set of in-vitro biochemical studies. How these intersect through evolution is described in the first part of the review. In the second part, our understanding of a series of questions is addressed. This includes how sequences evolve within structures, how evolutionary processes enable structural transitions, how the folding process can change through evolution and what the fitness impacts of this might be. Moving beyond static structures, the evolution of protein kinetics (including normal modes) is discussed, as is the evolution of conformational ensembles and structurally disordered proteins. This ties back to a question of the role of neostructuralization and how it relates to selection on sequences for functions. The relationship between metastability, the fitness landscape, sequence divergence, and organismal effective population size is explored. Lastly, a brief discussion of modeling the evolution of sequences of ordered and disordered proteins is entertained.
54
O'Donnell CW, Waldispühl J, Lis M, Halfmann R, Devadas S, Lindquist S, Berger B. A method for probing the mutational landscape of amyloid structure. Bioinformatics 2011; 27:i34-42. [PMID: 21685090 PMCID: PMC3117379 DOI: 10.1093/bioinformatics/btr238] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.1] [Indexed: 11/16/2022]
Abstract
Motivation: Proteins of all kinds can self-assemble into highly ordered β-sheet aggregates known as amyloid fibrils, important both biologically and clinically. However, the specific molecular structure of a fibril can vary dramatically depending on sequence and environmental conditions, and mutations can drastically alter amyloid function and pathogenicity. Experimental structure determination has proven extremely difficult with only a handful of NMR-based models proposed, suggesting a need for computational methods. Results: We present AmyloidMutants, a statistical mechanics approach for de novo prediction and analysis of wild-type and mutant amyloid structures. Based on the premise of protein mutational landscapes, AmyloidMutants energetically quantifies the effects of sequence mutation on fibril conformation and stability. Tested on non-mutant, full-length amyloid structures with known chemical shift data, AmyloidMutants offers roughly 2-fold improvement in prediction accuracy over existing tools. Moreover, AmyloidMutants is the only method to predict complete super-secondary structures, enabling accurate discrimination of topologically dissimilar amyloid conformations that correspond to the same sequence locations. Applied to mutant prediction, AmyloidMutants identifies a global conformational switch between Aβ and its highly toxic ‘Iowa’ mutant in agreement with a recent experimental model based on partial chemical shift data. Predictions on mutant, yeast-toxic strains of HET-s suggest similar alternate folds. When applied to HET-s and a HET-s mutant with core asparagines replaced by glutamines (both highly amyloidogenic, chemically similar residues abundant in many amyloids), AmyloidMutants surprisingly predicts a greatly reduced capacity of the glutamine mutant to form amyloid. We confirm this finding by conducting mutagenesis experiments. Availability: Our tool is publicly available on the web at http://amyloid.csail.mit.edu/.
Contact: lindquist_admin@wi.mit.edu; bab@csail.mit.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Charles W O'Donnell
- Computer Science and Artificial Intelligence Laboratory, Cambridge, MA 02139, USA
55
Kuppuraj G, Sargsyan K, Hua YH, Merrill AR, Lim C. Linking distinct conformations of nicotinamide adenine dinucleotide with protein fold/function. J Phys Chem B 2011; 115:7932-9. [PMID: 21612228 DOI: 10.1021/jp1118663] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/29/2022]
Abstract
Nicotinamide adenine dinucleotides (NAD or NADP) are essential cofactors/substrates for enzymes that catalyze redox or nonredox reactions. Because several enzymes involved in NAD(P) metabolism have been implicated in a wide array of diseases, there is great interest in designing inhibitors/activators of these NAD(P)-dependent enzymes based on their structures. Hence, we have elucidated the various distinct enzyme-bound NAD(P) conformations and their correlation with the respective protein fold and function using hierarchical clustering methods. Torsion angles distinguishing enzyme-bound NAD versus NADP conformations and NAD(P) conformations bound to redox versus nonredox enzymes were identified. Although an unusually small χ(N) in diphtheria toxin-bound NAD+ had been postulated to strain the N-glycosidic bond, thus facilitating catalysis, toxin-bound NAD+ molecules with χ(N) varying from 0 to 60° were found to exhibit similar C(1D)-N(1N) bond cleavage barriers in water. The findings herein provide useful guidelines in the design of inhibitors/activators of NAD(P)-dependent enzymes that are therapeutic targets.
Affiliation(s)
- Gopi Kuppuraj
- Chemical Biology & Molecular Biophysics, Taiwan International Graduate Program, Academia Sinica, Taipei 115, Taiwan
56
Hu Y, Zhang X, Shi Y, Zhou Y, Zhang W, Su XD, Xia B, Zhao J, Jin C. Structures of Anabaena calcium-binding protein CcbP: insights into Ca2+ signaling during heterocyst differentiation. J Biol Chem 2011; 286:12381-8. [PMID: 21330362 DOI: 10.1074/jbc.m110.201186] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Indexed: 01/07/2023]
Abstract
Ca2+-binding proteins play pivotal roles in both eukaryotic and prokaryotic cells. CcbP from cyanobacterium Anabaena sp. strain PCC 7120 is a major Ca2+-binding protein involved in heterocyst differentiation, a process that forms specialized nitrogen-fixing cells. The three-dimensional structures of both Ca2+-free and Ca2+-bound forms of CcbP are essential for elucidating the Ca2+-signaling mechanism. However, CcbP shares low sequence identity with proteins of known structures, and its Ca2+-binding sites remain unknown. Here, we report the solution structures of CcbP in both Ca2+-free and Ca2+-bound forms determined by nuclear magnetic resonance spectroscopy. CcbP adopts an overall new fold and contains two Ca2+-binding sites with distinct Ca2+-binding abilities. Mutation of Asp38 at the stronger Ca2+-binding site of CcbP abolished its ability to regulate heterocyst formation in vivo. Surprisingly, the β-barrel subdomain of CcbP, which does not participate in Ca2+-binding, topologically resembles the Src homology 3 (SH3) domain and might act as a protein-protein interaction module. Our results provide the structural basis of the unique Ca2+ signaling mechanism during heterocyst differentiation.
Affiliation(s)
- Yunfei Hu
- Beijing Nuclear Magnetic Resonance Center, State Key Laboratory of Plant and Protein Engineering, College of Life Sciences, Peking University, Beijing, China
57
Pham Y, Kuhlman B, Butterfoss GL, Hu H, Weinreb V, Carter CW. Tryptophanyl-tRNA synthetase Urzyme: a model to recapitulate molecular evolution and investigate intramolecular complementation. J Biol Chem 2010; 285:38590-601. [PMID: 20864539 PMCID: PMC2992291 DOI: 10.1074/jbc.m110.136911] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.4] [Received: 04/22/2010] [Revised: 09/14/2010] [Indexed: 01/26/2023]
Abstract
We substantiate our preliminary description of the class I tryptophanyl-tRNA synthetase minimal catalytic domain with details of its construction, structure, and steady-state kinetic parameters. Generating that active fragment involved deleting 65% of the contemporary enzyme, including the anticodon-binding domain and connecting peptide 1, CP1, a 74-residue internal segment from within the Rossmann fold. We used protein design (Rosetta), rather than phylogenetic sequence alignments, to identify mutations to compensate for the severe loss of modularity, thus restoring stability, as evidenced by renaturation described previously and by 70-ns molecular dynamics simulations. Sufficient solubility to enable biochemical studies was achieved by expressing the redesigned Urzyme as a maltose-binding protein fusion. Michaelis-Menten kinetic parameters from amino acid activation assays showed that, compared with the native full-length enzyme, TrpRS Urzyme binds ATP with similar affinity. This suggests that neither of the two deleted structural modules has a strong influence on ground-state ATP binding. However, tryptophan has 10^3-fold lower affinity, and the Urzyme has comparably reduced specificity relative to the related amino acid, tyrosine. Molecular dynamics simulations revealed how CP1 may contribute significantly to cognate amino acid specificity. As class Ia editing domains are nested within the CP1, this finding suggests that this module enhanced amino acid specificity continuously, throughout their evolution. We call this type of reconstructed protein catalyst an Urzyme (Ur prefix indicates original, primitive, or earliest). It establishes a model for recapitulating very early steps in molecular evolution in which fitness may have been enhanced by accumulating entire modules, rather than by discrete amino acid sequence changes.
Affiliation(s)
- Yen Pham
- Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina 27599
- Brian Kuhlman
- Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina 27599
- Glenn L. Butterfoss
- Biology and Courant Computer Science Department, New York University, New York, New York 10003
- Hao Hu
- Chong Yuet Ming Chemistry Building, University of Hong Kong, Pokfulam Road, 999077 Hong Kong, China
- Violetta Weinreb
- Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina 27599
- Charles W. Carter
- Department of Biochemistry and Biophysics, School of Medicine, University of North Carolina, Chapel Hill, North Carolina 27599
58
Vo A, Nguyen N, Huang H. Solenoid and non-solenoid protein recognition using stationary wavelet packet transform. Bioinformatics 2010; 26:i467-73. [PMID: 20823309 PMCID: PMC2935422 DOI: 10.1093/bioinformatics/btq371] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
Abstract
Motivation: Solenoid proteins are emerging as a protein class with properties intermediate between structured and intrinsically unstructured proteins. Containing repeating structural units, solenoid proteins are expected to share sequence similarities. However, in many cases, the sequence similarities are weak and non-detectable. Moreover, solenoids can be degenerate and vary widely in the number of units, so they are difficult to detect. Recently, several solenoid repeat detection methods have been proposed, such as self-alignment of the sequence, spectral analysis and discrete Fourier transform of the sequence. Although these methods have shown good performance on certain data sets, they often fail to detect repeats with weak similarities. In this article, we propose a new approach to recognize solenoid repeats and non-solenoid proteins using stationary wavelet packet transform (SWPT). Our method offers three advantages: (i) naturally representing five main factors of protein structure and properties by wavelet analysis technique; (ii) extracting novel wavelet features that can capture hidden components from solenoid sequence similarities and distinguish them from global proteins; (iii) obtaining statistics features that capture repeating motifs of solenoid proteins. Results: Our method analyzes the characteristics of the amino acid sequence in both spectral and temporal domains using SWPT. Both global and local information of proteins are captured by SWPT coefficients. We obtain and integrate wavelet-based features and statistics-based features of the amino acid sequence to improve the classification task. Our proposed method is evaluated by comparison with state-of-the-art methods such as HHrepID and REPETITA. The experimental results show that our algorithm consistently outperforms them in areas under the ROC curve. At the same false positive rate, the sensitivity of our WAVELET method is higher than that of other methods.
Availability: http://www.naaan.org/anvo/Software/Software.htm Contact: anphuocnhu.vo@mavs.uta.edu
Affiliation(s)
- An Vo
- The Feinstein Institute for Medical Research, North Shore LIJ Health System, NY, USA.
59
Weekes D, Krishna SS, Bakolitsa C, Wilson IA, Godzik A, Wooley J. TOPSAN: a collaborative annotation environment for structural genomics. BMC Bioinformatics 2010; 11:426. [PMID: 20716366 PMCID: PMC2936398 DOI: 10.1186/1471-2105-11-426] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 05/19/2010] [Accepted: 08/17/2010] [Indexed: 11/10/2022]
Abstract
BACKGROUND Many protein structures determined in high-throughput structural genomics centers, despite their significant novelty and importance, are available only as PDB depositions and are not accompanied by a peer-reviewed manuscript. Because of this they are not accessible by the standard tools of literature searches, remaining underutilized by the broad biological community. RESULTS To address this issue we have developed TOPSAN, The Open Protein Structure Annotation Network, a web-based platform that combines the openness of the wiki model with the quality control of scientific communication. TOPSAN enables research collaborations and scientific dialogue among globally distributed participants, the results of which are reviewed by experts and eventually validated by peer review. The immediate goal of TOPSAN is to harness the combined experience, knowledge, and data from such collaborations in order to enhance the impact of the astonishing number and diversity of structures being determined by structural genomics centers and high-throughput structural biology. CONCLUSIONS TOPSAN combines features of automated annotation databases and formal, peer-reviewed scientific research literature, providing an ideal vehicle to bridge a gap between rapidly accumulating data from high-throughput technologies and a much slower pace for its analysis and integration with other, relevant research.
Affiliation(s)
- Dana Weekes
- Joint Center for Structural Genomics, Bioinformatics Core, Sanford-Burnham Medical Research Institute, La Jolla, CA 92037, USA
60
Huseby MJ, Kruse AC, Digre J, Kohler PL, Vocke JA, Mann EE, Bayles KW, Bohach GA, Schlievert PM, Ohlendorf DH, Earhart CA. Beta toxin catalyzes formation of nucleoprotein matrix in staphylococcal biofilms. Proc Natl Acad Sci U S A 2010; 107:14407-12. [PMID: 20660751 PMCID: PMC2922554 DOI: 10.1073/pnas.0911032107] [Citation(s) in RCA: 141] [Impact Index Per Article: 9.4] [Indexed: 11/18/2022]
Abstract
Biofilms are surface-associated communities of microbes encompassed by an extracellular matrix. It is estimated that 80% of all bacterial infections involve biofilm formation, but the structure and regulation of biofilms are incompletely understood. Extracellular DNA (eDNA) is a major structural component in many biofilms of the pathogenic bacterium Staphylococcus aureus, but its role is enigmatic. Here, we demonstrate that beta toxin, a neutral sphingomyelinase and a virulence factor of S. aureus, forms covalent cross-links to itself in the presence of DNA (we refer to this as biofilm ligase activity, independent of sphingomyelinase activity) producing an insoluble nucleoprotein matrix in vitro. Furthermore, we show that beta toxin strongly stimulates biofilm formation in vivo as demonstrated by a role in causation of infectious endocarditis in a rabbit model. Together, these results suggest that beta toxin cross-linking in the presence of eDNA assists in forming the skeletal framework upon which staphylococcal biofilms are established.
Affiliation(s)
- Medora J Huseby
- Department of Biochemistry, Molecular Biology, and Biophysics, University of Minnesota, Minneapolis, MN 55455, USA.
61
Khaliullin IG, Suplatov DA, Shalaeva DN, Otsuka M, Asano Y, Svedas VK. Bioinformatic Analysis, Molecular Modeling of Role of Lys65 Residue in Catalytic Triad of D-aminopeptidase from Ochrobactrum anthropi. Acta Naturae 2010; 2:66-71. [PMID: 22649642 PMCID: PMC3347556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
A bioinformatic and phylogenetic study has been performed on a family of penicillin-binding proteins including D-aminopeptidases, D-amino acid amidases, DD-carboxypeptidases, and β-lactamases. Significant homology between D-aminopeptidase from Ochrobactrum anthropi and other members of the family has been shown and a number of conserved residues identified as S62, K65, Y153, N155, H287, and G289. Three of those (Ser62, Lys65, and Tyr153) form a catalytic triad, the proton relay system that activates the generalized nucleophile in the course of catalysis. Molecular modeling has indicated the conserved residue Lys65 to have an unusually low pKa value, which has been confirmed experimentally by a study of the pH-profile of D-aminopeptidase catalytic activity. The resulting data have been used to elucidate the role of Lys65 in the catalytic mechanism of D-aminopeptidase as a general base for proton transfer from catalytic Ser62 to Tyr153, and vice versa, during the formation and hydrolysis of the acyl-enzyme intermediate.
Affiliation(s)
- I G Khaliullin
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University
62
Wu CY, Chen YC, Lim C. A structural-alphabet-based strategy for finding structural motifs across protein families. Nucleic Acids Res 2010; 38:e150. [PMID: 20525797 PMCID: PMC2919736 DOI: 10.1093/nar/gkq478] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 01/08/2023]
Abstract
Proteins with insignificant sequence and overall structure similarity may still share locally conserved contiguous structural segments; i.e. structural/3D motifs. Most methods for finding 3D motifs require a known motif to search for other similar structures or functionally/structurally crucial residues. Here, without requiring a query motif or essential residues, a fully automated method for discovering 3D motifs of various sizes across protein families with different folds based on a 16-letter structural alphabet is presented. It was applied to structurally non-redundant proteins bound to DNA, RNA, obligate/non-obligate proteins as well as free DNA-binding proteins (DBPs) and proteins with known structures but unknown function. Its usefulness was illustrated by analyzing the 3D motifs found in DBPs. A non-specific motif was found with a ‘corner’ architecture that confers a stable scaffold and enables diverse interactions, making it suitable for binding not only DNA but also RNA and proteins. Furthermore, DNA-specific motifs present ‘only’ in DBPs were discovered. The motifs found can provide useful guidelines in detecting binding sites and computational protein redesign.
Affiliation(s)
- Chih Yuan Wu
- Department of Chemistry, National Tsing Hua University, Hsinchu, Taiwan
63
de Miranda JR, Dainat B, Locke B, Cordoni G, Berthoud H, Gauthier L, Neumann P, Budge GE, Ball BV, Stoltz DB. Genetic characterization of slow bee paralysis virus of the honeybee (Apis mellifera L.). J Gen Virol 2010; 91:2524-30. [PMID: 20519455 DOI: 10.1099/vir.0.022434-0] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.9] [Indexed: 12/13/2022]
Abstract
Complete genome sequences were determined for two distinct strains of slow bee paralysis virus (SBPV) of honeybees (Apis mellifera). The SBPV genome is approximately 9.5 kb long and contains a single ORF flanked by 5'- and 3'-UTRs and a naturally polyadenylated 3' tail, with a genome organization typical of members of the family Iflaviridae. The two strains, labelled 'Rothamsted' and 'Harpenden', are 83% identical at the nucleotide level (94% identical at the amino acid level), although this variation is distributed unevenly over the genome. The two strains were found to co-exist at different proportions in two independently propagated SBPV preparations. The natural prevalence of SBPV for 847 colonies in 162 apiaries across five European countries was <2%, with positive samples found only in England and Switzerland, in colonies with variable degrees of Varroa infestation.
Affiliation(s)
- Joachim R de Miranda
- School of Biological Sciences, Queen's University Belfast, Belfast BT9 7BL, Northern Ireland, UK.
64
Shah AA, Folino G, Krasnogor N. Toward High-Throughput, Multicriteria Protein-Structure Comparison and Analysis. IEEE Trans Nanobioscience 2010; 9:144-55. [DOI: 10.1109/tnb.2010.2043851] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/07/2022]
65
Kavathekar PA, Craig BA, Friedman AM, Bailey-Kellogg C, Balkcom DJ. Characterizing the space of interatomic distance distribution functions consistent with solution scattering data. J Bioinform Comput Biol 2010; 8:315-35. [PMID: 20401948 DOI: 10.1142/s0219720010004781] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/21/2009] [Revised: 11/01/2009] [Accepted: 11/01/2009] [Indexed: 11/18/2022]
Abstract
Scattering of neutrons and X-rays from molecules in solution offers alternative approaches to the study of a wide range of macromolecular structures in their solution state without crystallization. We study one part of the problem of elucidating three-dimensional structure from solution scattering data, determining the distribution of interatomic distances, P(r), where r is the distance between two atoms in the protein molecule. This problem is known to be ill-conditioned: for a single observed diffraction pattern, there may be many consistent distance distribution functions, and there is a risk of overfitting the observed scattering data. We propose a new approach to avoiding this problem: accepting the validity of multiple alternative P(r) curves rather than seeking a single "best." We place linear constraints to ensure that a computed P(r) is consistent with the experimental data. The constraints enforce smoothness in the P(r) curve, ensure that the P(r) curve is a probability distribution, and allow for experimental error. We use these constraints to precisely describe the space of all consistent P(r) curves as a polytope of histogram values or Fourier coefficients. We develop a linear programming approach to sampling the space of consistent, realistic P(r) curves. On both experimental and simulated scattering data, our approach efficiently generates ensembles of such curves that display substantial diversity.
66
Batyanovskii AV, Esipova NG, Shnoll SE. Mutual disposition of short conformationally stanch oligopeptides in the 3D structure of globular proteins. Biophysics (Nagoya-shi) 2010. [DOI: 10.1134/s0006350909060153] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/23/2022]
67
FragBag, an accurate representation of protein structure, retrieves structural neighbors from the entire PDB quickly and accurately. Proc Natl Acad Sci U S A 2010; 107:3481-6. [PMID: 20133727 DOI: 10.1073/pnas.0914097107] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.9] [Indexed: 11/18/2022]
Abstract
Fast identification of protein structures that are similar to a specified query structure in the entire Protein Data Bank (PDB) is fundamental in structure and function prediction. We present FragBag: An ultrafast and accurate method for comparing protein structures. We describe a protein structure by the collection of its overlapping short contiguous backbone segments, and discretize this set using a library of fragments. Then, we succinctly represent the protein as a "bag of fragments" (a vector that counts the number of occurrences of each fragment) and measure the similarity between two structures by the similarity between their vectors. Our representation has two additional benefits: (i) it can be used to construct an inverted index, for implementing a fast structural search engine of the entire PDB, and (ii) one can specify a structure as a collection of substructures, without combining them into a single structure; this is valuable for structure prediction, when there are reliable predictions only of parts of the protein. We use receiver operating characteristic curve analysis to quantify the success of FragBag in identifying neighbor candidate sets in a dataset of over 2,900 structures. The gold standard is the set of neighbors found by six state-of-the-art structural aligners. Our best FragBag library finds more accurate candidate sets than the three other filter methods: The SGM, PRIDE, and a method by Zotenko et al. More interestingly, FragBag performs on a par with the computationally expensive, yet highly trusted structural aligners STRUCTAL and CE.
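The bag-of-fragments idea in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the two-fragment toy library, the pairwise-distance descriptor used in place of superposed RMSD, the cosine similarity measure, and all function names are assumptions chosen for illustration only.

```python
import math
from collections import Counter
from itertools import combinations

def descriptor(segment):
    # Pairwise distances between residues: a rotation- and translation-invariant
    # stand-in (assumption) for comparing fragments by superposed RMSD.
    return [math.dist(a, b) for a, b in combinations(segment, 2)]

def nearest_fragment(segment, library):
    # Index of the library fragment whose descriptor is closest to the segment's.
    d = descriptor(segment)
    return min(range(len(library)),
               key=lambda i: math.dist(d, descriptor(library[i])))

def bag_of_fragments(coords, library, frag_len):
    # Slide a window over the backbone, assign each overlapping segment to its
    # nearest library fragment, and count occurrences per fragment.
    counts = Counter(
        nearest_fragment(coords[i:i + frag_len], library)
        for i in range(len(coords) - frag_len + 1)
    )
    return [counts.get(i, 0) for i in range(len(library))]

def cosine_similarity(u, v):
    # Cosine of the angle between two count vectors (assumed similarity measure).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy library: one straight and one bent 3-residue fragment.
LIBRARY = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
]

# Two toy "backbones": a straight chain and an L-shaped chain.
straight_chain = [(float(i), 0.0, 0.0) for i in range(5)]
l_chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0),
           (2.0, 1.0, 0.0), (2.0, 2.0, 0.0)]

bag_straight = bag_of_fragments(straight_chain, LIBRARY, 3)
bag_l = bag_of_fragments(l_chain, LIBRARY, 3)
similarity = cosine_similarity(bag_straight, bag_l)
```

Because each structure is reduced to a fixed-length count vector, comparing two structures costs only a vector operation, which is what makes this kind of representation suitable as a fast filter ahead of a full structural aligner.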
|
68
|
Adding structural information to the von Hippel-Lindau (VHL) tumor suppressor interaction network. FEBS Lett 2009; 583:3704-10. [PMID: 19878677 DOI: 10.1016/j.febslet.2009.10.070] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Received: 08/17/2009] [Revised: 10/21/2009] [Accepted: 10/26/2009] [Indexed: 11/23/2022]
Abstract
The von Hippel-Lindau (VHL) tumor suppressor gene is a protein interaction hub, controlling numerous genes implicated in tumor progression. Here we focus on structural aspects of protein interactions for a list of 35 experimentally verified protein VHL (pVHL) interactors. Using structural information and computational analysis we have located three distinct interaction interfaces (A, B, and C). Interface B is the most versatile, recognizing a refined linear motif present in 17 otherwise non-related proteins. It has been possible to distinguish compatible and exclusive interactions by relating pVHL function to interaction interfaces and subcellular localization. A novel hypothesis is presented regarding the possible function of the N-terminus as an inhibitor of pVHL function.
|
69
|
Structural relationships among proteins with different global topologies and their implications for function annotation strategies. Proc Natl Acad Sci U S A 2009; 106:17377-82. [PMID: 19805138 DOI: 10.1073/pnas.0907971106] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.3] [Indexed: 11/18/2022]
Abstract
It has become increasingly apparent that geometric relationships often exist between regions of two proteins that have quite different global topologies or folds. In this article, we examine whether such relationships can be used to infer a functional connection between the two proteins in question. We find, by considering a number of examples involving metal and cation binding, sugar binding, and aromatic group binding, that geometrically similar protein fragments can share related functions, even if they have been classified as belonging to different folds and topologies. Thus, the use of classifications inevitably limits the number of functional inferences that can be obtained from the comparative analysis of protein structures. In contrast, the development of interactive computational tools that recognize the "continuous" nature of protein structure/function space, by increasing the number of potentially meaningful relationships that are considered, may offer a dramatic enhancement in the ability to extract information from protein structure databases. We introduce the MarkUs server, which embodies this strategy and is designed for a user interested in developing and validating specific functional hypotheses.
|
70
|
Abstract
We present FIGfams, a new collection of over 100 000 protein families that are the product of manual curation and close strain comparison. Manual curation is carried out using the Subsystem approach, ensuring a previously unattained degree of throughput and consistency. FIGfams are based on over 950 000 manually annotated proteins and span many hundreds of Bacteria and Archaea. Associated with each FIGfam is a two-tiered, rapid, accurate decision procedure to determine family membership for new proteins. FIGfams are freely available under an open source license and can be downloaded at ftp://ftp.theseed.org/FIGfams/. The web site for FIGfams is http://www.theseed.org/wiki/FIGfams/
Affiliation(s)
- Folker Meyer
- Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, USA.
|
71
|
Chi PH, Pang B, Korkin D, Shyu CR. Efficient SCOP-fold classification and retrieval using index-based protein substructure alignments. Bioinformatics 2009; 25:2559-65. [PMID: 19667079 DOI: 10.1093/bioinformatics/btp474] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION: To investigate structure-function relationships, life sciences researchers usually retrieve and classify proteins with similar substructures into the same fold. A manually constructed database, SCOP, is believed to be highly accurate; however, it is labor intensive. Another known method, DALI, is also precise but computationally expensive. We have developed an efficient algorithm, namely, index-based protein substructure alignment (IPSA), for protein-fold classification. IPSA constructs a two-layer indexing tree to quickly retrieve similar substructures in proteins and suggests possible folds by aligning these substructures. RESULTS: Compared with known algorithms such as DALI, CE, MultiProt and MAMMOTH on a sample dataset of non-redundant proteins from SCOP v1.73, IPSA achieves speedups of 53.10, 16.87, 3.60 and 1.64 times, respectively. Evaluated on three different datasets of non-redundant proteins from SCOP, the average accuracy of IPSA is approximately equal to that of DALI and better than that of CE, MAMMOTH, MultiProt and SSM. With reliable accuracy and efficiency, this work will benefit the study of high-throughput protein structure-function relationships. AVAILABILITY: IPSA is publicly accessible at http://ProteinDBS.rnet.missouri.edu/IPSA.php
Affiliation(s)
- Pin-Hao Chi
- Medical and Biological Digital Library Research Lab, Informatics Institute, University of Missouri, Columbia, MO 65211, USA
|
72
|
Marsella L, Sirocco F, Trovato A, Seno F, Tosatto SCE. REPETITA: detection and discrimination of the periodicity of protein solenoid repeats by discrete Fourier transform. Bioinformatics 2009; 25:i289-95. [PMID: 19478001 PMCID: PMC2687986 DOI: 10.1093/bioinformatics/btp232] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.5] [Indexed: 11/30/2022]
Abstract
Motivation: Proteins with solenoid repeats evolve more quickly than non-repetitive ones and their periodicity may be rapidly hidden at sequence level, while still evident in structure. In order to identify these repeats, we propose here a novel method based on a metric characterizing amino-acid properties (polarity, secondary structure, molecular volume, codon diversity, electric charge) using five previously derived numerical functions. Results: The five spectra of the candidate sequences coding for structural repeats, obtained by Discrete Fourier Transform (DFT), show common features allowing determination of repeat periodicity with excellent results. Moreover, it is possible to introduce a phase space parameterized by two quantities related to the Fourier spectra, which allows for a clear distinction between a non-homologous set of globular proteins and proteins with solenoid repeats. The DFT method is shown to be competitive with other state-of-the-art methods in the detection of solenoid structures, while improving on their performance especially in the identification of periodicities, since it is able to recognize the actual repeat length in most cases. Moreover, it highlights the relevance of local structural propensities in determining solenoid repeats. Availability: A web tool implementing the algorithm presented in the article (REPETITA) is available with additional details on the data sets at the URL: http://protein.bio.unipd.it/repetita/. Contact: silvio.tosatto@unipd.it
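The core step described here, reading a repeat length off the Fourier spectrum of a per-residue numeric profile, can be sketched as follows. This is a simplified illustration, not the REPETITA algorithm: it handles a single profile rather than five, the function name `dominant_period` is hypothetical, and the mapping from amino acids to numbers is left to the caller because the five numerical functions used in the paper are not given in the abstract.

```python
import numpy as np


def dominant_period(profile):
    """Return (period in residues, power spectrum) for a per-residue numeric profile."""
    x = np.asarray(profile, dtype=float)
    x = x - x.mean()                      # remove the constant (zero-frequency) offset
    power = np.abs(np.fft.rfft(x)) ** 2   # power at each non-negative frequency
    freqs = np.fft.rfftfreq(len(x))       # frequencies in cycles per residue
    k = 1 + int(np.argmax(power[1:]))     # strongest peak, skipping frequency zero
    return 1.0 / freqs[k], power


# Synthetic profile with a known 20-residue repeat over 200 residues.
positions = np.arange(200)
synthetic_profile = np.sin(2.0 * np.pi * positions / 20.0)
period, _ = dominant_period(synthetic_profile)
print(period)
```

On real sequences the peak is broader and noisier than in this synthetic case, so the resolution of the recovered period is limited by the sequence length.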
|
73
|
Da Silva M, Upton C. Vaccinia virus G8R protein: a structural ortholog of proliferating cell nuclear antigen (PCNA). PLoS One 2009; 4:e5479. [PMID: 19421403 PMCID: PMC2674943 DOI: 10.1371/journal.pone.0005479] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Received: 01/28/2009] [Accepted: 04/15/2009] [Indexed: 11/30/2022]
Abstract
Background: Eukaryotic DNA replication involves the synthesis of both a DNA leading and lagging strand, the latter requiring several additional proteins including flap endonuclease (FEN-1) and proliferating cell nuclear antigen (PCNA) in order to remove RNA primers used in the synthesis of Okazaki fragments. Poxviruses are complex viruses (dsDNA genomes) that infect eukaryotes, but surprisingly little is known about the process of DNA replication. Given our previous results that the vaccinia virus (VACV) G5R protein may be structurally similar to a FEN-1-like protein and a recent finding that poxviruses encode a primase function, we undertook a series of in silico analyses to identify whether VACV also encodes a PCNA-like protein. Results: An InterProScan of all VACV proteins using the JIPS software package was used to identify any PCNA-like proteins. The VACV G8R protein was identified as the only vaccinia protein that contained a PCNA-like sliding clamp motif. The VACV G8R protein plays a role in poxvirus late transcription and is known to interact with several other poxvirus proteins including itself. The secondary and tertiary structure of the VACV G8R protein was predicted and compared to the secondary and tertiary structure of both human and yeast PCNA proteins, and a high degree of similarity between all three proteins was noted. Conclusions: The structure of the VACV G8R protein is predicted to closely resemble the eukaryotic PCNA protein; it possesses several other features including a conserved ubiquitylation and SUMOylation site that suggest that, like its counterpart in T4 bacteriophage (gp45), it may function as a sliding clamp ushering transcription factors to RNA polymerase during late transcription.
Affiliation(s)
- Melissa Da Silva
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada
- Chris Upton
- Department of Biochemistry and Microbiology, University of Victoria, Victoria, British Columbia, Canada
|
74
|
Shi S, Chitturi B, Grishin NV. ProSMoS server: a pattern-based search using interaction matrix representation of protein structures. Nucleic Acids Res 2009; 37:W526-31. [PMID: 19420061 PMCID: PMC2703969 DOI: 10.1093/nar/gkp316] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.4] [Indexed: 11/14/2022]
Abstract
Assessing structural similarity and defining common regions through comparison of protein spatial structures is an important task in functional and evolutionary studies of proteins. There are many servers that compare structures and define sub-structures in common between proteins through superposition and closeness of either coordinates or contacts. However, a natural way for experts working on structure classification to analyze a structure is to look for specific three-dimensional (3D) motifs and patterns instead of finding common features in two proteins. Such motifs can be described by the architecture and topology of major secondary structural elements (SSEs) without consideration of subtle differences in 3D coordinates. Despite the importance of motif-based structure searches, there is currently a shortage of servers to perform this task. The widely known TOPS does not fully address this problem, as it finds only topological matches and does not take into account other important spatial properties, such as interactions and chirality. Here, we implemented our approach to protein structure pattern search (ProSMoS) as a web server. ProSMoS converts a 3D structure into an interaction matrix representation including the SSE types, handedness of connections between SSEs, coordinates of SSE starts and ends, types of interactions between SSEs and beta-sheet definitions. For a user-defined structure pattern, ProSMoS lists all structures from a database that contain this pattern. The ProSMoS server will be of interest to structural biologists who would like to analyze very general and distant structural similarities. The ProSMoS web server is available at: http://prodata.swmed.edu/ProSMoS/.
Affiliation(s)
- Shuoyong Shi
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, 5323 Harry Hines Blvd, Dallas, TX 75390-9050, USA
|
75
|
Haegeman A, Vanholme B, Gheysen G. Characterization of a putative endoxylanase in the migratory plant-parasitic nematode Radopholus similis. Molecular Plant Pathology 2009; 10:389-401. [PMID: 19400841 PMCID: PMC6640231 DOI: 10.1111/j.1364-3703.2009.00539.x] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.9] [Indexed: 05/03/2023]
Abstract
Plant-parasitic nematodes have developed an arsenal of enzymes to degrade the rigid plant cell wall. In this article, we report the presence of a putative endoxylanase in the migratory endoparasitic nematode Radopholus similis. This enzyme is thought to facilitate the migration of the nematode, as it breaks down xylan, the major component of hemicellulose. The corresponding gene (Rs-xyl1) was cloned and the sequence revealed three small introns. Interestingly, the position of all three introns was conserved in a putative endoxylanase from Meloidogyne hapla, and the position of one intron was conserved in two endoxylanases from Meloidogyne incognita, which suggests a common ancestral gene. The spatial and temporal expression of the Rs-xyl1 gene was examined by in situ hybridization and semi-quantitative reverse transcriptase-polymerase chain reaction. The putative protein consists of a signal peptide, a catalytic domain and a carbohydrate-binding module (CBM). The catalytic domain showed similarity to both glycosyl hydrolase family 5 (GHF5) and GHF30 enzymes. Using Hidden Markov Model profiles and phylogenetic analysis, we were able to show that Rs-XYL1 and its closest homologues are not members of GHF5, as suggested previously, but rather form a subclass within GHF30. Silencing the putative endoxylanase by double-stranded RNA targeting of the CBM region resulted in an average decrease in infection of 60%, indicating that the gene is important for the nematode to complete its life cycle.
Affiliation(s)
- Annelies Haegeman
- Faculty of Bioscience Engineering (FBE), Department of Molecular Biotechnology, Ghent University, Ghent, Belgium
|
76
|
Gao J, Li Z. Comparing four different approaches for the determination of inter-residue interactions provides insight for the structure prediction of helical membrane proteins. Biopolymers 2009; 91:547-56. [PMID: 19241463 DOI: 10.1002/bip.21175] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
Studying inter-residue interactions provides insight into the folding and stability of both soluble and membrane proteins and is essential for developing computational tools for protein structure prediction. As the first step, various approaches for elucidating such interactions within protein structures have been proposed and proven useful. Since different approaches may grasp different aspects of protein structural folds, it is of interest to systematically compare them. In this work, we applied four approaches for determining inter-residue interactions to the analysis of three distinct structure datasets of helical membrane proteins and compared their correlation to the three individual quality measures of structures in these datasets. These datasets included one of 35 structures of rhodopsin receptors and bacterial rhodopsins determined at various resolutions, one derived from the previously reported HOMEP benchmark dataset, and one comprising 139 homology models. It was found that the correlation between the average number of inter-residue interactions obtained by applying the four approaches and the available structure quality measures varied quite significantly among them. The best correlation was achieved by the approach focusing exclusively on favorable inter-residue interactions. These results provide interesting insight for the development of an objective quality measure for the structure prediction of helical membrane proteins.
Affiliation(s)
- Jun Gao
- Department of Bioinformatics, University of the Sciences in Philadelphia, Philadelphia, PA 19104, USA
|
77
|
Aceti DJ, Bitto E, Yakunin AF, Proudfoot M, Bingman CA, Frederick RO, Sreenath HK, Vojtik FC, Wrobel RL, Fox BG, Markley JL, Phillips GN. Structural and functional characterization of a novel phosphatase from the Arabidopsis thaliana gene locus At1g05000. Proteins 2009; 73:241-53. [PMID: 18433060 DOI: 10.1002/prot.22041] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 02/04/2023]
Abstract
The crystal structure of the protein product of the gene locus At1g05000, a hypothetical protein from A. thaliana, was determined by the multiple-wavelength anomalous diffraction method and was refined to an R factor of 20.4% (R(free) = 24.9%) at 3.3 Å. The protein adopts the alpha/beta fold found in cysteine phosphatases, a superfamily of phosphatases that possess a catalytic cysteine and form a covalent thiol-phosphate intermediate during the catalytic cycle. In At1g05000, the analogous cysteine (Cys(150)) is located at the bottom of a positively charged pocket formed by residues that include the conserved arginine (Arg(156)) of the signature active site motif, HCxxGxxRT. Of 74 model phosphatase substrates tested, purified recombinant At1g05000 showed highest activity toward polyphosphate (poly-P(12-13)) and deoxyribo- and ribonucleoside triphosphates, and less activity toward phosphoenolpyruvate, phosphotyrosine, phosphotyrosine-containing peptides, and phosphatidyl inositols. Divalent metal cations were not required for activity and had little effect on the reaction.
Affiliation(s)
- David J Aceti
- Department of Biochemistry, The Center for Eukaryotic Structural Genomics, University of Wisconsin at Madison, Madison, Wisconsin 53706, USA
|
78
|
da Silveira CH, Pires DEV, Minardi RC, Ribeiro C, Veloso CJM, Lopes JCD, Meira W, Neshich G, Ramos CHI, Habesch R, Santoro MM. Protein cutoff scanning: A comparative analysis of cutoff dependent and cutoff free methods for prospecting contacts in proteins. Proteins 2009; 74:727-43. [PMID: 18704933 DOI: 10.1002/prot.22187] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.9] [Indexed: 11/11/2022]
Affiliation(s)
- Carlos H da Silveira
- Department of Biochemistry and Immunology, Institute of Biological Sciences, Federal University of Minas Gerais, UFMG, Brazil.
|
79
|
Batyanovskii AV, Vlasov PK. Short protein segments with prevalent conformation. Biophysics (Nagoya-shi) 2009. [DOI: 10.1134/s0006350908040040] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/23/2022]
|
80
|
Han GW, Rife C, Sawaya MR. Applications of bioinformatics to protein structures: how protein structure and bioinformatics overlap. Methods Mol Biol 2009; 569:157-172. [PMID: 19623490 DOI: 10.1007/978-1-59745-524-4_8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/28/2023]
Abstract
In this chapter, we focus on the role of bioinformatics in analyzing a protein after its structure has been determined. First, we present how to validate protein structures for quality assurance. Then, we discuss how to analyze protein-protein interfaces and how to predict the biomolecule, that is, the biological oligomeric state of the protein. Finally, we discuss how to search for homologs based on the 3-D structure, which is an essential step for understanding protein function.
Affiliation(s)
- Gye Won Han
- Burnham Institute for Medical Research, La Jolla, CA, USA
|
81
|
Hrmova M, Fincher GB. Functional genomics and structural biology in the definition of gene function. Methods Mol Biol 2009; 513:199-227. [PMID: 19347658 DOI: 10.1007/978-1-59745-427-8_11] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 05/27/2023]
Abstract
By mid-2007, the three-dimensional (3D) structures of some 45,000 proteins had been solved, over a period in which the linear structures of millions of genes were defined. Technical challenges associated with X-ray crystallography are being overcome and high-throughput methods both for crystallization of proteins and for solving their 3D structures are under development. The question arises as to how structural biology can be integrated with, and add value to, functional genomics programs. Structural biology will assist in the definition of gene function through the identification of the likely function of the protein products of genes. The 3D information allows protein sequences predicted from DNA sequences to be classified into broad groups, according to the overall 'fold', or 3D shape, of the protein. Structural information can be used to predict the preferred substrate of a protein, and thereby greatly enhance the accurate annotation of the corresponding gene. Furthermore, it will enable the effects of amino acid substitutions in enzymes to be better understood with respect to enzyme function and could thereby provide insights into natural variation in genes. If the molecular basis of transcription factor-DNA interactions were defined through precise 3D knowledge of the protein-DNA binding site, it would be possible to predict the effects of base substitutions within the motif on the specificity and/or kinetics of binding. In this chapter, we present specific examples of how structural biology can provide valuable information for functional genomics programs.
Affiliation(s)
- Maria Hrmova
- Australian Centre for Plant Functional Genomics, School of Agriculture, Food and Wine, University of Adelaide, Waite Campus, Glen Osmond, SA 5064, Australia
|
82
|
Bitto E, Bingman CA, Bittova L, Houston NL, Boston RS, Fox BG, Phillips GN. X-ray structure of ILL2, an auxin-conjugate amidohydrolase from Arabidopsis thaliana. Proteins 2009; 74:61-71. [PMID: 18543330 PMCID: PMC2605170 DOI: 10.1002/prot.22124] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.3] [Indexed: 11/10/2022]
Abstract
The plant hormone indole-3-acetic acid (IAA) is the most abundant natural auxin involved in many aspects of plant development and growth. The IAA levels in plants are modulated by a specific group of amidohydrolases from the peptidase M20D family that release the active hormone from its conjugated storage forms. Here, we describe the X-ray crystal structure of IAA-amino acid hydrolase IAA-leucine resistant-like gene 2 (ILL2) from Arabidopsis thaliana at 2.0 Å resolution. ILL2 preferentially hydrolyses the auxin-amino acid conjugate N-(indol-3-acetyl)-alanine. The overall structure of ILL2 is reminiscent of dinuclear metallopeptidases from the M20 peptidase family. The structure consists of two domains, a larger catalytic domain with three-layer alpha-beta-alpha sandwich architecture and aminopeptidase topology and a smaller satellite domain with two-layer alpha-beta sandwich architecture and alpha-beta-plaits topology. The metal-coordinating residues in the active site of ILL2 include a conserved cysteine that clearly distinguishes this protein from previously structurally characterized members of the M20 peptidase family. Modeling of N-(indol-3-acetyl)-alanine into the active site of ILL2 suggests that Leu175 serves as a key determinant for the amino acid side-chain specificity of this enzyme. Furthermore, a hydrophobic pocket near the catalytic dimetal center likely recognizes the indolyl moiety of the substrate. Finally, the active site of ILL2 harbors an absolutely conserved glutamate (Glu172), which is well positioned to act as a general acid-base residue. Overall, the structure of ILL2 suggests that this enzyme likely uses a catalytic mechanism that follows the paradigm established for the other enzymes of the M20 peptidase family.
Affiliation(s)
- Eduard Bitto
- Center for Eukaryotic Structural Genomics, University of Wisconsin-Madison, Madison, WI 53706-1544
- Craig A. Bingman
- Center for Eukaryotic Structural Genomics, University of Wisconsin-Madison, Madison, WI 53706-1544
- Norma L. Houston
- Department of Plant Biology, North Carolina State University, Raleigh, NC 27695
- Rebecca S. Boston
- Department of Plant Biology, North Carolina State University, Raleigh, NC 27695
- Brian G. Fox
- Center for Eukaryotic Structural Genomics, University of Wisconsin-Madison, Madison, WI 53706-1544
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706-1544
- George N. Phillips
- Center for Eukaryotic Structural Genomics, University of Wisconsin-Madison, Madison, WI 53706-1544
- Department of Biochemistry, University of Wisconsin-Madison, Madison, WI 53706-1544
|
83
|
Quantitative criteria for native energetic heterogeneity influences in the prediction of protein folding kinetics. Proc Natl Acad Sci U S A 2008; 106:434-9. [PMID: 19075236 DOI: 10.1073/pnas.0810218105] [Citation(s) in RCA: 63] [Impact Index Per Article: 3.7] [Indexed: 11/18/2022]
Abstract
Energy landscape theory requires that the protein-folding mechanism is generally globally directed or funneled toward the native state. The collective nature of transition state ensembles further suggests that sufficient averaging of the native interactions can occur so that the knowledge of the native topology may suffice for predicting the mechanism. Nevertheless, while simple homogeneously weighted native topology-based models predict the folding mechanisms for many proteins, for other proteins knowledge of the native topology, by itself, seems not to suffice in determining the folding mechanism. Simulations of proteins with differing topologies reveal that the failure of homogeneously weighted topology-based models can, however, be completely understood within the framework of a funneled energy landscape and can be quantified by comparing the fluctuation of entropy cost for forming contacts to the expected fluctuations in contact energy. To be precise, we find that proteins with all-alpha topologies, which are more uniform in the specific entropy cost of contact formation, have transition state ensembles that are more readily perturbed by differences in energetic weights than are the transition state ensembles of proteins with significant amounts of beta-structure, where the specific entropy costs of contact formation are more widely distributed. This behavior is consistent with a random-field Ising model analogy that follows from the free energy functional approach to folding.
|
84
|
Bray T, Doig AJ, Warwicker J. Sequence and structural features of enzymes and their active sites by EC class. J Mol Biol 2008; 386:1423-36. [PMID: 19100748 DOI: 10.1016/j.jmb.2008.11.057] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Received: 07/15/2008] [Revised: 11/25/2008] [Accepted: 11/27/2008] [Indexed: 10/21/2022]
Abstract
We have analysed a non-redundant set of 294 enzymes for differences in sequence and structural features between the six main Enzyme Commission (EC) classification groups. This systematic study of enzymes, and their active sites in particular, aims to increase understanding of how the structure of an enzyme relates to its functional role. Many features showed significant differences between the EC classes, including active-site polarity, enzyme size and active-site amino acid propensities. Many attributes correlate with each other to form clusters of related features from which we chose representative features for further analysis. Oxidoreductases have more non-polar active sites, which can be attributed to cofactor binding and a preference for Glu over Asp in active sites in comparison to the other classes. Lyases form a significantly higher proportion of oligomers than any other class, whilst the hydrolases form the largest proportion of monomers. These features were then used in a prediction model that classified each enzyme into its top EC class with an accuracy of 33.1%, which is 16.4 percentage points above random classification among the six classes. Understanding the link between structure and function is critical to improving enzyme design and the prediction of protein function from structure without transfer of annotation from alignments.
Affiliation(s)
- Tracey Bray
- Faculty of Life Sciences, The University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK
|
85
|
Pabuwal V, Li Z. Comparative analysis of the packing topology of structurally important residues in helical membrane and soluble proteins. Protein Eng Des Sel 2008; 22:67-73. [DOI: 10.1093/protein/gzn074] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/13/2022]
|
86
|
Bitto E, Bingman CA, Bittova L, Kondrashov DA, Bannen RM, Fox BG, Markley JL, Phillips GN. Structure of human J-type co-chaperone HscB reveals a tetracysteine metal-binding domain. J Biol Chem 2008; 283:30184-92. [PMID: 18713742 PMCID: PMC2573069 DOI: 10.1074/jbc.m804746200] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.2] [Received: 06/23/2008] [Revised: 08/14/2008] [Indexed: 11/06/2022]
Abstract
Iron-sulfur proteins play indispensable roles in a broad range of biochemical processes. The biogenesis of iron-sulfur proteins is a complex process that has become a subject of extensive research. The final step of iron-sulfur protein assembly involves transfer of an iron-sulfur cluster from a cluster-donor to a cluster-acceptor protein. This process is facilitated by a specialized chaperone system, which consists of a molecular chaperone from the Hsc70 family and a co-chaperone of the J-domain family. The 3.0 Å crystal structure of a human mitochondrial J-type co-chaperone HscB revealed an L-shaped protein that resembles Escherichia coli HscB. The important difference between the two homologs is the presence of an auxiliary metal-binding domain at the N terminus of human HscB that coordinates a metal via the tetracysteine consensus motif CWXCX(9-13)FCXXCXXXQ. The domain is found in HscB homologs from animals and plants as well as in magnetotactic bacteria. The metal-binding site of the domain is structurally similar to that of rubredoxin and several zinc finger proteins containing rubredoxin-like knuckles. The normal mode analysis of HscB revealed that this L-shaped protein preferentially undergoes a scissors-like motion that correlates well with the conformational changes of human HscB observed in the crystals.
Affiliation(s)
- Eduard Bitto
- Center for Eukaryotic Structural Genomics, University of Wisconsin-Madison, Madison, Wisconsin 53706-1544, USA
|
87
|
Sirocco F, Tosatto SCE. TESE: generating specific protein structure test set ensembles. Bioinformatics 2008; 24:2632-3. [PMID: 18796478 DOI: 10.1093/bioinformatics/btn488] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
Abstract
TESE is a web server for the generation of test sets of protein sequences and structures fulfilling a number of different criteria. At least three different use cases can be envisaged: (i) benchmarking of novel methods; (ii) test sets tailored for special needs and (iii) extending available datasets. The CATH structure classification is used to control structural/sequence redundancy and a variety of structural quality parameters can be used to interactively select protein subsets with specific characteristics, e.g. all X-ray structures of alpha-helical repeat proteins with more than 120 residues and resolution <2.0 Å. The output includes FASTA-formatted sequences, PDB files and a clickable HTML index file containing images of the selected proteins. Multiple subsets for cross-validation are also supported. AVAILABILITY: The TESE server is available for non-commercial use at http://protein.bio.unipd.it/tese/.
Affiliation(s)
- Francesco Sirocco
- Department of Biology, University of Padova, Viale G. Colombo 3, 35131 Padova, Italy
|
88
|
Gao J, Li Z. Inter-residue interactions in protein structures exhibit power-law behavior. Biopolymers 2008; 89:1174-8. [PMID: 18712852 DOI: 10.1002/bip.21072] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 11/06/2022]
Abstract
Inter-residue interactions play an essential role in driving protein folding, and analysis of these interactions increases our understanding of protein folding and stability and facilitates the development of tools for protein structure and function prediction. In this work, we systematically characterized the change of inter-residue interactions at various sequence separation cutoffs using two protein datasets. The first set included 100 diverse, nonredundant and high-resolution soluble protein structures, covering all four major structural classes (all-alpha, alpha/beta, alpha+beta, and all-beta); the second set included 20 diverse, nonredundant and high-resolution membrane protein structures, representing 19 unique superfamilies. The average number of inter-residue interactions in structures of both datasets displays power-law behavior. Fitting parameters of the power-law function are directly related to the structural classes analyzed. These findings provided further insight into the distribution of short-, medium-, and long-range inter-residue interactions in both soluble and membrane proteins and could be used for protein structure prediction.
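As a rough illustration of what fitting a power-law function involves, the sketch below fits y = a * x**b by ordinary least squares in log-log space. The abstract does not state the fitting procedure or report data here, so the function name `fit_power_law`, the cutoff values and the counts are all hypothetical, and the numbers are synthetic.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx                      # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)  # intercept in log-log space = log(a)
    return a, b

# Synthetic values for illustration only (not taken from the paper):
# hypothetical sequence separation cutoffs and average interaction counts.
cutoffs = [2, 4, 8, 16, 32]
avg_counts = [3.0 * c ** -0.5 for c in cutoffs]
a, b = fit_power_law(cutoffs, avg_counts)
```

With exact power-law input the fit recovers the generating prefactor and exponent; on real data the residuals in log-log space would indicate how well a power law describes the counts.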
Affiliation(s)
- Jun Gao
- Department of Bioinformatics and Computer Science, University of the Sciences in Philadelphia, Philadelphia, PA 19104, USA
|
89
|
Nuutinen T, Tossavainen H, Fredriksson K, Pirilä P, Permi P, Pospiech H, Syvaoja JE. The solution structure of the amino-terminal domain of human DNA polymerase epsilon subunit B is homologous to C-domains of AAA+ proteins. Nucleic Acids Res 2008; 36:5102-10. [PMID: 18676977 PMCID: PMC2528186 DOI: 10.1093/nar/gkn497] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 11/25/2022]
Abstract
DNA polymerases α, δ and ε are large multisubunit complexes that replicate the bulk of the DNA in the eukaryotic cell. In addition to the homologous catalytic subunits, these enzymes possess structurally related B subunits, characterized by a carboxyterminal calcineurin-like and an aminoproximal oligonucleotide/oligosaccharide binding-fold domain. The B subunits also share homology with the exonuclease subunit of archaeal DNA polymerases D. Here, we describe a novel domain specific to the N-terminus of the B subunit of eukaryotic DNA polymerases ε. The N-terminal domain of human DNA polymerase ε (Dpoe2NT) expressed in Escherichia coli was characterized. Circular dichroism studies demonstrated that Dpoe2NT forms a stable, predominantly α-helical structure. The solution structure of Dpoe2NT revealed a domain that consists of a left-handed superhelical bundle. Four helices are arranged in two hairpins and the connecting loops contain short β-strand segments that form a short parallel sheet. DALI searches demonstrated a striking structural similarity of Dpoe2NT with the α-helical subdomains (the C-domains) of ATPases associated with various cellular activities (AAA+ proteins). Like C-domains, Dpoe2NT is rich in charged amino acids. The biased distribution of the charged residues is reflected by a polarization and a considerable dipole moment across Dpoe2NT. Dpoe2NT represents the first C-domain fold not associated with an AAA+ protein.
|
90
|
Gao J, Li Z. Conserved network properties of helical membrane protein structures and its implication for improving membrane protein homology modeling at the twilight zone. J Comput Aided Mol Des 2008; 23:755-63. [DOI: 10.1007/s10822-008-9220-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 12/28/2007] [Accepted: 05/13/2008] [Indexed: 01/21/2023]
|
91
|
Sequence similarity network reveals common ancestry of multidomain proteins. PLoS Comput Biol 2008; 4:e1000063. [PMID: 18475320 PMCID: PMC2377100 DOI: 10.1371/journal.pcbi.1000063] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.3] [Received: 02/15/2007] [Accepted: 03/18/2008] [Indexed: 11/25/2022]
Abstract
We address the problem of homology identification in complex multidomain families with varied domain architectures. The challenge is to distinguish sequence pairs that share common ancestry from pairs that share an inserted domain but are otherwise unrelated. This distinction is essential for accuracy in gene annotation, function prediction, and comparative genomics. There are two major obstacles to multidomain homology identification: lack of a formal definition and lack of curated benchmarks for evaluating the performance of new methods. We offer preliminary solutions to both problems: 1) an extension of the traditional model of homology to include domain insertions; and 2) a manually curated benchmark of well-studied families in mouse and human. We further present Neighborhood Correlation, a novel method that exploits the local structure of the sequence similarity network to identify homologs with great accuracy based on the observation that gene duplication and domain shuffling leave distinct patterns in the sequence similarity network. In a rigorous, empirical comparison using our curated data, Neighborhood Correlation outperforms sequence similarity, alignment length, and domain architecture comparison. Neighborhood Correlation is well suited for automated, genome-scale analyses. It is easy to compute, does not require explicit knowledge of domain architecture, and classifies both single and multidomain homologs with high accuracy. Homolog predictions obtained with our method, as well as our manually curated benchmark and a web-based visualization tool for exploratory analysis of the network neighborhood structure, are available at http://www.neighborhoodcorrelation.org. Our work represents a departure from the prevailing view that the concept of homology cannot be applied to genes that have undergone domain shuffling. 
In contrast to current approaches that either focus on the homology of individual domains or consider only families with identical domain architectures, we show that homology can be rationally defined for multidomain families with diverse architectures by considering the genomic context of the genes that encode them. Our study demonstrates the utility of mining network structure for evolutionary information, suggesting this is a fertile approach for investigating evolutionary processes in the post-genomic era.

New genes evolve through the duplication and modification of existing genes. As a result, genes that share common ancestry tend to have similar structure and function. Computational methods that use common ancestry have been extraordinarily successful in inferring function. The practice of discerning evolutionary relationships is stymied, however, by modular sequences made up of two or more domains. When two genes share some domains but not others, it is difficult to distinguish a case of common ancestry from insertion of the same domain into both genes. We present a formal framework to define how multidomain genes are related, and propose a novel method for rapid, robust characterization of evolutionary relationships. In an empirical comparison with the current state of the art, we demonstrate superior performance of our method using a large hand-curated set of sequences known to share common ancestry. The success of our method derives from its unique ability to infer evolutionary history from local topology in the sequence similarity network. This represents a departure from the view that protein family classification must be restricted to families with conserved architecture. By exploiting the structure of the sequence similarity network, our approach surmounts this limitation and opens the door to studies of the role of modularity in protein evolution.
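The abstract does not give the formula behind Neighborhood Correlation. One plausible reading, stated here as an assumption, is that two sequences are compared through the correlation of their similarity-score profiles against a shared set of database sequences, so that pairs with similar network neighborhoods score high. The sketch below computes a plain Pearson correlation over such profiles; the function name and the score values are hypothetical.

```python
import math

def neighborhood_correlation(profile_x, profile_y):
    """Pearson correlation between two equal-length similarity profiles.

    Each profile holds the similarity scores of one query sequence against
    the same ordered set of database sequences (an assumed representation).
    """
    n = len(profile_x)
    mean_x = sum(profile_x) / n
    mean_y = sum(profile_y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(profile_x, profile_y))
    var_x = sum((a - mean_x) ** 2 for a in profile_x)
    var_y = sum((b - mean_y) ** 2 for b in profile_y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no neighborhood signal
    return cov / math.sqrt(var_x * var_y)

# Hypothetical score profiles against four database sequences.
x = [90, 85, 5, 0]
y = [80, 88, 10, 2]   # similar neighborhood to x
z = [0, 5, 95, 90]    # different neighborhood

score_xy = neighborhood_correlation(x, y)
score_xz = neighborhood_correlation(x, z)
```

Under this reading, a pair that merely shares one inserted domain would match different neighbors elsewhere in the network and so would tend to correlate less strongly than a pair with common ancestry.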
|
92
|
Ward RM, Erdin S, Tran TA, Kristensen DM, Lisewski AM, Lichtarge O. De-orphaning the structural proteome through reciprocal comparison of evolutionarily important structural features. PLoS One 2008; 3:e2136. [PMID: 18461181 PMCID: PMC2362850 DOI: 10.1371/journal.pone.0002136] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Received: 03/13/2008] [Accepted: 03/25/2008] [Indexed: 12/01/2022]
Abstract
Function prediction frequently relies on comparing genes or gene products to search for relevant similarities. Because the number of protein structures with unknown function is mushrooming, however, we asked here whether such comparisons could be improved by focusing narrowly on the key functional features of protein structures, as defined by the Evolutionary Trace (ET). Therefore a series of algorithms was built to (a) extract local motifs (3D templates) from protein structures based on ET ranking of residue importance; (b) assess their geometric and evolutionary similarity to other structures; and (c) transfer enzyme annotation whenever a plurality was reached across matches. Whereas a prototype had only been 80% accurate and was not scalable, here a speedy new matching algorithm enabled large-scale searches for reciprocal matches and thus raised annotation specificity to 100% in both positive and negative controls of 49 enzymes and 50 non-enzymes, respectively (in one case even identifying an annotation error), while maintaining sensitivity (∼60%). Critically, this Evolutionary Trace Annotation (ETA) pipeline requires no prior knowledge of functional mechanisms. It could thus be applied in a large-scale retrospective study of 1218 structural genomics enzymes and reached 92% accuracy. Likewise, it was applied to all 2935 unannotated structural genomics proteins and predicted enzymatic functions in 320 cases: 258 on first pass and 62 more on second pass. Controls and initial analyses suggest that these predictions are reliable. Thus the large-scale evolutionary integration of sequence-structure-function data, here through reciprocal identification of local, functionally important structural features, may contribute significantly to de-orphaning the structural proteome.
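Step (c) above, transferring an annotation only when a plurality is reached across matches, can be sketched as a simple vote. This is a minimal illustration under the assumption that a plurality means one annotation has strictly more matches than any other; the function name and the EC-style labels are hypothetical and not taken from the ETA pipeline.

```python
from collections import Counter

def plurality_annotation(match_annotations):
    """Return the annotation with strictly more votes than any other, else None."""
    top_two = Counter(match_annotations).most_common(2)
    if not top_two:
        return None  # no matches, nothing to transfer
    if len(top_two) == 2 and top_two[0][1] == top_two[1][1]:
        return None  # tie for first place: no plurality
    return top_two[0][0]

# Hypothetical annotations carried by the structures that matched a query.
clear_vote = plurality_annotation(["EC 3.1.1", "EC 3.1.1", "EC 2.7.1"])
tied_vote = plurality_annotation(["EC 3.1.1", "EC 2.7.1"])
```

Returning `None` on a tie or an empty match list mirrors the idea that no annotation is transferred unless the matches agree sufficiently.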
Affiliation(s)
- R. Matthew Ward
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- Serkan Erdin
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Tuan A. Tran
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- David M. Kristensen
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
- Andreas Martin Lisewski
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, Texas, United States of America
- Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, United States of America
|
93
|
Schrag JD, Jiralerspong S, Banville M, Jaramillo ML, O'Connor-McCourt MD. The crystal structure and dimerization interface of GADD45gamma. Proc Natl Acad Sci U S A 2008; 105:6566-71. [PMID: 18445651 PMCID: PMC2373355 DOI: 10.1073/pnas.0800086105] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.4] [Received: 01/14/2008] [Indexed: 01/27/2023]
Abstract
Gadd45 proteins are recognized as tumor and autoimmune suppressors whose expression can be induced by genotoxic stresses. These proteins are involved in cell cycle control, growth arrest, and apoptosis through interactions with a wide variety of binding partners. We report here the crystal structure of Gadd45gamma, which reveals a fold comprising an alpha-beta-alpha sandwich with a central five-stranded mixed beta-sheet with alpha-helices packed on either side. Based on crystallographic symmetry, we identified the dimer interface of Gadd45gamma by generating point mutants that compromised dimerization while leaving the tertiary structure of the monomer intact. The dimer interface comprises a four-helix bundle involving residues that are the most highly conserved among Gadd45 isoforms. Cell-based assays using these point mutants demonstrate that dimerization is essential for growth inhibition. This structural information provides a new context for evaluation of the plethora of protein-protein interactions that govern the many functions of the Gadd45 family of proteins.
Affiliation(s)
- Joseph D Schrag
- Biotechnology Research Institute, National Research Council Canada, 6100 Royalmount Avenue, Montreal, QC, Canada.
|
94
|
Manikandan K, Pal D, Ramakumar S, Brener NE, Iyengar SS, Seetharaman G. Functionally important segments in proteins dissected using Gene Ontology and geometric clustering of peptide fragments. Genome Biol 2008; 9:R52. [PMID: 18331637 PMCID: PMC2397504 DOI: 10.1186/gb-2008-9-3-r52] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 11/30/2007] [Revised: 02/24/2008] [Accepted: 03/10/2008] [Indexed: 11/25/2022]
Abstract
A geometric clustering algorithm using backbone φ,ψ angles has been developed to group conformationally similar peptide fragments of any length and to dissect protein fragments based on their relevance to function. By labeling each fragment in the cluster with the level-specific Gene Ontology 'molecular function' term of its protein, we are able to compute statistics for molecular function-propensity and p-value of individual fragments in the cluster. Clustering-cum-statistical analysis for peptide fragments 8 residues in length and with only trans peptide bonds shows that molecular function propensities ≥20 and p-values ≤0.05 can dissect fragments within a protein linked to the molecular function.
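Grouping fragments by backbone φ,ψ angles requires some distance between fragments that respects the periodicity of angles. The abstract does not specify the metric, so the sketch below is only one possible choice, labeled as an assumption: the largest per-residue angular deviation, with differences wrapped into the range 0 to 180 degrees. Both function names are hypothetical.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def fragment_distance(frag_a, frag_b):
    """Largest per-residue phi or psi deviation between equal-length fragments.

    Each fragment is a list of (phi, psi) tuples in degrees.
    """
    return max(
        max(angle_diff(phi_a, phi_b), angle_diff(psi_a, psi_b))
        for (phi_a, psi_a), (phi_b, psi_b) in zip(frag_a, frag_b)
    )

# Two hypothetical helical dipeptide fragments.
helix_a = [(-60.0, -45.0), (-60.0, -45.0)]
helix_b = [(-65.0, -40.0), (-58.0, -47.0)]
d = fragment_distance(helix_a, helix_b)
```

The wrap-around matters because angles such as 179 and -179 degrees are only 2 degrees apart, which a naive subtraction would report as 358.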
|
95
|
Sulakhe D, Rodriguez A, Wilde M, Foster I, Maltsev N. Interoperability of GADU in Using Heterogeneous Grid Resources for Bioinformatics Applications. IEEE Trans Inf Technol Biomed 2008; 12:241-6. [DOI: 10.1109/titb.2007.897783] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/09/2022]
|
96
|
Structure of glycerol-3-phosphate dehydrogenase, an essential monotopic membrane enzyme involved in respiration and metabolism. Proc Natl Acad Sci U S A 2008; 105:3280-5. [PMID: 18296637 DOI: 10.1073/pnas.0712331105] [Citation(s) in RCA: 124] [Impact Index Per Article: 7.3] [Indexed: 11/18/2022]
Abstract
Sn-glycerol-3-phosphate dehydrogenase (GlpD) is an essential membrane enzyme, functioning at the central junction of respiration, glycolysis, and phospholipid biosynthesis. Its critical role is indicated by the multitiered regulatory mechanisms that stringently control its expression and function. Once expressed, GlpD activity is regulated through lipid-enzyme interactions in Escherichia coli. Here, we report seven previously undescribed structures of the fully active E. coli GlpD, up to 1.75 Å resolution. In addition to elucidating the structure of the native enzyme, we have determined the structures of GlpD complexed with substrate analogues phosphoenolpyruvate, glyceric acid 2-phosphate, glyceraldehyde-3-phosphate, and product, dihydroxyacetone phosphate. These structural results reveal conformational states of the enzyme, delineating the residues involved in substrate binding and catalysis at the glycerol-3-phosphate site. Two probable mechanisms for catalyzing the dehydrogenation of glycerol-3-phosphate are envisioned, based on the conformational states of the complexes. To further correlate catalytic dehydrogenation to respiration, we have additionally determined the structures of GlpD bound with ubiquinone analogues menadione and 2-n-heptyl-4-hydroxyquinoline N-oxide, identifying a hydrophobic plateau that is likely the ubiquinone-binding site. These structures illuminate probable mechanisms of catalysis and suggest how GlpD shuttles electrons into the respiratory pathway. Glycerol metabolism has been implicated in insulin signaling and perturbations in glycerol uptake and catabolism are linked to obesity in humans. Homologs of GlpD are found in practically all organisms, from prokaryotes to humans, with >45% consensus protein sequences, signifying that these structural results on the prokaryotic enzyme may be readily applied to the eukaryotic GlpD enzymes.
|
97
|
Piedra D, Lois S, de la Cruz X. Preservation of protein clefts in comparative models. BMC STRUCTURAL BIOLOGY 2008; 8:2. [PMID: 18199319 PMCID: PMC2249585 DOI: 10.1186/1472-6807-8-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 05/29/2007] [Accepted: 01/16/2008] [Indexed: 11/29/2022]
Abstract
BACKGROUND: Comparative, or homology, modelling of protein structures is the most widely used prediction method when the target protein has homologues of known structure. Given that the quality of a model may vary greatly, several studies have been devoted to identifying the factors that influence modelling results. These studies usually consider the protein as a whole, and only a few provide a separate discussion of the behaviour of biologically relevant features of the protein. Given the value of the latter for many applications, here we extended previous work by analysing the preservation of native protein clefts in homology models. We chose to examine clefts because of their role in protein function/structure, as they are usually the locus of protein-protein interactions, host the enzymes' active site, or, in the case of protein domains, can also be the locus of domain-domain interactions that lead to the structure of the whole protein. RESULTS: We studied how the largest cleft of a protein varies in comparative models. To this end, we analysed a set of 53507 homology models that cover the whole sequence identity range, with a special emphasis on medium and low similarities. More precisely, we examined how cleft quality, measured using six complementary parameters related to both global shape and local atomic environment, depends on the sequence identity between target and template proteins. In addition to this general analysis, we also explored the impact of a number of factors on cleft quality, and found that the relationship between quality and sequence identity varies depending on cleft rank amongst the set of protein clefts (when ordered according to size), and number of aligned residues. CONCLUSION: We have examined cleft quality in homology models at a range of sequence identity levels. 
Our results provide a detailed view of how quality is affected by distinct parameters and thus may help the user of comparative modelling to determine the final quality and applicability of his/her cleft models. In addition, the large variability in model quality that we observed within each sequence bin, with good models present even at low sequence identities (between 20% and 30%), indicates that properly developed identification methods could be used to recover good cleft models in this sequence range.
Affiliation(s)
- David Piedra
- Institut de Recerca Biomèdica, C/Josep Samitier, 1-5, 08028 Barcelona, Spain
- Sergi Lois
- Institut de Recerca Biomèdica, C/Josep Samitier, 1-5, 08028 Barcelona, Spain
- Instituto de Biología Molecular de Barcelona, CID, Consejo Superior de Investigaciones Científicas (CSIC), Barcelona, Spain
- Xavier de la Cruz
- Institut de Recerca Biomèdica, C/Josep Samitier, 1-5, 08028 Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
|
98
|
Pabuwal V, Li Z. Network pattern of residue packing in helical membrane proteins and its application in membrane protein structure prediction. Protein Eng Des Sel 2008; 21:55-64. [DOI: 10.1093/protein/gzm059] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022]
|
99
|
Costa S, Cesareni G. Domains mediate protein-protein interactions and nucleate protein assemblies. Handb Exp Pharmacol 2008:383-405. [PMID: 18491061 DOI: 10.1007/978-3-540-72843-6_16] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 02/03/2023]
Abstract
Cell physiology is governed by an intricate mesh of physical and functional links among proteins, nucleic acids and other metabolites. The recent information flood coming from large-scale genomic and proteomic approaches allows us to foresee the possibility of compiling an exhaustive list of the molecules present within a cell, enriched with quantitative information on concentration and cellular localization. Moreover, several high-throughput experimental and computational techniques have been devised to map all the protein interactions occurring in a living cell. So far, such maps have been drawn as graphs where nodes represent proteins and edges represent interactions. However, this representation does not take into account the intrinsically modular nature of proteins and thus fails to provide an effective description of the determinants of binding. Since proteins are composed of domains that often confer on proteins their binding capabilities, a more informative description of the interaction network would detail, for each pair of interacting proteins in the network, which domains mediate the binding. Understanding how protein domains combine to mediate protein interactions would allow one to add important features to the protein interaction network, making it possible to discriminate between simultaneously occurring and mutually exclusive interactions. This objective can be achieved by experimentally characterizing domain recognition specificity or by analyzing the frequency of co-occurring domains in proteins that do interact. Such approaches yield insights into the topology of complexes with unknown three-dimensional structure, thus opening the prospect of adopting a more rational strategy in developing drugs designed to selectively target specific protein interactions.
Affiliation(s)
- S Costa
- University of Rome Tor Vergata, Via della Ricerca Scientifica, Rome, Italy
|
100
|
|