1
Heinzinger M, Littmann M, Sillitoe I, Bordin N, Orengo C, Rost B. Contrastive learning on protein embeddings enlightens midnight zone. NAR Genom Bioinform 2022; 4:lqac043. [PMID: 35702380 PMCID: PMC9188115 DOI: 10.1093/nargab/lqac043] [Citation(s) in RCA: 25] [Impact Index Per Article: 12.5] [Received: 11/22/2021] [Revised: 03/25/2022] [Accepted: 05/17/2022] [Indexed: 12/23/2022]
Abstract
Experimental structures are leveraged through multiple sequence alignments, or more generally through homology-based inference (HBI), facilitating the transfer of information from a protein with known annotation to a query without any annotation. A recent alternative expands the concept of HBI from sequence-distance lookup to embedding-based annotation transfer (EAT). These embeddings are derived from protein Language Models (pLMs). Here, we introduce using single protein representations from pLMs for contrastive learning. This learning procedure creates a new set of embeddings that optimizes constraints captured by hierarchical classifications of protein 3D structures defined by the CATH resource. The approach, dubbed ProtTucker, recognizes distant homologous relationships better than more traditional techniques such as threading or fold recognition. Thus, these embeddings have allowed sequence comparison to step into the 'midnight zone' of protein similarity, i.e. the region in which distantly related sequences have a seemingly random pairwise sequence similarity. The novelty of this work is in the particular combination of tools and sampling techniques that achieved performance comparable to or better than existing state-of-the-art sequence comparison methods. Additionally, since this method does not need to generate alignments, it is also orders of magnitude faster. The code is available at https://github.com/Rostlab/EAT.
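The lookup idea behind embedding-based annotation transfer can be sketched in a few lines. This is a minimal illustration, not the code from the Rostlab/EAT repository: the function names, the toy three-dimensional vectors and the use of Euclidean distance are assumptions made here, and the triplet loss is shown only as one common contrastive objective (the abstract does not say which objective is used).

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def transfer_annotation(query, lookup):
    """Return (label, distance) of the lookup embedding closest to the query.

    `lookup` is a list of (label, embedding) pairs; the names are hypothetical.
    """
    best_label, best_dist = None, float("inf")
    for label, embedding in lookup:
        dist = euclidean(query, embedding)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label, best_dist

def triplet_loss(anchor, positive, negative, margin=1.0):
    """One common contrastive objective (an assumption here): pull the anchor
    towards a protein of the same class and push it away from a protein of a
    different class by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy lookup set: CATH-style labels paired with made-up 3-d embeddings.
lookup = [
    ("3.40.50.300", [0.9, 0.1, 0.0]),
    ("1.10.10.10", [0.0, 0.8, 0.6]),
]
label, dist = transfer_annotation([0.8, 0.2, 0.1], lookup)
print(label, round(dist, 3))
```

Because the lookup compares fixed-length vectors rather than building alignments, its cost grows with the number of lookup entries and the embedding dimension only, which is consistent with the speed argument made in the abstract.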
Affiliation(s)
- Michael Heinzinger
  - TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Maria Littmann
  - TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
- Ian Sillitoe
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Nicola Bordin
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Christine Orengo
  - Institute of Structural and Molecular Biology, University College London, London WC1E 6BT, UK
- Burkhard Rost
  - TUM (Technical University of Munich) Dept Informatics, Bioinformatics & Computational Biology - i12, Boltzmannstr. 3, 85748 Garching/Munich, Germany
2
Rajapaksa S, Sumanaweera D, Lesk AM, Allison L, Stuckey PJ, Garcia de la Banda M, Abramson D, Konagurthu AS. OUP accepted manuscript. Bioinformatics 2022; 38:i255-i263. [PMID: 35758808 PMCID: PMC9235515 DOI: 10.1093/bioinformatics/btac247] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Accepted: 04/09/2022] [Indexed: 11/13/2022]
Abstract
MOTIVATION Alignments are correspondences between sequences. How reliable are alignments of amino acid sequences of proteins, and what inferences about protein relationships can be drawn? Using techniques not previously applied to these questions, by weighting every possible sequence alignment by its posterior probability we derive a formal mathematical expectation, and develop an efficient algorithm for computation of the distance between alternative alignments allowing quantitative comparisons of sequence-based alignments with corresponding reference structure alignments. RESULTS By analyzing the sequences and structures of 1 million protein domain pairs, we report the variation of the expected distance between sequence-based and structure-based alignments, as a function of (Markov time of) sequence divergence. Our results clearly demarcate the 'daylight', 'twilight' and 'midnight' zones for interpreting residue-residue correspondences from sequence information alone. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Sandun Rajapaksa
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Arthur M Lesk
  - Department of Biochemistry and Molecular Biology and Center for Computational Biology and Bioinformatics, The Pennsylvania State University, University Park, PA 16802, USA
- Lloyd Allison
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Peter J Stuckey
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- Maria Garcia de la Banda
  - Department of Data Science and Artificial Intelligence, Faculty of Information Technology, Monash University, Clayton, VIC 3800, Australia
- David Abramson
  - Research Computing Center, University of Queensland, St Lucia, QLD 4067, Australia
3
Abstract
For two decades, Rosetta has consistently been at the forefront of protein structure prediction. While it has become a very large package comprising programs, scripts, and tools for different types of macromolecular modelling such as ligand docking, protein-protein docking, protein design, and loop modelling, it started as the implementation of an algorithm for ab initio protein structure prediction. The term ‘Rosetta’ appeared for the first time twenty years ago in the literature to describe that algorithm and its contribution to the third edition of the community-wide Critical Assessment of techniques for protein Structure Prediction (CASP3). Similar to the Rosetta stone that allowed deciphering the ancient Egyptian civilisation, David Baker and his co-workers have been contributing to deciphering ‘the second half of the genetic code’. Although the focus of Baker’s team has expanded to de novo protein design in the past few years, Rosetta’s ‘fame’ is associated with its fragment-assembly protein structure prediction approach. Following a presentation of the main concepts underpinning its foundation, especially sequence-structure correlation and usage of fragments, we review the main stages of its development and highlight the milestones it has achieved in terms of protein structure prediction, particularly in CASP.
Affiliation(s)
- Jad Abbass
  - Department of Computer Science, Lebanese International University, Bekaa, Lebanon
- Jean-Christophe Nebel
  - Faculty of Science, Engineering and Computing, Kingston University, London, KT1 2EE, United Kingdom
4
Conway JM, Crosby JR, Hren AP, Southerland RT, Lee LL, Lunin VV, Alahuhta P, Himmel ME, Bomble YJ, Adams MWW, Kelly RM. Novel multidomain, multifunctional glycoside hydrolases from highly lignocellulolytic Caldicellulosiruptor species. AIChE J 2018. [DOI: 10.1002/aic.16354] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Indexed: 02/06/2023]
Affiliation(s)
- Jonathan M. Conway
  - Dept. of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695
- James R. Crosby
  - Dept. of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695
- Andrew P. Hren
  - Dept. of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695
- Robert T. Southerland
  - Dept. of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695
- Laura L. Lee
  - Dept. of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695
- Petri Alahuhta
  - Biosciences Center, National Renewable Energy Laboratory, Golden, CO 80401
- Michael W. W. Adams
  - Dept. of Biochemistry and Molecular Biology, University of Georgia, Athens, GA 30602
- Robert M. Kelly
  - Dept. of Chemical and Biomolecular Engineering, North Carolina State University, Raleigh, NC 27695
5
Abstract
CLC proteins are a ubiquitously expressed family of chloride-selective ion channels and transporters. A dearth of pharmacological tools for modulating CLC gating and ion conduction limits investigations aimed at understanding CLC structure/function and physiology. Herein, we describe the design, synthesis, and evaluation of a collection of N-arylated benzimidazole derivatives (BIMs), one of which (BIM1) shows unparalleled (>20-fold) selectivity for CLC-Ka over CLC-Kb, the two most closely related human CLC homologs. Computational docking to a CLC-Ka homology model has identified a BIM1 binding site on the extracellular face of the protein near the chloride permeation pathway in a region previously identified as a binding site for other less selective inhibitors. Results from site-directed mutagenesis experiments are consistent with predictions of this docking model. The residue at position 68 is 1 of only ∼20 extracellular residues that differ between CLC-Ka and CLC-Kb. Mutation of this residue in CLC-Ka and CLC-Kb (N68D and D68N, respectively) reverses the preference of BIM1 for CLC-Ka over CLC-Kb, thus showing the critical role of residue 68 in establishing BIM1 selectivity. Molecular docking studies together with results from structure-activity relationship studies with 19 BIM derivatives give insight into the increased selectivity of BIM1 compared with other inhibitors and identify strategies for further developing this class of compounds.
6
Structural characterization of ANGPTL8 (betatrophin) with its interacting partner lipoprotein lipase. Comput Biol Chem 2016; 61:210-20. [PMID: 26908254 DOI: 10.1016/j.compbiolchem.2016.01.009] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.0] [Received: 05/19/2015] [Revised: 01/07/2016] [Accepted: 01/21/2016] [Indexed: 12/20/2022]
Abstract
Angiopoietin-like protein 8 (ANGPTL8) (also known as betatrophin) is a newly identified secretory protein with a potential role in autophagy, lipid metabolism and pancreatic beta-cell proliferation. Its structural characterization is required to enhance our current understanding of its mechanism of action, which could help in identifying its receptor and/or other binding partners. Based on the physiological significance and necessity of exploring structural features of ANGPTL8, the present study was conducted with the specific aim of modeling the structure of ANGPTL8 and studying its possible interactions with Lipoprotein Lipase (LPL). To the best of our knowledge, this is the first attempt to predict the 3-dimensional (3D) structure of ANGPTL8. Three different approaches were used to model ANGPTL8 (homology modeling, de novo structure prediction, and a combination of the two), followed by structure verification using ERRAT, ProSA, QMEAN and Ramachandran plot scores. The selected models of ANGPTL8 were further evaluated for protein-protein interaction (PPI) analysis with LPL using the CPORT and HADDOCK servers. Our results have shown that the crystal structure of the iSH2 domain of Phosphatidylinositol 3-kinase (PI3K) p85β subunit (PDB entry: 3mtt) is a good candidate for homology modeling of ANGPTL8. Analysis of inter-molecular interactions between the structure of ANGPTL8 and LPL revealed the existence of several non-covalent interactions. The residues of LPL involved in these interactions belong to its lid region, thrombospondin (TSP) region and heparin binding site, which is suggestive of a possible role of ANGPTL8 in regulating the proteolysis, motility and localization of LPL. In addition, the conserved residues of the SE1 region of ANGPTL8 formed interactions with the residues around the hinge region of LPL. Overall, our results support a model of inhibition of LPL by ANGPTL8 through the steric block of its catalytic site, which will be further explored using wet lab studies in future.
7
Shabelnikov S, Kiselev A. Cysteine-Rich Atrial Secretory Protein from the Snail Achatina achatina: Purification and Structural Characterization. PLoS One 2015; 10:e0138787. [PMID: 26444993 PMCID: PMC4596865 DOI: 10.1371/journal.pone.0138787] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/28/2015] [Accepted: 09/03/2015] [Indexed: 11/28/2022]
Abstract
Despite extensive studies of cardiac bioactive peptides and their functions in molluscs, soluble proteins expressed in the heart and secreted into the circulation have not yet been reported. In this study, we describe an 18.1-kDa, cysteine-rich atrial secretory protein (CRASP) isolated from the terrestrial snail Achatina achatina that has no detectable sequence similarity to any known protein or nucleotide sequence. CRASP is an acidic, 158-residue, N-glycosylated protein composed of eight alpha-helical segments stabilized with five disulphide bonds. A combination of fold recognition algorithms and ab initio folding predicted that CRASP adopts an all-alpha, right-handed superhelical fold. CRASP is most strongly expressed in the atrium in secretory atrial granular cells, and substantial amounts of CRASP are released from the heart upon nerve stimulation. CRASP is detected in the haemolymph of intact animals at nanomolar concentrations. CRASP is the first secretory protein expressed in molluscan atrium to be reported. We propose that CRASP is an example of a taxonomically restricted gene that might be responsible for adaptations specific for terrestrial pulmonates.
Affiliation(s)
- Sergey Shabelnikov
  - Department of Cytology and Histology, Saint-Petersburg State University, St. Petersburg, Russia
  - Laboratory of Cell Morphology, Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russia
- Artem Kiselev
  - Laboratory of Cell Morphology, Institute of Cytology, Russian Academy of Sciences, St. Petersburg, Russia
  - Institute of Molecular Biology and Genetics, Almazov Federal Medical Research Centre, St. Petersburg, Russia
8
PvdP is a tyrosinase that drives maturation of the pyoverdine chromophore in Pseudomonas aeruginosa. J Bacteriol 2014; 196:2681-90. [PMID: 24816606 DOI: 10.1128/jb.01376-13] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/20/2022]
Abstract
The iron binding siderophore pyoverdine constitutes a major adaptive factor contributing to both virulence and survival in fluorescent pseudomonads. For decades, pyoverdine production has allowed the identification and classification of fluorescent and nonfluorescent pseudomonads. Here, we demonstrate that PvdP, a periplasmic enzyme of previously unknown function, is a tyrosinase required for the maturation of the pyoverdine chromophore in Pseudomonas aeruginosa. PvdP converts the nonfluorescent ferribactin, containing two iron binding groups, into a fluorescent pyoverdine, forming a strong hexadentate complex with ferrous iron, by three consecutive oxidation steps. PvdP represents the first characterized member of a small family of tyrosinases present in fluorescent pseudomonads that are required for siderophore maturation and are capable of acting on large peptidic substrates.
9
Abstract
Structural proteomics aims to understand the structural basis of protein interactions and functions. A prerequisite for this is the availability of 3D protein structures that mediate the biochemical interactions. The explosion in the number of available gene sequences set the stage for the next step in genome-scale projects -- to obtain 3D structures for each protein. To achieve this ambitious goal, the slow and costly structure determination experiments are supplemented with theoretical approaches. The current state and recent advances in structure modeling approaches are reviewed here, with special emphasis on comparative protein structure modeling techniques.
Affiliation(s)
- András Fiser
  - Department of Biochemistry, Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA.
10
Xu D, Jaroszewski L, Li Z, Godzik A. FFAS-3D: improving fold recognition by including optimized structural features and template re-ranking. Bioinformatics 2013; 30:660-7. [PMID: 24130308 DOI: 10.1093/bioinformatics/btt578] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION Homology detection enables grouping proteins into families and prediction of their structure and function. The range of application of homology-based predictions can be significantly extended by using sequence profiles and incorporation of local structural features. However, incorporation of the latter terms varies a lot between existing methods, and together with many examples of distant relations not recognized even by the best methods, suggests that further improvements are still possible. RESULTS Here we describe recent improvements to the fold and function assignment system (FFAS) method, including adding optimized structural features (experimental or predicted), 'symmetrical' Z-score calculation and re-ranking the templates with a neural network. The alignment accuracy in the new FFAS-3D is now 11% higher than the original and comparable with the most accurate template-based structure prediction algorithms. At the same time, FFAS-3D has high success rate at the Structural Classification of Proteins (SCOP) family, superfamily and fold levels. Importantly, FFAS-3D results are not highly correlated with other programs suggesting that it may significantly improve meta-predictions. FFAS-3D does not require 3D structures of the templates, as using predicted features instead of structure-derived does not lead to the decrease of accuracy. Because of that, FFAS-3D can be used for databases other than Protein Data Bank (PDB) such as Protein families database or Clusters of orthologous groups thus extending its applications to functional annotations of genomes and protein families. AVAILABILITY AND IMPLEMENTATION FFAS-3D is available at http://ffas.godziklab.org.
Affiliation(s)
- Dong Xu
  - Bioinformatics and Systems Biology Program, Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92037, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr. La Jolla, CA 92093-0446, USA and Center of Excellence in Genomic Medicine Research (CEGMR), King Fahad Medical Research Center, King Abdulaziz University, P.O. Box 80216, Jeddah 21589, Kingdom of Saudi Arabia
11
Mishra S, Saxena A, Sangwan RS. Fundamentals of Homology Modeling Steps and Comparison among Important Bioinformatics Tools: An Overview. Science International 2013. [DOI: 10.17311/sciintl.2013.237.252] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Indexed: 11/03/2022]
12
A novel predicted calcium-regulated kinase family implicated in neurological disorders. PLoS One 2013; 8:e66427. [PMID: 23840464 PMCID: PMC3696010 DOI: 10.1371/journal.pone.0066427] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.0] [Received: 12/20/2012] [Accepted: 05/08/2013] [Indexed: 12/03/2022]
Abstract
The catalogues of protein kinases, the essential effectors of cellular signaling, have been charted in Metazoan genomes for a decade now. Yet, surprisingly, using bioinformatics tools, we predicted protein kinase structure for proteins coded by five related human genes and their Metazoan homologues, the FAM69 family. Analysis of three-dimensional structure models and conservation of the classic catalytic motifs of protein kinases present in four out of five human FAM69 proteins suggests they might have retained catalytic phosphotransferase activity. An EF-hand Ca2+-binding domain in FAM69A and FAM69B proteins, inserted within the structure of the kinase domain, suggests they may function as Ca2+-dependent kinases. The FAM69 genes, FAM69A, FAM69B, FAM69C, C3ORF58 (DIA1) and CXORF36 (DIA1R), are by and large uncharacterised molecularly, yet linked to several neurological disorders in genetic studies. The C3ORF58 gene is found deleted in autism, and its protein product resides in the Golgi. Unusually high cysteine content and presence of signal peptides in some of the family members suggest that FAM69 proteins may be involved in phosphorylation of proteins in the secretory pathway and/or of extracellular proteins.
13
Dhingra P, Jayaram B. A homology/ab initio hybrid algorithm for sampling near-native protein conformations. J Comput Chem 2013; 34:1925-36. [PMID: 23728619 DOI: 10.1002/jcc.23339] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.2] [Received: 11/05/2012] [Revised: 03/09/2013] [Accepted: 04/21/2013] [Indexed: 12/19/2022]
Abstract
One of the major challenges for protein tertiary structure prediction strategies is the quality of conformational sampling algorithms, which can effectively and readily search the protein fold space to generate near-native conformations. In an effort to advance the field by making the best use of available homology as well as fold recognition approaches along with ab initio folding methods, we have developed Bhageerath-H Strgen, a homology/ab initio hybrid algorithm for protein conformational sampling. The methodology is tested on the benchmark CASP9 dataset of 116 targets. In 93% of the cases, a structure with TM-score ≥ 0.5 is generated in the pool of decoys. Further, the performance of Bhageerath-H Strgen was seen to be efficient in comparison with different decoy generation methods. The algorithm is web enabled as Bhageerath-H Strgen web tool which is made freely accessible for protein decoy generation (http://www.scfbio-iitd.res.in/software/Bhageerath-HStrgen1.jsp).
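The success criterion quoted above (a decoy with TM-score ≥ 0.5 in the pool) can be made concrete with a short sketch. It uses the standard TM-score definition with the length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8 and evaluates it for one fixed superposition; a full TM-score program additionally searches for the superposition that maximises the sum, which is omitted here. The function names and the 0.5 Å floor on d0 for short targets are assumptions of this sketch, not part of Bhageerath-H Strgen.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    `distances` holds the C-alpha distances (in angstroms) of the aligned
    residue pairs; `target_length` is the length of the native structure.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    d0 = max(d0, 0.5)  # floor for very short targets (assumption of this sketch)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length

def success_rate(best_scores, threshold=0.5):
    """Fraction of targets whose best decoy reaches the TM-score threshold."""
    return sum(score >= threshold for score in best_scores) / len(best_scores)

# Toy values: a perfect model, and a model covering half of the target exactly.
perfect = tm_score([0.0] * 100, 100)
half = tm_score([0.0] * 50, 100)
rate = success_rate([0.6, 0.4, 0.5, 0.9])
print(perfect, half, rate)
```

Normalising by the target length rather than by the number of aligned pairs is what penalises decoys that match only part of the native structure.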
Affiliation(s)
- Priyanka Dhingra
  - Department of Chemistry, Indian Institute of Technology, Hauz Khas, New Delhi, 110016, India
14
Vishnepolsky B, Managadze G, Grigolava M, Pirtskhalava M. Evaluation performance of substitution matrices, based on contacts between residue terminal groups. J Biomol Struct Dyn 2012; 30:180-90. [PMID: 22702729 DOI: 10.1080/07391102.2012.677769] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/28/2022]
Abstract
Sequence alignment is a standard method for the estimation of the evolutionary, structural, and functional relationships among amino acid sequences. The quality of alignments depends on the used similarity matrix. Statistical contact potentials (CPs) contain information on contact propensities among residues in native protein structures. Substitution matrices (SMs) based on CPs are applicable for the comparison of distantly related sequences. Here, contact between amino acids was estimated on the basis of the evaluation of the distances between side-chain terminal groups (SCTGs), which are defined as the group of the side-chain heavy atoms with fixed distances between them. In this paper, two new types of CPs and similarity matrices have been constructed: one based on fixed cutoff distance obtained from geometric characteristics of the SCTGs (TGC1), while the other is distance-dependent potential (TGC2). These matrices are compared with other popular SMs. The performance of the matrices was evaluated by comparing sequence with structural alignments. The obtained results show that TGC2 has the best performance among contact-based matrices, but on the whole, contact-based matrices have slightly lower performance than other SMs except fold-level similarity.
Affiliation(s)
- Boris Vishnepolsky
  - Life Science Research Centre, Laboratory of Bioinformatics, 14 Gotua St, Tbilisi, 0160, Georgia.
15
Zhou H, Skolnick J. FINDSITE(X): a structure-based, small molecule virtual screening approach with application to all identified human GPCRs. Mol Pharm 2012; 9:1775-84. [PMID: 22574683 DOI: 10.1021/mp3000716] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.8] [Indexed: 12/16/2022]
Abstract
We have developed FINDSITE(X), an extension of FINDSITE, a protein threading based algorithm for the inference of protein binding sites, biochemical function and virtual ligand screening, that removes the limitation that holo protein structures (those containing bound ligands) of a sufficiently large set of distant evolutionarily related proteins to the target be solved; rather, predicted protein structures and experimental ligand binding information are employed. To provide the predicted protein structures, a fast and accurate version of our recently developed TASSER(VMT), TASSER(VMT)-lite, for template-based protein structural modeling applicable up to 1000 residues is developed and tested, with comparable performance to the top CASP9 servers. Then, a hybrid approach that combines structure alignments with an evolutionary similarity score for identifying functional relationships between target and proteins with binding data has been developed. By way of illustration, FINDSITE(X) is applied to 998 identified human G-protein coupled receptors (GPCRs). First, TASSER(VMT)-lite provides updates of all human GPCR structures previously modeled in our lab. We then use these structures and the new function similarity detection algorithm to screen all human GPCRs against the ZINC8 nonredundant (TC < 0.7) ligand set combined with ligands from the GLIDA database (a total of 88,949 compounds). Testing (excluding GPCRs whose sequence identity > 30% to the target from the binding data library) on a 168 human GPCR set with known binding data, the average enrichment factor in the top 1% of the compound library (EF(0.01)) is 22.7, whereas EF(0.01) by FINDSITE is 7.1. For virtual screening when just the target and its native ligands are excluded, the average EF(0.01) reaches 41.4. We also analyze off-target interactions for the 168 protein test set. All predicted structures, virtual screening data and off-target interactions for the 998 human GPCRs are available at http://cssb.biology.gatech.edu/skolnick/webservice/gpcr/index.html.
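The enrichment factor EF(0.01) reported above can be illustrated with a small sketch. It assumes the usual definition (the hit rate among the top-ranked fraction of the library divided by the hit rate in the whole library); the function name, the rounding of the cutoff and the toy ranking are assumptions made here, not details taken from FINDSITE(X).

```python
def enrichment_factor(ranked_is_active, fraction=0.01):
    """Enrichment factor for the top `fraction` of a ranked compound library.

    `ranked_is_active` lists one boolean per compound, best-scored first.
    """
    n = len(ranked_is_active)
    total_actives = sum(ranked_is_active)
    if n == 0 or total_actives == 0:
        raise ValueError("need a non-empty library with at least one active")
    top_n = max(1, round(n * fraction))  # size of the top slice (rounding is an assumption)
    hits = sum(ranked_is_active[:top_n])
    # (hits / top_n) / (total_actives / n), rearranged to a single division
    return hits * n / (top_n * total_actives)

# Toy library: 1000 compounds, 10 actives, 5 of them ranked in the top 10.
ranking = [True] * 5 + [False] * 5 + [True] * 5 + [False] * 985
ef = enrichment_factor(ranking, fraction=0.01)
print(ef)
```

With this definition a random ranking gives an enrichment factor of about 1, and with a 1% cutoff the value cannot exceed 100.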
Affiliation(s)
- Hongyi Zhou
  - Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street, N.W., Atlanta, Georgia 30318, United States
16
Pentony MM, Winters P, Penfold-Brown D, Drew K, Narechania A, DeSalle R, Bonneau R, Purugganan MD. The plant proteome folding project: structure and positive selection in plant protein families. Genome Biol Evol 2012; 4:360-71. [PMID: 22345424 PMCID: PMC3318447 DOI: 10.1093/gbe/evs015] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Indexed: 11/16/2022]
Abstract
Despite its importance, relatively little is known about the relationship between the structure, function, and evolution of proteins, particularly in land plant species. We have developed a database with predicted protein domains for five plant proteomes (http://pfp.bio.nyu.edu) and used both protein structural fold recognition and de novo Rosetta-based protein structure prediction to predict protein structure for Arabidopsis and rice proteins. Based on sequence similarity, we have identified ∼15,000 orthologous/paralogous protein family clusters among these species and used codon-based models to predict positive selection in protein evolution within 175 of these sequence clusters. Our results show that codons that display positive selection appear to be less frequent in helical and strand regions and are overrepresented in amino acid residues that are associated with a change in protein secondary structure. Like in other organisms, disordered protein regions also appear to have more selected sites. Structural information provides new functional insights into specific plant proteins and allows us to map positively selected amino acid sites onto protein structures and view these sites in a structural and functional context.
Affiliation(s)
- M M Pentony
  - Center for Genomics and Systems Biology, Department of Biology, New York University, NY, USA
17
Dudkiewicz M, Szczepińska T, Grynberg M, Pawłowski K. A novel protein kinase-like domain in a selenoprotein, widespread in the tree of life. PLoS One 2012; 7:e32138. [PMID: 22359664 PMCID: PMC3281104 DOI: 10.1371/journal.pone.0032138] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 06/13/2011] [Accepted: 01/24/2012] [Indexed: 12/21/2022]
Abstract
Selenoproteins serve important functions in many organisms, usually providing essential oxidoreductase enzymatic activity, often for defense against toxic xenobiotic substances. Most eukaryotic genomes possess a small number of these proteins, usually not more than 20. Selenoproteins belong to various structural classes, often related to oxidoreductase function, yet a few of them are completely uncharacterised. Here, the structural and functional prediction for the uncharacterised selenoprotein O (SELO) is presented. Using bioinformatics tools, we predict that SELO protein adopts a three-dimensional fold similar to protein kinases. Furthermore, we argue that despite the lack of conservation of the “classic” catalytic aspartate residue of the archetypical His-Arg-Asp motif, SELO kinases might have retained catalytic phosphotransferase activity, albeit with an atypical active site. Lastly, the role of the selenocysteine residue is considered and the possibility of an oxidoreductase-regulated kinase function for SELO is discussed. The novel kinase prediction is discussed in the context of functional data on SELO orthologues in model organisms, FMP40 a.k.a. YPL222W (yeast), and ydiU (bacteria). Expression data from bacteria and yeast suggest a role in oxidative stress response. Analysis of genomic neighbourhoods of SELO homologues in the three domains of life points toward a role in regulation of ABC transport, in oxidative stress response, or in basic metabolism regulation. Among bacteria possessing SELO homologues, there is a significant over-representation of aquatic organisms, and also of aerobic ones. The selenocysteine residue in SELO proteins occurs in only a few members of this protein family, including proteins from Metazoa and a few small eukaryotes (Ostreococcus, stramenopiles). It is also demonstrated that enterobacterial mchC proteins involved in maturation of bactericidal antibiotics, microcins, form a distant subfamily of the SELO proteins. The new protein structural domain, with a putative kinase function assigned, expands the known kinome and deserves experimental determination of its biological role within the cell-signaling network.
Affiliation(s)
- Teresa Szczepińska
- Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
- Marcin Grynberg
- Institute of Biochemistry and Biophysics, Polish Academy of Sciences, Warsaw, Poland
- Krzysztof Pawłowski
- Nencki Institute of Experimental Biology, Polish Academy of Sciences, Warsaw, Poland
- Warsaw University of Life Sciences, Warsaw, Poland
|
18
|
Structural correlates of selectivity and inactivation in potassium channels. BIOCHIMICA ET BIOPHYSICA ACTA-BIOMEMBRANES 2011; 1818:272-85. [PMID: 21958666 DOI: 10.1016/j.bbamem.2011.09.007] [Citation(s) in RCA: 62] [Impact Index Per Article: 4.8] [Received: 07/17/2011] [Revised: 09/07/2011] [Accepted: 09/09/2011] [Indexed: 12/23/2022]
Abstract
Potassium channels are involved in a tremendously diverse range of physiological applications requiring distinctly different functional properties. Not surprisingly, the amino acid sequences for these proteins are diverse as well, except for the region that has been ordained the "selectivity filter". The goal of this review is to examine our current understanding of the role of the selectivity filter and regions adjacent to it in specifying selectivity as well as its role in gating/inactivation and possible mechanisms by which these processes are coupled. Our working hypothesis is that an amino acid network behind the filter modulates selectivity in channels with the same signature sequence while at the same time affecting channel inactivation properties. This article is part of a Special Issue entitled: Membrane protein structure and function.
|
19
|
Cai XH, Jaroszewski L, Wooley J, Godzik A. Internal organization of large protein families: relationship between the sequence, structure, and function-based clustering. Proteins 2011; 79:2389-402. [PMID: 21671455 DOI: 10.1002/prot.23049] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/26/2010] [Revised: 02/12/2011] [Accepted: 03/13/2011] [Indexed: 12/14/2022]
Abstract
The protein universe can be organized in families that group proteins sharing common ancestry. Such families display variable levels of structural and functional divergence, from homogenous families, where all members have the same function and very similar structure, to very divergent families, where large variations in function and structure are observed. For practical purposes of structure and function prediction, it would be beneficial to identify sub-groups of proteins with highly similar structures (iso-structural) and/or functions (iso-functional) within divergent protein families. We compared three algorithms in their ability to cluster large protein families and discuss whether any of these methods could reliably identify such iso-structural or iso-functional groups. We show that clustering using profile-sequence and profile-profile comparison methods closely reproduces clusters based on similarities between 3D structures or clusters of proteins with similar biological functions. In contrast, the still commonly used sequence-based methods with fixed thresholds result in vast overestimates of structural and functional diversity in protein families. As a result, these methods also overestimate the number of protein structures that have to be determined to fully characterize structural space of such families. The fact that one can build reliable models based on apparently distantly related templates is crucial for extracting maximal amount of information from new sequencing projects.
Affiliation(s)
- Xiao-Hui Cai
- Joint Center for Structural Genomics, Center for Research in Biological Systems, University of California, San Diego, California 92093-0446, USA
|
20
|
Han GW, Elsliger MA, Yeates TO, Xu Q, Murzin AG, Krishna SS, Jaroszewski L, Abdubek P, Astakhova T, Axelrod HL, Carlton D, Chen C, Chiu HJ, Clayton T, Das D, Deller MC, Duan L, Ernst D, Feuerhelm J, Grant JC, Grzechnik A, Jin KK, Johnson HA, Klock HE, Knuth MW, Kozbial P, Kumar A, Lam WW, Marciano D, McMullan D, Miller MD, Morse AT, Nigoghossian E, Okach L, Reyes R, Rife CL, Sefcovic N, Tien HJ, Trame CB, van den Bedem H, Weekes D, Hodgson KO, Wooley J, Deacon AM, Godzik A, Lesley SA, Wilson IA. Structure of a putative NTP pyrophosphohydrolase: YP_001813558.1 from Exiguobacterium sibiricum 255-15. Acta Crystallogr Sect F Struct Biol Cryst Commun 2010; 66:1237-44. [PMID: 20944217 PMCID: PMC2954211 DOI: 10.1107/s1744309110025534] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 04/12/2010] [Accepted: 06/29/2010] [Indexed: 11/24/2022]
Abstract
The crystal structure of a putative NTPase, YP_001813558.1 from Exiguobacterium sibiricum 255-15 (PF09934, DUF2166) was determined to 1.78 Å resolution. YP_001813558.1 and its homologs (dimeric dUTPases, MazG proteins and HisE-encoded phosphoribosyl ATP pyrophosphohydrolases) form a superfamily of all-α-helical NTP pyrophosphatases. In dimeric dUTPase-like proteins, a central four-helix bundle forms the active site. However, in YP_001813558.1, an unexpected intertwined swapping of two of the helices that compose the conserved helix bundle results in a 'linked dimer' that has not previously been observed for this family. Interestingly, despite this novel mode of dimerization, the metal-binding site for divalent cations, such as magnesium, that are essential for NTPase activity is still conserved. Furthermore, the active-site residues that are involved in sugar binding of the NTPs are also conserved when compared with other α-helical NTPases, but those that recognize the nucleotide bases are not conserved, suggesting a different substrate specificity.
Affiliation(s)
- Gye Won Han
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
- Marc-André Elsliger
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
- Todd O. Yeates
- Department of Chemistry and Biochemistry, University of California Los Angeles, Los Angeles, CA, USA
- Qingping Xu
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Alexey G. Murzin
- MRC Laboratory of Molecular Biology, Hills Road, Cambridge, England
- S. Sri Krishna
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, USA
- Program on Bioinformatics and Systems Biology, Sanford–Burnham Medical Research Institute, La Jolla, CA, USA
- Lukasz Jaroszewski
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, USA
- Program on Bioinformatics and Systems Biology, Sanford–Burnham Medical Research Institute, La Jolla, CA, USA
- Polat Abdubek
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Protein Sciences Department, Genomics Institute of the Novartis Research Foundation, San Diego, CA, USA
- Tamara Astakhova
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, USA
- Herbert L. Axelrod
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Dennis Carlton
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
- Connie Chen
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Protein Sciences Department, Genomics Institute of the Novartis Research Foundation, San Diego, CA, USA
- Hsiu-Ju Chiu
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Thomas Clayton
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
- Debanu Das
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Marc C. Deller
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
- Lian Duan
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, USA
- Dustin Ernst
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Protein Sciences Department, Genomics Institute of the Novartis Research Foundation, San Diego, CA, USA
- Julie Feuerhelm
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Protein Sciences Department, Genomics Institute of the Novartis Research Foundation, San Diego, CA, USA
- Joanna C. Grant
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Protein Sciences Department, Genomics Institute of the Novartis Research Foundation, San Diego, CA, USA
- Anna Grzechnik
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
- Kevin K. Jin
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Hope A. Johnson
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
- Heath E. Klock
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Protein Sciences Department, Genomics Institute of the Novartis Research Foundation, San Diego, CA, USA
- Mark W. Knuth
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Protein Sciences Department, Genomics Institute of the Novartis Research Foundation, San Diego, CA, USA
- Piotr Kozbial
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Program on Bioinformatics and Systems Biology, Sanford–Burnham Medical Research Institute, La Jolla, CA, USA
- Abhinav Kumar
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Winnie W. Lam
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- David Marciano
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
- Daniel McMullan
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Protein Sciences Department, Genomics Institute of the Novartis Research Foundation, San Diego, CA, USA
- Mitchell D. Miller
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Andrew T. Morse
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, USA
- Edward Nigoghossian
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Protein Sciences Department, Genomics Institute of the Novartis Research Foundation, San Diego, CA, USA
- Linda Okach
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Protein Sciences Department, Genomics Institute of the Novartis Research Foundation, San Diego, CA, USA
- Ron Reyes
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Christopher L. Rife
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Natasha Sefcovic
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Program on Bioinformatics and Systems Biology, Sanford–Burnham Medical Research Institute, La Jolla, CA, USA
- Henry J. Tien
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
- Christine B. Trame
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Henry van den Bedem
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Stanford Synchrotron Radiation Lightsource, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- Dana Weekes
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Program on Bioinformatics and Systems Biology, Sanford–Burnham Medical Research Institute, La Jolla, CA, USA
- Keith O. Hodgson
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Photon Science, SLAC National Accelerator Laboratory, Menlo Park, CA, USA
- John Wooley
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, USA
- Ashley M. Deacon
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, USA
- Adam Godzik
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Center for Research in Biological Systems, University of California, San Diego, La Jolla, CA, USA
- Program on Bioinformatics and Systems Biology, Sanford–Burnham Medical Research Institute, La Jolla, CA, USA
- Scott A. Lesley
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
- Protein Sciences Department, Genomics Institute of the Novartis Research Foundation, San Diego, CA, USA
- Ian A. Wilson
- Joint Center for Structural Genomics, http://www.jcsg.org, USA
- Department of Molecular Biology, The Scripps Research Institute, La Jolla, CA, USA
|
21
|
Nielsen M, Lundegaard C, Lund O, Petersen TN. CPHmodels-3.0--remote homology modeling using structure-guided sequence profiles. Nucleic Acids Res 2010; 38:W576-81. [PMID: 20542909 PMCID: PMC2896139 DOI: 10.1093/nar/gkq535] [Citation(s) in RCA: 235] [Impact Index Per Article: 16.8] [Indexed: 12/04/2022]
Abstract
CPHmodels-3.0 is a web server that predicts protein 3D structure by single-template homology modeling. The server employs a hybrid of the scoring functions of CPHmodels-2.0 and a novel remote homology-modeling algorithm. The server first attempts to model a query sequence using the fast CPHmodels-2.0 profile–profile scoring function, which is suitable for close homology modeling. The new, computationally costly remote homology-modeling algorithm is engaged only if no suitable PDB template is identified in the initial search. CPHmodels-3.0 was benchmarked in the CASP8 competition and produced models for 94% of the targets (117 out of 128); 74% of these were predicted as high-reliability models (87 out of 117) and achieved an average RMSD of 4.6 Å when superimposed on the true 3D structure. The remaining 26% low-reliability models (30 out of 117) superimposed on the true 3D structure with an average RMSD of 9.3 Å. These performance values place CPHmodels-3.0 in the group of high-performing 3D prediction tools. Besides its accuracy, one of the important features of the method is its speed: for most queries, the response time of the server is <20 min. The web server is available at http://www.cbs.dtu.dk/services/CPHmodels/.
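As a small worked check on the two RMSD figures quoted above, the sketch below pools the high- and low-reliability groups into a single count-weighted mean. The abstract does not report this pooled value; the function name `pooled_mean` and the variable names are illustrative only.

```python
def pooled_mean(groups):
    """Count-weighted mean over (count, group_mean) pairs."""
    total_count = sum(count for count, _ in groups)
    return sum(count * mean for count, mean in groups) / total_count

# (number of models, average RMSD in Å) as quoted in the abstract
casp8_groups = [(87, 4.6), (30, 9.3)]

overall_rmsd = pooled_mean(casp8_groups)
print(f"pooled average RMSD over 117 models: {overall_rmsd:.2f} Å")
```

Under these numbers the pooled average lies much closer to the high-reliability figure, because that group contains roughly three quarters of the models.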
Affiliation(s)
- Morten Nielsen
- Center for Biological Sequence Analysis, Department of Systems Biology, The Technical University of Denmark, Denmark
|
22
|
Abstract
Functional characterization of a protein is often facilitated by its 3D structure. However, the fraction of experimentally known 3D models is currently less than 1% due to the inherently time-consuming and complicated nature of structure determination techniques. Computational approaches are employed to bridge the gap between the number of known sequences and that of 3D models. Template-based protein structure modeling techniques rely on the study of principles that dictate the 3D structure of natural proteins from an evolutionary viewpoint. Strategies for template-based structure modeling will be discussed with a focus on comparative modeling, by reviewing techniques available for all the major steps involved in the comparative modeling pipeline.
Affiliation(s)
- Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
|
23
|
Abstract
The observation that similar protein sequences fold into similar three-dimensional structures provides a basis for the methods which predict structural features of a novel protein based on the similarity between its sequence and sequences of known protein structures. Similarity over entire sequence or large sequence fragment(s) enables prediction and modeling of entire structural domains while statistics derived from distributions of local features of known protein structures make it possible to predict such features in proteins with unknown structures. The accuracy of models of protein structures is sufficient for many practical purposes such as analysis of point mutation effects, enzymatic reactions, interaction interfaces of protein complexes, and active sites. Protein models are also used for phasing of crystallographic data and, in some cases, for drug design. By using models one can avoid the costly and time-consuming process of experimental structure determination. The purpose of this chapter is to give a practical review of the most popular protein structure prediction methods based on sequence similarity and to outline a practical approach to protein structure prediction. While the main focus of this chapter is on template-based protein structure prediction, it also provides references to other methods and programs which play an important role in protein structure prediction.
|
24
|
Zhou H, Skolnick J. Protein structure prediction by pro-Sp3-TASSER. Biophys J 2009; 96:2119-27. [PMID: 19289038 DOI: 10.1016/j.bpj.2008.12.3898] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.6] [Received: 09/12/2008] [Revised: 11/12/2008] [Accepted: 12/03/2008] [Indexed: 12/29/2022]
Abstract
An automated protein structure prediction algorithm, pro-sp3-Threading/ASSEmbly/Refinement (TASSER), is described and benchmarked. Structural templates are identified using five different scoring functions derived from the previously developed threading methods PROSPECTOR_3 and SP(3). Top templates identified by each scoring function are combined to derive contact and distance restraints for subsequent model refinement by short TASSER simulations. For Medium/Hard targets (those with moderate to poor quality templates and/or alignments), alternative template alignments are also generated by parametric alignment and the top models selected by TASSER-QA are included in the contact and distance restraint derivation. Then, multiple short TASSER simulations are used to generate an ensemble of full-length models. Subsequently, the top models are selected from the ensemble by TASSER-QA and used to derive TASSER contacts and distance restraints for another round of full TASSER refinement. The final models are selected from both rounds of TASSER simulations by TASSER-QA. We compare pro-sp3-TASSER with our previously developed MetaTASSER method (enhanced with chunk-TASSER for Medium/Hard targets) on a representative test data set of 723 proteins <250 residues in length. For the 348 proteins classified as easy targets (those templates with good alignments and global structure similarity to the target), the cumulative TM-score of the best of top five models by pro-sp3-TASSER shows a 2.1% improvement over MetaTASSER. For the 155/220 medium/hard targets, the improvements in TM-score are 2.8% and 2.2%, respectively. All improvements are statistically significant. More importantly, the number of foldable targets (those having models whose TM-score to native >0.4 in the top five clusters) increases from 472 to 497 for all targets, and the relative increases for medium and hard targets are 10% and 15%, respectively.
A server that implements the above algorithm is available at http://cssb.biology.gatech.edu/skolnick/webservice/pro-sp3-TASSER/. The source code is also available upon request.
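The TM-score used above to define foldable targets can be illustrated with a minimal sketch. It assumes the standard TM-score definition (a length-normalised sum of 1/(1 + (d/d0)^2) with a length-dependent scale d0) evaluated for one fixed superposition; the real score takes the maximum over superpositions, and the clamp of d0 to 0.5 Å for short chains is an assumption of this sketch. None of the function names come from the pro-sp3-TASSER code.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 in Å (clamped for short chains)."""
    if target_length <= 15:
        return 0.5  # assumption: avoid a negative base under the cube root
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length: int) -> float:
    """TM-score for one fixed superposition.

    distances: Cα-Cα distances (Å) for aligned residue pairs.
    target_length: number of residues in the native (target) structure.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def is_foldable(score: float, threshold: float = 0.4) -> bool:
    """Foldable-target criterion quoted in the abstract: TM-score > 0.4."""
    return score > threshold


# Toy example: a 100-residue target with every aligned pair exactly d0 apart
toy_distances = [tm_d0(100)] * 100
toy_score = tm_score(toy_distances, 100)
print(round(toy_score, 3), is_foldable(toy_score))
```

Because each term is bounded by 1 and the sum is divided by the target length, unaligned residues lower the score, which is why the measure rewards coverage as well as local accuracy.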
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia, USA
|
25
|
Zhu J, Fan H, Periole X, Honig B, Mark AE. Refining homology models by combining replica-exchange molecular dynamics and statistical potentials. Proteins 2009; 72:1171-88. [PMID: 18338384 DOI: 10.1002/prot.22005] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Indexed: 12/25/2022]
Abstract
A protocol is presented for the global refinement of homology models of proteins. It combines the advantages of temperature-based replica-exchange molecular dynamics (REMD) for conformational sampling and the use of statistical potentials for model selection. The protocol was tested using 21 models. Of these, 14 were models of 10 small proteins for which high-resolution crystal structures were available; the remainder were targets of the recent CASPR exercise. It was found that REMD in combination with currently available force fields could sample near-native conformational states starting from high-quality homology models. Conformations in which the backbone RMSD of secondary structure elements (SSE-RMSD) was lower than the starting value by 0.5-1.0 Å were found for 15 out of the 21 cases (average 0.82 Å). Furthermore, when a simple scoring function consisting of two statistical potentials was used to rank the structures, one or more structures with SSE-RMSD at least 0.2 Å lower than the starting value were found among the five best-ranked structures in 11 out of the 21 cases. The average improvement in SSE-RMSD for the best models was 0.42 Å. However, none of the scoring functions tested identified the structures with the lowest SSE-RMSD as the best models, although all identified the native conformation as the one with lowest energy. This suggests that while the proposed protocol proved effective for the refinement of high-quality models of small proteins, scoring functions remain one of the major limiting factors in structure refinement. This and other aspects by which the methodology could be further improved are discussed.
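For readers unfamiliar with temperature-based replica exchange, the sketch below shows the standard Metropolis criterion for swapping configurations between two replicas at different temperatures. It is a generic textbook formulation, not the authors' protocol: the temperatures, energies, and function names are illustrative assumptions, and energies are taken to be in kcal/mol.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Acceptance probability for exchanging replicas i and j.

    P = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]), with beta = 1 / (k_B * T).
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


def attempt_swap(energy_i, energy_j, temp_i, temp_j, rng=random):
    """Return True if the exchange is accepted."""
    return rng.random() < swap_probability(energy_i, energy_j, temp_i, temp_j)


# The colder replica (300 K) holds the higher energy: the swap is always accepted.
p_downhill = swap_probability(-100.0, -120.0, 300.0, 310.0)
# The colder replica already holds the lower energy: accepted only sometimes.
p_uphill = swap_probability(-120.0, -100.0, 300.0, 310.0)
print(p_downhill, round(p_uphill, 3))
```

The criterion lets low-energy conformations found at high temperature migrate to the low-temperature replicas while preserving the Boltzmann distribution at each temperature.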
Affiliation(s)
- Jiang Zhu
- Howard Hughes Medical Institute and Columbia University, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, USA
|
26
|
Pacheco B, Maccarana M, Goodlett DR, Malmström A, Malmström L. Identification of the active site of DS-epimerase 1 and requirement of N-glycosylation for enzyme function. J Biol Chem 2008; 284:1741-7. [PMID: 19004833 DOI: 10.1074/jbc.m805479200] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 11/06/2022]
Abstract
Dermatan sulfate is a highly sulfated polysaccharide and has a variety of biological functions in development and disease. Iduronic acid domains in dermatan sulfate, which are formed by the action of two DS-epimerases, have a key role in mediating these functions. We have identified the catalytic site and three putative catalytic residues in DS-epimerase 1, His-205, Tyr-261, and His-450, by tertiary structure modeling and amino acid conservation to heparinase II. These residues were systematically mutated to alanine or more conserved residues, which resulted in complete loss of epimerase activity. Based on these data and the close relationship between lyase and epimerase reactions, we propose a model where His-450 functions as a general base abstracting the C5 proton from glucuronic acid. Subsequent cleavage of the glycosidic linkage by Tyr-261 generates a 4,5-unsaturated hexuronic intermediate, which is protonated at the C5 carbon by His-205 from the side of the sugar plane opposite to the side of previous proton abstraction. Concomitant recreation of the glycosidic linkage ends the reaction, generating iduronic acid. In addition, we show that proper N-glycosylation of DS-epimerase 1 is required for enzyme activity. This study represents the first description of the structural basis for epimerization by a glycosaminoglycan epimerase.
Affiliation(s)
- Benny Pacheco
- Department of Experimental Medical Science, Lund University, Biomedical Center D12, SE-221 84 Lund, Sweden.
|
27
|
Arnon TI, Kaiser JT, West AP, Olson R, Diskin R, Viertlboeck BC, Göbel TW, Bjorkman PJ. The crystal structure of CHIR-AB1: a primordial avian classical Fc receptor. J Mol Biol 2008; 381:1012-24. [PMID: 18625238 DOI: 10.1016/j.jmb.2008.06.082] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Received: 04/23/2008] [Revised: 06/25/2008] [Accepted: 06/26/2008] [Indexed: 01/22/2023]
Abstract
CHIR-AB1 is a newly identified avian immunoglobulin (Ig) receptor that includes both activating and inhibitory motifs and was therefore classified as a potentially bifunctional receptor. Recently, CHIR-AB1 was shown to bind the Fc region of chicken IgY and to induce calcium mobilization via association with the common γ-chain, a subunit that transmits signals upon ligation of many different immunoreceptors. Here we describe the 1.8-Å-resolution crystal structure of the CHIR-AB1 ectodomain. The receptor ectodomain consists of a single C2-type Ig domain resembling the Ig-like domains found in mammalian Fc receptors such as FcγRs and FcαRI. Unlike these receptors and other monomeric Ig superfamily members, CHIR-AB1 crystallized as a 2-fold symmetrical homodimer that bears no resemblance to variable or constant region dimers in an antibody. Analytical ultracentrifugation demonstrated that CHIR-AB1 exists as a mixture of monomers and dimers in solution, and equilibrium gel filtration revealed a 2:1 receptor/ligand binding stoichiometry. Measurement of the 1:1 CHIR-AB1/IgY interaction affinity indicates a relatively low affinity complex, but a 2:1 CHIR-AB1/IgY interaction allows an increase in apparent affinity due to avidity effects when the receptor is tethered to a surface. Taken together, these results add to the structural understanding of Fc receptors and their functional mechanisms.
Affiliation(s)
- Tal I Arnon
- Division of Biology, 114-96 and Howard Hughes Medical Institute, California Institute of Technology, Pasadena, CA 91125, USA
|
28
|
Zhou H, Pandit SB, Lee SY, Borreguero J, Chen H, Wroblewska L, Skolnick J. Analysis of TASSER-based CASP7 protein structure prediction results. Proteins 2008; 69 Suppl 8:90-7. [PMID: 17705276 DOI: 10.1002/prot.21649] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.6] [Indexed: 11/06/2022]
Abstract
An improved TASSER (Threading/ASSEmbly/Refinement) methodology is applied to predict the tertiary structure for all CASP7 targets. TASSER employs template identification by threading, followed by tertiary structure assembly by rearranging continuous template fragments, where conformational space is searched via Parallel Hyperbolic Monte Carlo sampling with an optimized force-field that includes knowledge-based statistical potentials and restraints derived from threading templates. The final models are selected by clustering structures from the low temperature replicas. Improvements in TASSER over CASP6 involve use of better templates from 3D-jury applied to three threading programs, PROSPECTOR_3, SP(3), and SPARKS, and a fragment comparison method for better model ranking. For targets with no reliable templates, a variant of TASSER (chunk-TASSER) is also applied with potentials and restraints extracted from ab initio folded supersecondary chunks of the target to build full-length models. For all 124 CASP targets/domains, the average root-mean-square-deviation (RMSD) from native and alignment coverage of the best initial threading models from 3D-jury are 6.2 Å and 93%, respectively. Following TASSER reassembly, the average RMSD of the best model in the template aligned region decreases to 4.9 Å and the average TM-score increases from 0.617 for the template to 0.678 for the best full-length model. Based on target difficulty, the average TM-scores of the final model to native are 0.904, 0.671, and 0.307 for high-accuracy template-based modeling, template-based modeling, and free modeling targets/domains, respectively. For the more difficult targets, TASSER with modest human intervention performed better in comparison to its server counterpart, MetaTASSER, which used a limited time simulation.
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30318, USA
|
29
|
Axelrod HL, McMullan D, Krishna SS, Miller MD, Elsliger MA, Abdubek P, Ambing E, Astakhova T, Carlton D, Chiu HJ, Clayton T, Duan L, Feuerhelm J, Grzechnik SK, Hale J, Han GW, Haugen J, Jaroszewski L, Jin KK, Klock HE, Knuth MW, Koesema E, Morse AT, Nigoghossian E, Okach L, Oommachen S, Paulsen J, Quijano K, Reyes R, Rife CL, van den Bedem H, Weekes D, White A, Wolf G, Xu Q, Hodgson KO, Wooley J, Deacon AM, Godzik A, Lesley SA, Wilson IA. Crystal structure of AICAR transformylase IMP cyclohydrolase (TM1249) from Thermotoga maritima at 1.88 Å resolution. Proteins 2008; 71:1042-9. [DOI: 10.1002/prot.21967] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/11/2022]
|
30
|
Sterner B, Singh R, Berger B. Predicting and annotating catalytic residues: an information theoretic approach. J Comput Biol 2007; 14:1058-73. [PMID: 17887954 DOI: 10.1089/cmb.2007.0042] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023]
Abstract
We introduce a computational method to predict and annotate the catalytic residues of a protein using only its sequence information, so that we describe both the residues' sequence locations (prediction) and their specific biochemical roles in the catalyzed reaction (annotation). While knowing the chemistry of an enzyme's catalytic residues is essential to understanding its function, the challenges of prediction and annotation have remained difficult, especially when only the enzyme's sequence and no homologous structures are available. Our sequence-based approach follows the guiding principle that catalytic residues performing the same biochemical function should have similar chemical environments; it detects specific conservation patterns near in sequence to known catalytic residues and accordingly constrains what combination of amino acids can be present near a predicted catalytic residue. We associate with each catalytic residue a short sequence profile and define a Kullback-Leibler (KL) distance measure between these profiles, which, as we show, effectively captures even subtle biochemical variations. We apply the method to the class of glycohydrolase enzymes. This class includes proteins from 96 families with very different sequences and folds, many of which perform important functions. In a cross-validation test, our approach correctly predicts the location of the enzymes' catalytic residues with a sensitivity of 80% at a specificity of 99.4%, and in a separate cross-validation we also correctly annotate the biochemical role of 80% of the catalytic residues. Our results compare favorably to existing methods. Moreover, our method is more broadly applicable because it relies on sequence and not structure information; it may, furthermore, be used in conjunction with structure-based methods.
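The profile comparison described in this abstract can be illustrated with a small sketch. The abstract does not give the exact form of the KL distance, so the version below is an assumption: each profile is a list of per-position amino acid distributions built from equal-length sequence windows, and the distance is a symmetrized Kullback-Leibler divergence summed over positions. The function names and the pseudocount smoothing are illustrative choices, not taken from the paper.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_distribution(residues, pseudocount=1.0):
    """Smoothed amino acid distribution for the residues seen at one position."""
    total = len(residues) + pseudocount * len(AMINO_ACIDS)
    return {aa: (residues.count(aa) + pseudocount) / total for aa in AMINO_ACIDS}


def build_profile(windows, pseudocount=1.0):
    """Build a profile (one distribution per position) from equal-length windows."""
    length = len(windows[0])
    return [
        column_distribution([w[i] for w in windows], pseudocount)
        for i in range(length)
    ]


def kl_divergence(p, q):
    """KL divergence D(p || q); pseudocounts keep every probability non-zero."""
    return sum(p[aa] * math.log(p[aa] / q[aa]) for aa in AMINO_ACIDS)


def profile_distance(profile_a, profile_b):
    """Symmetrized KL summed over positions (assumed form, not the paper's)."""
    return sum(
        kl_divergence(pa, pb) + kl_divergence(pb, pa)
        for pa, pb in zip(profile_a, profile_b)
    )


# Hypothetical windows centred on catalytic residues of two enzyme groups.
group_1 = build_profile(["GDEAT", "GDEST", "ADEAT"])
group_2 = build_profile(["WKHLL", "WRHLI", "FKHLL"])
print(profile_distance(group_1, group_1))  # identical profiles give 0.0
print(profile_distance(group_1, group_2) > 0)
```

Symmetrizing is one common way to turn the asymmetric KL divergence into a distance-like quantity; other choices, such as the Jensen-Shannon divergence, would fit the same interface.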
Affiliation(s)
- Beckett Sterner
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
|
31
|
Friedberg I, Godzik A. Connecting the protein structure universe by using sparse recurring fragments. Structure 2005; 13:1213-24. [PMID: 16084393 DOI: 10.1016/j.str.2005.05.009] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.8] [Received: 02/07/2005] [Revised: 04/22/2005] [Accepted: 05/11/2005] [Indexed: 10/25/2022]
Abstract
The quest to order and classify protein structures has led to various classification schemes, focusing mostly on hierarchical relationships between structural domains. At the coarsest classification level, such schemes typically identify hundreds of types of fundamental units called folds. As a result, we picture protein structure space as a collection of isolated fold islands. It is obvious, however, that many protein folds share structural and functional commonalities. Locating those commonalities is important for our understanding of protein structure, function, and evolution. Here, we present an alternative view of the protein fold space, based on an interfold similarity measure that is related to the frequency of fragments shared between folds. In this view, protein structures form a complicated, cross-connected network with very interesting topology. We show that interfold similarity based on sequence/structure fragments correlates well with similarities of functions between protein populations in different folds.
Affiliation(s)
- Iddo Friedberg
- Program in Bioinformatics and Systems Biology, The Burnham Institute, La Jolla, California 92037, USA.
|
32
|
Fernandez-Fuentes N, Rai BK, Madrid-Aliste CJ, Fajardo JE, Fiser A. Comparative protein structure modeling by combining multiple templates and optimizing sequence-to-structure alignments. Bioinformatics 2007; 23:2558-65. [PMID: 17823132 DOI: 10.1093/bioinformatics/btm377] [Citation(s) in RCA: 72] [Impact Index Per Article: 4.2] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Two major bottlenecks in advancing comparative protein structure modeling are the efficient combination of multiple template structures and the generation of a correct input target-template alignment. RESULTS: A novel method, Multiple Mapping Method with Multiple Templates (M4T), is introduced that implements an algorithm to automatically select and combine Multiple Template structures (MT) and an alignment optimization protocol (Multiple Mapping Method, MMM). The MT module of M4T selects and combines multiple template structures through an iterative clustering approach that takes into account the 'unique' contribution of each template, their sequence similarity among themselves and to the target sequence, and their experimental resolution. MMM is a sequence-to-structure alignment method that optimally combines alternatively aligned regions according to their fit in the structural environment of the template structure. The resulting M4T alignment is used as input to a comparative modeling module. The performance of M4T has been benchmarked on CASP6 comparative modeling target sequences and on a larger independent test set, and compared favorably with current state-of-the-art methods.
Affiliation(s)
- Narcis Fernandez-Fuentes
- Department of Biochemistry and Seaver Center for Bioinformatics, Albert Einstein College of Medicine, 1300 Morris Park Avenue, Bronx, NY 10461, USA
|
33
|
Abstract
We have developed an ab initio protein structure prediction method called chunk-TASSER that uses ab initio folded supersecondary structure chunks of a given target as well as threading templates for obtaining contact potentials and distance restraints. The predicted chunks, selected on the basis of a new fragment comparison method, are folded by a fragment insertion method. Full-length models are built and refined by the TASSER methodology, which searches conformational space via parallel hyperbolic Monte Carlo. We employ an optimized reduced force field that includes knowledge-based statistical potentials and restraints derived from the chunks as well as threading templates. The method is tested on a dataset of 425 hard target proteins ≤250 amino acids in length. The average TM-scores of the best of top five models per target are 0.266, 0.336, and 0.362 by the threading algorithm SP(3), original TASSER and chunk-TASSER, respectively. For a subset of 80 proteins with predicted alpha-helix content ≥50%, these averages are 0.284, 0.356, and 0.403, respectively. The percentages of proteins with the best of top five models having TM-score ≥0.4 (a statistically significant threshold for structural similarity) are 3.76, 20.94, and 28.94% by SP(3), TASSER, and chunk-TASSER, respectively, overall, while for the subset of 80 predominantly helical proteins, these percentages are 2.50, 23.75, and 41.25%. Thus, chunk-TASSER shows a significant improvement over TASSER for modeling hard targets where no good template can be identified. We also tested chunk-TASSER on 21 medium/hard targets <200 amino acids long from CASP7. Chunk-TASSER is approximately 11% (10%) better than TASSER for the total TM-score of the first (best of top five) models. Chunk-TASSER is fully automated and can be used in proteome scale protein structure prediction.
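Because this abstract reports its results as TM-scores, a short sketch of how that score is evaluated may help when reading the numbers. The formula is the standard one: the sum over aligned residue pairs of 1 / (1 + (d_i / d0)^2), divided by the target length, with d0 = 1.24 (L - 15)^(1/3) - 1.8. The sketch assumes the distances come from a superposition that has already been computed; the full TM-score additionally searches for the superposition that maximizes the sum, which is not done here. The lower bound of 0.5 Å on d0 for short chains is an assumption of this sketch.

```python
def d0_for_length(target_length):
    """Length-dependent distance scale d0 (in Å) used by the TM-score."""
    if target_length <= 15:
        return 0.5  # assumed floor; the cube-root formula is undefined here
    d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    return max(d0, 0.5)  # assumed floor for short chains


def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: Cα-Cα distances (Å) of the aligned residue pairs.
    target_length: number of residues in the target (native) structure.
    """
    d0 = d0_for_length(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


# A perfect model of a 100-residue target scores 1.0.
print(tm_score([0.0] * 100, 100))
# Unaligned residues contribute nothing, so partial coverage lowers the score.
print(tm_score([1.0] * 50, 100))
```

Normalizing by the target length rather than the number of aligned pairs is what makes the score penalize low coverage, which is why values can be compared across the threading, TASSER, and chunk-TASSER models quoted above.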
Affiliation(s)
- Hongyi Zhou
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, GA, USA
|
34
|
López-Viñas E, Bentebibel A, Gurunathan C, Morillas M, de Arriaga D, Serra D, Asins G, Hegardt FG, Gómez-Puertas P. Definition by functional and structural analysis of two malonyl-CoA sites in carnitine palmitoyltransferase 1A. J Biol Chem 2007; 282:18212-18224. [PMID: 17452323 DOI: 10.1074/jbc.m700885200] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Indexed: 01/09/2023]
Abstract
Carnitine palmitoyltransferase 1 (CPT1) catalyzes the conversion of palmitoyl-CoA to palmitoylcarnitine in the presence of l-carnitine, thus facilitating the entry of fatty acids to mitochondria, in a process that is physiologically inhibited by malonyl-CoA. To examine the mechanism of CPT1 liver isoform (CPT1A) inhibition by malonyl-CoA, we constructed an in silico model of both its NH2- and COOH-terminal domains. Two malonyl-CoA binding sites were found. One of these, the "CoA site" or "A site," is involved in the interactions between NH2- and COOH-terminal domains and shares the acyl-CoA hemitunnel. The other, the "opposite-to-CoA site" or "O site," is on the opposite side of the enzyme, in the catalytic channel. The two sites share the carnitine-binding locus. To prevent the interaction between NH2- and COOH-terminal regions, we produced CPT1A E26K and K561E mutants. A double mutant E26K/K561E (swap), which was expected to conserve the interaction, was also produced. Inhibition assays showed a 12-fold decrease in the sensitivity (IC50) toward malonyl-CoA for CPT1A E26K and K561E single mutants, whereas swap mutant reverts to wild-type IC50 value. We conclude that structural interaction between both domains is critical for enzyme sensitivity to malonyl-CoA inhibition at the "A site." The location of the "O site" for malonyl-CoA binding was supported by inhibition assays of expressed R243T mutant. The model is also sustained by kinetic experiments that indicated linear mixed type malonyl-CoA inhibition for carnitine. Malonyl-CoA alters the affinity of carnitine, and there appears to be an exponential inverse relation between carnitine Km and malonyl-CoA IC50.
Affiliation(s)
- Eduardo López-Viñas
- Centro de Biología Molecular "Severo Ochoa" (Consejo Superior de Investigaciones Científicas-Universidad Autónoma de Madrid), Cantoblanco, E-28049 Madrid, Spain; CIBER Institute of Fisiopatología de la Obesidad y Nutrición (CB06/03), Instituto de Salud Carlos III, 28049 Madrid, Spain
- Assia Bentebibel
- Departamento de Bioquímica y Biología Molecular, Facultad de Farmacia, Universidad de Barcelona, E-08028 Barcelona, Spain; CIBER Institute of Fisiopatología de la Obesidad y Nutrición (CB06/03), Instituto de Salud Carlos III, 28049 Madrid, Spain
- Chandrashekaran Gurunathan
- Departamento de Bioquímica y Biología Molecular, Facultad de Farmacia, Universidad de Barcelona, E-08028 Barcelona, Spain; CIBER Institute of Fisiopatología de la Obesidad y Nutrición (CB06/03), Instituto de Salud Carlos III, 28049 Madrid, Spain
- Montserrat Morillas
- Departamento de Bioquímica y Biología Molecular, Facultad de Farmacia, Universidad de Barcelona, E-08028 Barcelona, Spain; CIBER Institute of Fisiopatología de la Obesidad y Nutrición (CB06/03), Instituto de Salud Carlos III, 28049 Madrid, Spain
- Dolores de Arriaga
- Departamento de Biología Molecular, Universidad de León, E-24071 León, Spain
- Dolors Serra
- Departamento de Bioquímica y Biología Molecular, Facultad de Farmacia, Universidad de Barcelona, E-08028 Barcelona, Spain; CIBER Institute of Fisiopatología de la Obesidad y Nutrición (CB06/03), Instituto de Salud Carlos III, 28049 Madrid, Spain
- Guillermina Asins
- Departamento de Bioquímica y Biología Molecular, Facultad de Farmacia, Universidad de Barcelona, E-08028 Barcelona, Spain; CIBER Institute of Fisiopatología de la Obesidad y Nutrición (CB06/03), Instituto de Salud Carlos III, 28049 Madrid, Spain
- Fausto G Hegardt
- Departamento de Bioquímica y Biología Molecular, Facultad de Farmacia, Universidad de Barcelona, E-08028 Barcelona, Spain; CIBER Institute of Fisiopatología de la Obesidad y Nutrición (CB06/03), Instituto de Salud Carlos III, 28049 Madrid, Spain.
- Paulino Gómez-Puertas
- Centro de Biología Molecular "Severo Ochoa" (Consejo Superior de Investigaciones Científicas-Universidad Autónoma de Madrid), Cantoblanco, E-28049 Madrid, Spain; CIBER Institute of Fisiopatología de la Obesidad y Nutrición (CB06/03), Instituto de Salud Carlos III, 28049 Madrid, Spain
|
35
|
Chovancová E, Kosinski J, Bujnicki JM, Damborský J. Phylogenetic analysis of haloalkane dehalogenases. Proteins 2007; 67:305-16. [PMID: 17295320 DOI: 10.1002/prot.21313] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.3] [Indexed: 01/30/2023]
Abstract
Haloalkane dehalogenases (HLDs) are enzymes that catalyze the cleavage of carbon-halogen bonds by a hydrolytic mechanism. Although comparative biochemical analyses have been published, no classification system has been proposed for HLDs, to date, that reconciles their phylogenetic and functional relationships. In the study presented here, we have analyzed all sequences and structures of genuine HLDs and their homologs detectable by database searches. Phylogenetic analyses revealed that the HLD family can be divided into three subfamilies denoted HLD-I, HLD-II, and HLD-III, of which HLD-I and HLD-III are predicted to be sister-groups. A mismatch between the HLD protein tree and the tree of species, as well as the presence of more than one HLD gene in a few genomes, suggest that horizontal gene transfers, and perhaps also multiple gene duplications and losses have been involved in the evolution of this family. Most of the biochemically characterized HLDs are found in the HLD-II subfamily. The dehalogenating activity of two members of the newly identified HLD-III subfamily has only recently been confirmed, in a study motivated by this phylogenetic analysis. A novel type of the catalytic pentad (Asp-His-Asp+Asn-Trp) was predicted for members of the HLD-III subfamily. Calculation of the evolutionary rates and lineage-specific innovations revealed a common conserved core as well as a set of residues that characterizes each HLD subfamily. The N-terminal part of the cap domain is one of the most variable regions within the whole family as well as within individual subfamilies, and serves as a preferential site for the location of relatively long insertions. The highest variability of discrete sites was observed among residues that are structural components of the access channels. 
Mutations at these sites modify the anatomy of the channels, which are important for the exchange of ligands between the buried active site and the bulk solvent, thus creating a structural basis for the molecular evolution of new substrate specificities. Our analysis sheds light on the evolutionary history of HLDs and provides a structural framework for designing enzymes with new specificities.
Affiliation(s)
- Eva Chovancová
- Loschmidt Laboratories, Faculty of Science, Masaryk University, Brno, Czech Republic
|
36
|
Zhu J, Xie L, Honig B. Structural refinement of protein segments containing secondary structure elements: Local sampling, knowledge-based potentials, and clustering. Proteins 2006; 65:463-79. [PMID: 16927337 DOI: 10.1002/prot.21085] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]
Abstract
In this article, we present an iterative, modular optimization (IMO) protocol for the local structure refinement of protein segments containing secondary structure elements (SSEs). The protocol is based on three modules: a torsion-space local sampling algorithm, a knowledge-based potential, and a conformational clustering algorithm. Alternative methods are tested for each module in the protocol. For each segment, random initial conformations were constructed by perturbing the native dihedral angles of loops (and SSEs) of the segment to be refined while keeping the protein body fixed. Two refinement procedures based on molecular mechanics force fields - using either energy minimization or molecular dynamics - were also tested but were found to be less successful than the IMO protocol. We found that DFIRE is a particularly effective knowledge-based potential and that clustering algorithms that are biased by the DFIRE energies improve the overall results. Results were further improved by adding an energy minimization step to the conformations generated with the IMO procedure, suggesting that hybrid strategies that combine both knowledge-based and physical effective energy functions may prove to be particularly effective in future applications.
Affiliation(s)
- Jiang Zhu
- Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, 1130 St. Nicholas Avenue, Room 815, New York, New York 10032, USA
|
37
|
Tan YH, Huang H, Kihara D. Statistical potential-based amino acid similarity matrices for aligning distantly related protein sequences. Proteins 2006; 64:587-600. [PMID: 16799934 DOI: 10.1002/prot.21020] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.0] [Indexed: 11/10/2022]
Abstract
Aligning distantly related protein sequences is a long-standing problem in bioinformatics, and a key for successful protein structure prediction. Its importance is increasing recently in the context of structural genomics projects because more and more experimentally solved structures are available as templates for protein structure modeling. Toward this end, recent structure prediction methods employ profile-profile alignments, and various ways of aligning two profiles have been developed. More fundamentally, a better amino acid similarity matrix can improve a profile itself; thereby resulting in more accurate profile-profile alignments. Here we have developed novel amino acid similarity matrices from knowledge-based amino acid contact potentials. Contact potentials are used because the contact propensity to the other amino acids would be one of the most conserved features of each position of a protein structure. The derived amino acid similarity matrices are tested on benchmark alignments at three different levels, namely, the family, the superfamily, and the fold level. Compared to BLOSUM45 and the other existing matrices, the contact potential-based matrices perform comparably in the family level alignments, but clearly outperform in the fold level alignments. The contact potential-based matrices perform even better when suboptimal alignments are considered. Comparing the matrices themselves with each other revealed that the contact potential-based matrices are very different from BLOSUM45 and the other matrices, indicating that they are located in a different basin in the amino acid similarity matrix space.
Affiliation(s)
- Yen Hock Tan
- Department of Computer Sciences, College of Science, Purdue University, West Lafayette, Indiana 47907, USA.
|
38
|
Chivian D, Baker D. Homology modeling using parametric alignment ensemble generation with consensus and energy-based model selection. Nucleic Acids Res 2006; 34:e112. [PMID: 16971460 PMCID: PMC1635247 DOI: 10.1093/nar/gkl480] [Citation(s) in RCA: 89] [Impact Index Per Article: 4.9] [Indexed: 11/12/2022]
Abstract
The accuracy of a homology model based on the structure of a distant relative or other topologically equivalent protein is primarily limited by the quality of the alignment. Here we describe a systematic approach for sequence-to-structure alignment, called ‘K*Sync’, in which alignments are generated by dynamic programming using a scoring function that combines information on many protein features, including a novel measure of how obligate a sequence region is to the protein fold. By systematically varying the weights on the different features that contribute to the alignment score, we generate very large ensembles of diverse alignments, each optimal under a particular constellation of weights. We investigate a variety of approaches to select the best models from the ensemble, including consensus of the alignments, a hydrophobic burial measure, low- and high-resolution energy functions, and combinations of these evaluation methods. The effect on model quality and selection resulting from loop modeling and backbone optimization is also studied. The performance of the method on a benchmark set is reported and shows the approach to be effective at both generating and selecting accurate alignments. The method serves as the foundation of the homology modeling module in the Robetta server.
Affiliation(s)
- Dylan Chivian
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA, USA
- Howard Hughes Medical Institute, Seattle, WA, USA
- To whom correspondence should be addressed at Department of Biochemistry and HHMI, University of Washington, Box 357350, Seattle, WA 98195, USA. Tel: +1 206 543 1295; Fax: +1 206 685 1792
|
39
|
Pons T, González B, Ceciliani F, Galizzi A. FlgM anti-sigma factors: identification of novel members of the family, evolutionary analysis, homology modeling, and analysis of sequence-structure-function relationships. J Mol Model 2006; 12:973-83. [PMID: 16673084 DOI: 10.1007/s00894-005-0096-5] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 03/24/2005] [Accepted: 12/02/2005] [Indexed: 10/24/2022]
Abstract
FlgM proteins, also known as Anti-sigma-28 factor (sigma28), are negative regulators of flagellin synthesis. Recently, a three-dimensional structure of the Aquifex aeolicus sigma28/FlgM complex (PDB code: 1rp3) was determined by X-ray crystallography at 2.3 Å resolution. Furthermore, experimental data on bacterial FlgM, including site-directed mutagenesis and structural characterization by NMR, are also available. However, an interpretation of the sequence-structure-function relationships combining X-ray and NMR data with the evolutionary information extracted from the increasing number of FlgM-related sequences annotated in databases is not available. In the present study, we combined database sequence searches and sequence-analysis tools to update the multiple sequence alignment of a previously characterized cluster of orthologs (COG2747) and the PFAM classification of protein domains (PF04316) for the FlgM family. A phylogenetic analysis of 77 protein sequences revealed the presence of at least three major sequence clades within the FlgM family. In addition, we predicted functional residues using a SequenceSpace method. We also generated homology models for Bacillus subtilis and Salmonella typhimurium FlgM proteins, for which sequence-structure-function relationship data are available, and used the docking program ClusPro to hypothesize about the dimer association between FlgM proteins. In conclusion, the analysis presented in this work will be useful in designing new experiments to better understand protein-protein interactions between FlgM, sigma factors, and putative molecules from the flagellar export apparatus. Electronic Supplementary Material is available in the online version of this article at http://link.springer.de/
Affiliation(s)
- T Pons
- Centro de Ingeniería Genética y Biotecnología, Havana, 10600, Cuba.
|
40
|
Qiu J, Elber R. SSALN: an alignment algorithm using structure-dependent substitution matrices and gap penalties learned from structurally aligned protein pairs. Proteins 2006; 62:881-91. [PMID: 16385554 DOI: 10.1002/prot.20854] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.8] [Indexed: 11/07/2022]
Abstract
In template-based modeling of protein structures, the generation of the alignment between the target and the template is a critical step that significantly affects the accuracy of the final model. This paper proposes an alignment algorithm, SSALN, that learns substitution matrices and position-specific gap penalties from a database of structurally aligned protein pairs. In addition to the amino acid sequence information, secondary structure and solvent accessibility information of a position are used to derive substitution scores and position-specific gap penalties. In a test set of CASP5 targets, SSALN outperforms sequence alignment methods such as a Smith-Waterman algorithm with BLOSUM50 and PSI_BLAST. SSALN also generates better alignments than PSI_BLAST in the CASP6 test set. LOOPP server prediction based on an SSALN alignment is ranked the best for target T0280_1 in CASP6. SSALN is also compared with several threading methods and sequence alignment methods on the ProSup benchmark. SSALN has the highest alignment accuracy among the methods compared. On Fischer's benchmark, SSALN performs better than CLUSTALW and GenTHREADER, and generates more alignments with accuracy >50%, >60% or >70% than FUGUE, but fewer alignments with accuracy >80% than FUGUE. All the supplemental materials can be found at http://www.cs.cornell.edu/~jianq/research.htm.
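The Smith-Waterman baseline named in this abstract can be sketched in a few lines. This is not SSALN: the sketch uses a toy match/mismatch score in place of BLOSUM50 and a single linear gap penalty, whereas SSALN learns structure-dependent substitution scores and position-specific gap penalties. The function names and parameter values are illustrative assumptions.

```python
def toy_score(x, y):
    """Toy substitution score standing in for a real matrix such as BLOSUM50."""
    return 2 if x == y else -1


def smith_waterman(seq_a, seq_b, score=toy_score, gap=-4):
    """Local alignment by dynamic programming with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b).
    """
    rows, cols = len(seq_a) + 1, len(seq_b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the matrix; a cell never drops below 0, which makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            H[i][j] = max(
                0,
                H[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),
                H[i - 1][j] + gap,
                H[i][j - 1] + gap,
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("ACDEFG", "XXCDEYY"))  # (6, 'CDE', 'CDE')
```

In this framing, a method like the one described would replace `score` with a function that also depends on the secondary structure and solvent accessibility at each template position, and replace the constant `gap` with a per-position value.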
Affiliation(s)
- Jian Qiu
- Department of Computer Science, Cornell University, Ithaca, New York 14853, USA
|
41
|
Skowronek KJ, Kosinski J, Bujnicki JM. Theoretical model of restriction endonuclease HpaI in complex with DNA, predicted by fold recognition and validated by site-directed mutagenesis. Proteins 2006; 63:1059-68. [PMID: 16498623 DOI: 10.1002/prot.20920] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/20/2022]
Abstract
Type II restriction enzymes are commercially important deoxyribonucleases and very attractive targets for protein engineering of new specificities. At the same time they are a very challenging test bed for protein structure prediction methods. Typically, enzymes that recognize different sequences show little or no amino acid sequence similarity to each other and to other proteins. Based on crystallographic analyses that revealed the same PD-(D/E)XK fold for more than a dozen case studies, they were nevertheless considered to be related until a combination of bioinformatics and mutational analyses demonstrated that some of these proteins belong to other, unrelated folds: PLD, HNH, and GIY-YIG. As a part of a large-scale project aiming at identification of a three-dimensional fold for all type II REases with known sequences (currently approximately 1000 proteins), we carried out preliminary structure prediction and selected candidates for experimental validation. Here, we present the analysis of HpaI REase, an ORFan with no detectable homologs, for which we detected a structural template by protein fold recognition, constructed a model using the FRankenstein monster approach and identified a number of residues important for the DNA binding and catalysis. These predictions were confirmed by site-directed mutagenesis and in vitro analysis of the mutant proteins. The experimentally validated model of HpaI will serve as a low-resolution structural platform for evolutionary considerations in the subgroup of blunt-cutting REases with different specificities. The research protocol developed in the course of this work represents a streamlined version of the previously used techniques and can be used in a high-throughput fashion to build and validate models for other enzymes, especially ORFans that exhibit no sequence similarity to any other protein in the database.
|
42
|
Abstract
In recent years, there has been significant progress in the ability to predict the three-dimensional structure of proteins from their amino acid sequence. Progress has been due to new methods to extract the growing amount of information in sequence and structure databases and improved computational descriptions of protein energetics. This review summarizes recent advances in these areas and describes a number of novel biological applications made possible by structure prediction. Despite remaining challenges, protein structure prediction is becoming an extremely useful tool in understanding phenomena in modern molecular and cell biology.
Affiliation(s)
- Donald Petrey
- Howard Hughes Medical Institute, Department of Biochemistry and Molecular Biophysics, Center for Computational Biology and Bioinformatics, Columbia University, New York, New York 10032, USA
|
43
|
Abstract
With currently available sequence data, it is feasible to conduct extensive comparisons among large sets of protein sequences. It is still a much more challenging task to partition the protein space into structurally and functionally related families solely based on sequence comparisons. The ProtoNet system automatically generates a treelike classification of the whole protein space. It stands to reason that this classification reflects evolutionary relationships, both close and remote. In this article, we examine this hypothesis. We present a semiautomatic procedure that singles out certain inner nodes in the ProtoNet tree that should ideally correspond to structurally and functionally defined protein families. We compare the performance of this method against several expert systems. Some of the competing methods incorporate additional extraneous information on protein structure or on enzymatic activities. The ProtoNet-based method performs at least as well as any of the methods with which it was compared. This article illustrates the ProtoNet-based method on several evolutionarily diverse families. Using this new method, an evolutionary divergence scheme can be proposed for a large number of structural and functional related superfamilies.
Affiliation(s)
- Ori Shachar
- School of Computer Science and Engineering, Hebrew University, Jerusalem, Israel
|
44
|
Bitto E, Bingman CA, Robinson H, Allard STM, Wesenberg GE, Phillips GN. The structure at 2.5 Å resolution of human basophilic leukemia-expressed protein BLES03. Acta Crystallogr Sect F Struct Biol Cryst Commun 2005; 61:812-7. [PMID: 16511166 PMCID: PMC1978119 DOI: 10.1107/s1744309105023845] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 06/09/2005] [Accepted: 07/25/2005] [Indexed: 11/10/2022]
Abstract
The crystal structure of the human basophilic leukemia-expressed protein (BLES03, p5326, Hs.433573) was determined by single-wavelength anomalous diffraction and refined to an R factor of 18.8% (Rfree = 24.5%) at 2.5 Å resolution. BLES03 shows no detectable sequence similarity to any functionally characterized proteins using state-of-the-art sequence-comparison tools. The structure of BLES03 adopts a fold similar to that of eukaryotic transcription initiation factor 4E (eIF4E), a protein involved in the recognition of the cap structure of eukaryotic mRNA. In addition to fold similarity, the electrostatic surface potentials of BLES03 and eIF4E show a clear conservation of basic and acidic patches. In the crystal lattice, the acidic amino-terminal helices of BLES03 monomers are bound within the basic cavity of symmetry-related monomers in a manner analogous to the binding of mRNA by eIF4E. Interestingly, the gene locus encoding BLES03 is located between genes encoding the proteins DRAP1 and FOSL1, both of which are involved in transcription initiation. It is hypothesized that BLES03 itself may be involved in a biochemical process that requires recognition of nucleic acids.
Affiliation(s)
- Eduard Bitto
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Craig A. Bingman
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Simon T. M. Allard
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Gary E. Wesenberg
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- George N. Phillips
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
|
45
|
Jaroszewski L, Rychlewski L, Li Z, Li W, Godzik A. FFAS03: a server for profile–profile sequence alignments. Nucleic Acids Res 2005; 33:W284-8. [PMID: 15980471 PMCID: PMC1160179 DOI: 10.1093/nar/gki418] [Citation(s) in RCA: 456] [Impact Index Per Article: 24.0] [Indexed: 11/22/2022] Open
Abstract
The FFAS03 server provides a web interface to the third generation of the profile–profile alignment and fold-recognition algorithm of fold and function assignment system (FFAS) [L. Rychlewski, L. Jaroszewski, W. Li and A. Godzik (2000), Protein Sci., 9, 232–241]. Profile–profile algorithms use information present in sequences of homologous proteins to amplify the patterns defining the family. As a result, they enable detection of remote homologies beyond the reach of other methods. FFAS, initially developed in 2000, is consistently one of the best ranked fold prediction methods in the CAFASP and LiveBench competitions. It is also used by several fold-recognition consensus methods and meta-servers. The FFAS03 server accepts a user supplied protein sequence and automatically generates a profile, which is then compared with several sets of sequence profiles of proteins from PDB, COG, PFAM and SCOP. The profile databases used by the server are automatically updated with the latest structural and sequence information. The server provides access to the alignment analysis, multiple alignment, and comparative modeling tools. Access to the server is open for both academic and commercial researchers. The FFAS03 server is available at .
Affiliation(s)
- Adam Godzik
- To whom correspondence should be addressed. Tel: +1 858 646 3168; Fax: +1 858 713 9925;
|
46
|
Shah PK, Aloy P, Bork P, Russell RB. Structural similarity to bridge sequence space: finding new families on the bridges. Protein Sci 2005; 14:1305-14. [PMID: 15840833 PMCID: PMC2253280 DOI: 10.1110/ps.041187405] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 10/25/2022]
Abstract
Structures for protein domains have increased rapidly in recent years owing to advances in structural biology and structural genomics projects. New structures are often similar to those solved previously, and such similarities can give insights into function by linking poorly understood families to those that are better characterized. They also allow the possibility of combining information to find still more proteins adopting a similar structure and sometimes a similar function, and to reprioritize families in structural genomics pipelines. We explore this possibility here by preparing merged profiles for pairs of structurally similar, but not necessarily sequence-similar, domains within the SMART and Pfam databases by way of the Structural Classification of Proteins (SCOP). We show that such profiles are often able to successfully identify further members of the same superfamily and thus can be used to increase the sensitivity of database searching methods like HMMer and PSI-BLAST. We perform detailed benchmarks using the SMART and Pfam databases with four complete genomes frequently used as annotation benchmarks. We quantify the associated increase in structural information in Swissprot and discuss examples illustrating the applicability of this approach to understand functional and evolutionary relationships between protein families.
|
47
|
Johnston CR, Shields DC. A sequence sub-sampling algorithm increases the power to detect distant homologues. Nucleic Acids Res 2005; 33:3772-8. [PMID: 16006623 PMCID: PMC1174907 DOI: 10.1093/nar/gki687] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/14/2022] Open
Abstract
Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimation of the evolutionary process. We investigated the combined HMMER search results from random alignment subsets (with three sequences each) drawn from the parent alignment (Rand-shuffle algorithm), using the SCOP structural classification to determine true similarities. At false-positive rates of 5%, the Rand-shuffle algorithm improved HMMER's sensitivity, with a 37.5% greater sensitivity compared with HMMER alone, when easily identified similarities (identifiable by BLAST) were excluded from consideration. An extension of the Rand-shuffle algorithm (Ali-shuffle) weighted towards more informative sequence subsets. This approach improved the performance over HMMER alone and PSI-BLAST, particularly at higher false-positive rates. The improvements in performance of these sequence sub-sampling methods may reflect lower sensitivity to alignment error and irregular evolutionary patterns. The Ali-shuffle and Rand-shuffle sequence homology search programs are available by request from the authors.
Affiliation(s)
- Catrióna R Johnston
- Department of Clinical Pharmacology, Bioinformatics Group, Royal College of Surgeons in Ireland, 123 St Stephens Green, Dublin 2, Ireland.
|
48
|
Bitto E, Bingman CA, Allard STM, Wesenberg GE, Phillips GN. The structure at 1.7 Å resolution of the protein product of the At2g17340 gene from Arabidopsis thaliana. Acta Crystallogr Sect F Struct Biol Cryst Commun 2005; 61:630-5. [PMID: 16511115 PMCID: PMC1952457 DOI: 10.1107/s1744309105017690] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 05/11/2005] [Accepted: 06/03/2005] [Indexed: 04/10/2023]
Abstract
The crystal structure of the At2g17340 protein from A. thaliana was determined by the multiple-wavelength anomalous diffraction method and was refined to an R factor of 16.9% (Rfree = 22.1%) at 1.7 Å resolution. At2g17340 is a member of the Pfam01937.11 protein family and its structure provides the first insight into the structural organization of this family. A number of fully and highly conserved residues defined by multiple sequence alignment of members of the Pfam01937.11 family were mapped onto the structure of At2g17340. The fully conserved residues are involved in the coordination of a metal ion and in the stabilization of loops surrounding the metal site. Several additional highly conserved residues also map into the vicinity of the metal-binding site, while others are clearly involved in stabilizing the hydrophobic core of the protein. The structure of At2g17340 represents a new fold in protein conformational space.
Affiliation(s)
- Eduard Bitto
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Craig A. Bingman
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Simon T. M. Allard
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Gary E. Wesenberg
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- George N. Phillips
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
|
49
|
Allard STM, Bingman CA, Johnson KA, Wesenberg GE, Bitto E, Jeon WB, Phillips GN. Structure at 1.6 Å resolution of the protein from gene locus At3g22680 from Arabidopsis thaliana. Acta Crystallogr Sect F Struct Biol Cryst Commun 2005; 61:647-50. [PMID: 16511118 PMCID: PMC1952470 DOI: 10.1107/s1744309105019743] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 05/20/2005] [Accepted: 06/22/2005] [Indexed: 11/10/2022]
Abstract
The At3g22680 gene from Arabidopsis thaliana codes for a protein of unknown function. The crystal structure of the At3g22680 gene product was determined by multiple-wavelength anomalous diffraction and refined to an R factor of 16.0% (Rfree = 18.4%) at 1.60 Å resolution. The refined structure shows one monomer in the asymmetric unit, with one molecule of the non-denaturing detergent CHAPS {3-[(3-cholamidopropyl)dimethylammonio]-1-propane sulfonate} tightly bound. Protein At3g22680 shows no structural homology to any other known proteins and represents a new fold in protein conformation space.
Affiliation(s)
- Simon T M Allard
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
|
50
|
Pettitt CS, McGuffin LJ, Jones DT. Improving sequence-based fold recognition by using 3D model quality assessment. Bioinformatics 2005; 21:3509-15. [PMID: 15955780 DOI: 10.1093/bioinformatics/bti540] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.8] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: The ability of a simple method (MODCHECK) to determine the sequence-structure compatibility of a set of structural models generated by fold recognition is tested in a thorough benchmark analysis. Four Model Quality Assessment Programs (MQAPs) were tested on 188 targets from the latest LiveBench-9 automated structure evaluation experiment. We systematically test and evaluate whether the MQAP methods can successfully detect native-like models. RESULTS: We show that, compared with the other three methods tested, MODCHECK is the most reliable method for consistently performing the best top model selection and for ranking the models. In addition, we show that the choice of model similarity score used to assess a model's similarity to the experimental structure can influence the overall performance of these tools. Although these MQAP methods fail to improve the model selection performance for methods that already incorporate protein three-dimensional (3D) structural information, an improvement is observed for methods that are purely sequence-based, including the best profile-profile methods. This suggests that even the best sequence-based fold recognition methods can still be improved by taking into account the 3D structural information. CONTACT: d.jones@cs.ucl.ac.uk
Affiliation(s)
- Chris S Pettitt
- Bioinformatics Unit, Department of Computer Science, University College London, Gower Street, London WC1E 6BT, UK
|