51
Bitto E, Bingman CA, Allard STM, Wesenberg GE, Aceti DJ, Wrobel RL, Frederick RO, Sreenath H, Vojtik FC, Jeon WB, Newman CS, Primm J, Sussman MR, Fox BG, Markley JL, Phillips GN. The structure at 2.4 Å resolution of the protein from gene locus At3g21360, a putative Fe(II)/2-oxoglutarate-dependent enzyme from Arabidopsis thaliana. Acta Crystallogr Sect F Struct Biol Cryst Commun 2005; 61:469-72. [PMID: 16511070 PMCID: PMC1952295 DOI: 10.1107/s1744309105011565] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Received: 03/07/2005] [Accepted: 04/13/2005] [Indexed: 11/10/2022]
Abstract
The crystal structure of the gene product of At3g21360 from Arabidopsis thaliana was determined by the single-wavelength anomalous dispersion method and refined to an R factor of 19.3% (Rfree = 24.1%) at 2.4 Å resolution. The crystal structure includes two monomers in the asymmetric unit that differ in the conformation of a flexible domain that spans residues 178-230. The crystal structure confirmed that At3g21360 encodes a protein belonging to the clavaminate synthase-like superfamily of iron(II) and 2-oxoglutarate-dependent enzymes. The metal-binding site was defined and is similar to the iron(II) binding sites found in other members of the superfamily.
Affiliation(s)
- Eduard Bitto
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Craig A. Bingman
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Simon T. M. Allard
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Gary E. Wesenberg
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- David J. Aceti
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Russell L. Wrobel
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Ronnie O. Frederick
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Hassan Sreenath
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Frank C. Vojtik
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Won Bae Jeon
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Craig S. Newman
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- John Primm
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Michael R. Sussman
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- Brian G. Fox
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- John L. Markley
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
- George N. Phillips
- Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, USA
52
Chmiel AA, Radlinska M, Pawlak SD, Krowarsch D, Bujnicki JM, Skowronek KJ. A theoretical model of restriction endonuclease NlaIV in complex with DNA, predicted by fold recognition and validated by site-directed mutagenesis and circular dichroism spectroscopy. Protein Eng Des Sel 2005; 18:181-9. [PMID: 15849215 DOI: 10.1093/protein/gzi019] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Indexed: 01/10/2023] Open
Abstract
Restriction enzymes (REases) are commercial reagents commonly used in DNA manipulations and mapping. They are regarded as very attractive models for studying protein-DNA interactions and valuable targets for protein engineering. Their amino acid sequences usually show no similarities to other proteins, with rare exceptions of other REases that recognize identical or very similar sequences. Hence, they are extremely hard targets for structure prediction and modeling. NlaIV is a Type II REase, which recognizes the interrupted palindromic sequence GGNNCC (where N indicates any base) and cleaves it in the middle, leaving blunt ends. NlaIV shows no sequence similarity to other proteins and virtually nothing is known about its sequence-structure-function relationships. Using protein fold recognition, we identified a remote relationship between NlaIV and EcoRV, an extensively studied REase, which recognizes the GATATC sequence and whose crystal structure has been determined. Using the 'FRankenstein's monster' approach we constructed a comparative model of NlaIV based on the EcoRV template and used it to predict the catalytic and DNA-binding residues. The model was validated by site-directed mutagenesis and analysis of the activity of the mutants in vivo and in vitro as well as structural characterization of the wild-type enzyme and two mutants by circular dichroism spectroscopy. The structural model of the NlaIV-DNA complex suggests regions of the protein sequence that may interact with the 'non-specific' bases of the target and thus it provides insight into the evolution of sequence specificity in restriction enzymes and may help engineer REases with novel specificities. Before this analysis was carried out, neither the three-dimensional fold of NlaIV, nor its evolutionary relationships, nor its catalytic or DNA-binding residues were known. Hence, our analysis may be regarded as a paradigm for studies aiming at reducing 'white spaces' on the evolutionary landscape of sequence-function relationships by combining bioinformatics with simple experimental assays.
Affiliation(s)
- Agnieszka A Chmiel
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, ul. ks. Trojdena 4, 02-109 Warsaw, Poland
53
Abstract
Meta-DP, a domain prediction meta-server, provides a simple interface for predicting domains in a given protein sequence using a number of domain prediction methods. Meta-DP is a convenient resource because, by accessing a single site, users automatically obtain the results of the various domain prediction methods along with a consensus prediction. Meta-DP is currently coupled to 10 domain prediction servers and can be extended to include any number of methods. Meta-DP can thus become a centralized repository of available methods. Meta-DP was also used to evaluate the performance of 13 domain prediction methods in the context of CAFASP-DP. AVAILABILITY: The Meta-DP server is freely available at http://meta-dp.bioinformatics.buffalo.edu and the CAFASP-DP evaluation results are available at http://cafasp4.bioinformatics.buffalo.edu/dp/update.html. CONTACT: hkaur@bioinformatics.buffalo.edu. SUPPLEMENTARY INFORMATION: Available at http://cafasp4.bioinformatics.buffalo.edu/dp/update.html.
Affiliation(s)
- Harpreet Kaur Saini
- Center of Excellence in Bioinformatics and Department of Computer Science and Engineering, University at Buffalo, 901 Washington Street, Suite 300, Buffalo, NY 14203, USA.
54
Betschinger J, Eisenhaber F, Knoblich JA. Phosphorylation-induced autoinhibition regulates the cytoskeletal protein Lethal (2) giant larvae. Curr Biol 2005; 15:276-82. [PMID: 15694314 DOI: 10.1016/j.cub.2005.01.012] [Citation(s) in RCA: 127] [Impact Index Per Article: 6.7] [Received: 10/29/2004] [Revised: 12/03/2004] [Accepted: 12/06/2004] [Indexed: 11/28/2022]
Abstract
During asymmetric cell division, cell fate determinants localize asymmetrically and segregate into one of the two daughter cells. In Drosophila neuroblasts, the asymmetric localization of cell fate determinants to the basal cell cortex requires aPKC. aPKC localizes to the apical cell cortex and phosphorylates the cytoskeletal protein Lethal (2) giant larvae (Lgl). Upon phosphorylation, Lgl dissociates from the cytoskeleton and becomes inactive. Here, we show that phosphorylation regulates Lgl by allowing an autoinhibitory interaction of the N terminus with the C terminus of the protein. We demonstrate that interaction with the cytoskeleton is mediated by a C-terminal domain while the N terminus is not required. Instead, the N terminus can bind to the C terminus and can compete for binding to the cytoskeleton. Interaction between the N- and C-terminal domains requires phosphorylation of Lgl by aPKC. Our results suggest that unphosphorylated, active Lgl exists in an open conformation that interacts with the cytoskeleton while phosphorylation changes the protein to an autoinhibited state.
Affiliation(s)
- Joerg Betschinger
- Institute of Molecular Biotechnology of the Austrian Academy of Sciences, Dr. Bohr Gasse 3-5, 1030 Vienna, Austria
55
Peti W, Etezady-Esfarjani T, Herrmann T, Klock HE, Lesley SA, Wüthrich K. NMR for structural proteomics of Thermotoga maritima: screening and structure determination. J Struct Funct Genomics 2005; 5:205-15. [PMID: 15263836 DOI: 10.1023/b:jsfg.0000029055.84242.9f] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Indexed: 11/12/2022]
Abstract
This paper describes the NMR screening of 141 small (<15 kDa) recombinant Thermotoga maritima proteins for globular folding. The experimental data show that approximately 25% of the screened proteins are folded under our screening conditions, which makes this procedure an important step for selecting those proteins that are suitable for structure determination. A comparison of screening based either on 1D 1H NMR with unlabeled proteins or on 2D [1H,15N]-COSY with uniformly 15N-labeled proteins is presented, and a comprehensive analysis of the 1D 1H NMR screening data is described. As an illustration of the utility of these methods to structural proteomics, the NMR structure determination of TM1492 (ribosomal protein L29) is presented. This 66-residue protein consists of an N-terminal 3(10)-helix and two long alpha-helices connected by a tight turn centered about glycine 35, where conserved leucine and isoleucine residues in the two alpha-helices form a small hydrophobic core.
Affiliation(s)
- Wolfgang Peti
- The Scripps Research Institute, Department of Molecular Biology and Joint Center of Structural Genomics, 10550 North Torrey Pines Road, La Jolla, CA 92037, USA.
56
Sanmartín M, Jaroszewski L, Raikhel NV, Rojo E. Caspases. Regulating death since the origin of life. Plant Physiol 2005; 137:841-7. [PMID: 15761210 PMCID: PMC1065385 DOI: 10.1104/pp.104.058552] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.1] [Received: 12/17/2004] [Revised: 12/23/2004] [Accepted: 12/23/2004] [Indexed: 05/18/2023]
Affiliation(s)
- Maite Sanmartín
- Departamento de Genética Molecular de Plantas, Centro Nacional de Biotecnología, CSIC, E-28049 Madrid, Spain
57
Arndt JW, Gu J, Jaroszewski L, Schwarzenbacher R, Hanson MA, Lebeda FJ, Stevens RC. The Structure of the Neurotoxin-associated Protein HA33/A from Clostridium botulinum Suggests a Reoccurring β-Trefoil Fold in the Progenitor Toxin Complex. J Mol Biol 2005; 346:1083-93. [PMID: 15701519 DOI: 10.1016/j.jmb.2004.12.039] [Citation(s) in RCA: 54] [Impact Index Per Article: 2.8] [Received: 10/28/2004] [Revised: 12/15/2004] [Accepted: 12/16/2004] [Indexed: 11/18/2022]
Abstract
The hemagglutinating protein HA33 from Clostridium botulinum is associated with the large botulinum neurotoxin secreted complexes and is critical in toxin protection, internalization, and possibly activation. We report the crystal structure of serotype A HA33 (HA33/A) at 1.5 Å resolution, which contains a unique domain organization and a carbohydrate recognition site. In addition, sequence alignments of the other toxin complex components, including the neurotoxin BoNT/A, hemagglutinating protein HA17/A, and non-toxic non-hemagglutinating protein NTNHA/A, suggest that most of the toxin complex consists of a reoccurring β-trefoil fold.
Affiliation(s)
- Joseph W Arndt
- Department of Molecular Biology, The Scripps Research Institute, 10550 N. Torrey Pines Road, La Jolla, CA 92037, USA
58
Peti W, Herrmann T, Zagnitko O, Grzechnik SK, Wüthrich K. NMR structure of the conserved hypothetical protein TM0979 from Thermotoga maritima. Proteins 2005; 59:387-90. [PMID: 15723348 DOI: 10.1002/prot.20352] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/08/2022]
Affiliation(s)
- Wolfgang Peti
- Department of Molecular Biology and the Joint Center of Structural Genomics, The Scripps Research Institute, La Jolla, California 92037, USA.
59
Simossis VA, Kleinjung J, Heringa J. Homology-extended sequence alignment. Nucleic Acids Res 2005; 33:816-24. [PMID: 15699183 PMCID: PMC549400 DOI: 10.1093/nar/gki233] [Citation(s) in RCA: 96] [Impact Index Per Article: 5.1] [Received: 11/26/2004] [Revised: 01/05/2005] [Accepted: 01/20/2005] [Indexed: 11/15/2022] Open
Abstract
We present a profile-profile multiple alignment strategy that uses database searching to collect homologues for each sequence in a given set, in order to enrich their available evolutionary information for the alignment. For each of the alignment sequences, the putative homologous sequences that score above a pre-defined threshold are incorporated into a position-specific pre-alignment profile. The enriched position-specific profile is used for standard progressive alignment, thereby more accurately describing the characteristic features of the given sequence set. We show that owing to the incorporation of the pre-alignment information into a standard progressive multiple alignment routine, the alignment quality between distant sequences increases significantly and outperforms state-of-the-art methods, such as T-COFFEE and MUSCLE. We also show that although entirely sequence-based, our novel strategy is better at aligning distant sequences when compared with a recent contact-based alignment method. Therefore, our pre-alignment profile strategy should be advantageous for applications that rely on high alignment accuracy such as local structure prediction, comparative modelling and threading.
Affiliation(s)
- V. A. Simossis
- Bioinformatics Section, Faculty of Sciences, Vrije Universiteit, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
- J. Kleinjung
- Bioinformatics Section, Faculty of Sciences, Vrije Universiteit, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
- J. Heringa
- Bioinformatics Section, Faculty of Sciences, Vrije Universiteit, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
- Centre for Integrative Bioinformatics VU (IBIVU), Faculty of Sciences and Faculty of Earth and Life Sciences, Vrije Universiteit, De Boelelaan 1081A, 1081 HV, Amsterdam, The Netherlands
60
Pei J, Grishin NV. Combining evolutionary and structural information for local protein structure prediction. Proteins 2004; 56:782-94. [PMID: 15281130 DOI: 10.1002/prot.20158] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 11/09/2022]
Abstract
We study the effects of various factors in representing and combining evolutionary and structural information for local protein structural prediction based on fragment selection. We prepare databases of fragments from a set of non-redundant protein domains. For each fragment, evolutionary information is derived from homologous sequences and represented as estimated effective counts and frequencies of amino acids (evolutionary frequencies) at each position. Position-specific amino acid preferences called structural frequencies are derived from statistical analysis of discrete local structural environments in database structures. Our method for local structure prediction is based on ranking and selecting database fragments that are most similar to a target fragment. Using secondary structure type as a local structural property, we test our method in a number of settings. The major findings are: (1) the COMPASS-type scoring function for fragment similarity comparison gives better prediction accuracy than three other tested scoring functions for profile-profile comparison. We show that the COMPASS-type scoring function can be derived both in the probabilistic framework and in the framework of statistical potentials. (2) Using the evolutionary frequencies of database fragments gives better prediction accuracy than using structural frequencies. (3) Finer definition of local environments, such as including more side-chain solvent accessibility classes and considering the backbone conformations of neighboring residues, gives increasingly better prediction accuracy using structural frequencies. (4) Combining evolutionary and structural frequencies of database fragments, either in a linear fashion or using a pseudocount mixture formula, results in improvement of prediction accuracy. Combination at the log-odds score level is not as effective as combination at the frequency level. 
This suggests that there might be better ways of combining sequence and structural information than the commonly used linear combination of log-odds scores. Our method of fragment selection and frequency combination gives reasonable secondary structure prediction results when tested on 56 CASP5 targets (average SOV score 0.77), suggesting that it is a valid method for local protein structure prediction. A mixture of predicted structural frequencies and evolutionary frequencies improves the quality of local profile-to-profile alignment by COMPASS.
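The distinction drawn in this abstract between combining at the frequency level and combining at the log-odds level can be illustrated with a minimal Python sketch. The function names, the two-letter toy alphabet, the mixing weight, and the uniform background are all assumptions made for illustration; the abstract does not give the authors' actual formulas, and the pseudocount mixture variant is not reproduced here.

```python
import math

def mix_frequencies(evo, struct, weight):
    """Linear mixture at the frequency level: w * evo + (1 - w) * struct.

    `evo` and `struct` are dicts of positive frequencies over the same
    alphabet for one position (hypothetical inputs for this sketch).
    """
    return {aa: weight * evo[aa] + (1.0 - weight) * struct[aa] for aa in evo}

def log_odds(freqs, background):
    """Convert frequencies to log-odds scores against a background."""
    return {aa: math.log(freqs[aa] / background[aa]) for aa in freqs}

def mix_log_odds(evo, struct, background, weight):
    """Linear mixture at the log-odds level, the alternative the abstract compares against."""
    lo_evo = log_odds(evo, background)
    lo_struct = log_odds(struct, background)
    return {aa: weight * lo_evo[aa] + (1.0 - weight) * lo_struct[aa] for aa in evo}

# Toy two-letter alphabet; all values are illustrative only.
background = {"A": 0.5, "G": 0.5}
evo = {"A": 0.8, "G": 0.2}
struct = {"A": 0.4, "G": 0.6}

mixed = mix_frequencies(evo, struct, weight=0.5)
score_from_freq_mix = log_odds(mixed, background)
score_from_logodds_mix = mix_log_odds(evo, struct, background, weight=0.5)
```

The two routes generally give different scores for the same inputs, because the logarithm of an average is not the average of the logarithms. This is one way to see why the level at which the combination is made can matter.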
Affiliation(s)
- Jimin Pei
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050, USA
61
Abstract
The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.
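The accuracy measure quoted in this abstract, the fraction of correctly aligned residues relative to a structure-based reference alignment, can be sketched as follows. This is a minimal illustration that assumes alignments are given as two gapped strings with `-` as the gap character; the function names are hypothetical and this is not the benchmarking code used in the study.

```python
from __future__ import annotations

def aligned_pairs(row_a: str, row_b: str) -> set[tuple[int, int]]:
    """Return the (i, j) residue index pairs that share an alignment column."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pairs = set()
    i = j = 0  # running residue indices in the ungapped sequences
    for a, b in zip(row_a, row_b):
        if a != "-" and b != "-":
            pairs.add((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs

def alignment_accuracy(test: tuple[str, str], reference: tuple[str, str]) -> float:
    """Fraction of reference residue pairs that the test alignment reproduces."""
    ref_pairs = aligned_pairs(*reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(*test)) / len(ref_pairs)

# Toy example: the reference pairs three residues, the test recovers one of them.
reference = ("ACDE-", "A-DEF")
test = ("ACDE", "ADEF")
accuracy = alignment_accuracy(test, reference)
```

Under this definition a test alignment identical to the reference scores 1.0, and residue pairs that appear only in the test alignment are not penalized.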
Affiliation(s)
- Marc A Marti-Renom
- Mission Bay Genentech Hall, University of California, San Francisco, San Francisco, CA 94143, USA.
62
Torda AE, Procter JB, Huber T. Wurst: a protein threading server with a structural scoring function, sequence profiles and optimized substitution matrices. Nucleic Acids Res 2004; 32:W532-5. [PMID: 15215443 PMCID: PMC441495 DOI: 10.1093/nar/gkh357] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.9] [Indexed: 11/12/2022] Open
Abstract
Wurst is a protein threading program with an emphasis on high quality sequence to structure alignments (http://www.zbh.uni-hamburg.de/wurst). Submitted sequences are aligned to each of about 3000 templates with a conventional dynamic programming algorithm, but using a score function with sophisticated structure and sequence terms. The structure terms are a log-odds probability of sequence to structure fragment compatibility, obtained from a Bayesian classification procedure. A simplex optimization was used to optimize the sequence-based terms for the goal of alignment and model quality and to balance the sequence and structural contributions against each other. Both sequence and structural terms operate with sequence profiles.
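The general idea of running conventional dynamic programming over a score that balances sequence and structure terms can be sketched as below. This is a generic global alignment with a linear gap penalty, written under stated assumptions: the precomputed score matrices, the weights, the gap model, and the function name are all hypothetical, and none of it is taken from the Wurst implementation.

```python
def combined_alignment_score(seq_scores, struct_scores,
                             w_seq=1.0, w_struct=1.0, gap=-1.0):
    """Best global alignment score for a weighted sum of two score matrices.

    seq_scores[i][j] and struct_scores[i][j] are assumed to hold the
    sequence-based and structure-based scores for placing query residue i
    on template position j. A linear gap penalty is used for simplicity.
    """
    n = len(seq_scores)
    m = len(seq_scores[0]) if n else 0
    # dp[i][j] = best score aligning the first i query residues
    # with the first j template positions.
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = (w_seq * seq_scores[i - 1][j - 1]
                    + w_struct * struct_scores[i - 1][j - 1])
            dp[i][j] = max(dp[i - 1][j - 1] + pair,  # align i with j
                           dp[i - 1][j] + gap,       # gap in template
                           dp[i][j - 1] + gap)       # gap in query
    return dp[n][m]

# Toy 2 x 2 score matrices (illustrative values only).
seq = [[2.0, -1.0], [-1.0, 2.0]]
struct = [[1.0, 0.0], [0.0, 1.0]]
both_terms = combined_alignment_score(seq, struct)
sequence_only = combined_alignment_score(seq, struct, w_struct=0.0)
```

In a setup like this, the weights `w_seq` and `w_struct` and the gap penalty are the kind of parameters that a numerical optimizer could tune against alignment or model quality.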
Affiliation(s)
- Andrew E Torda
- University of Hamburg, Zentrum für Bioinformatik, Bundesstrasse 43, D-20146 Hamburg, Germany
63
Kim DE, Chivian D, Baker D. Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res 2004; 32:W526-31. [PMID: 15215442 PMCID: PMC441606 DOI: 10.1093/nar/gkh468] [Citation(s) in RCA: 1372] [Impact Index Per Article: 68.6] [Indexed: 11/14/2022] Open
Abstract
The Robetta server (http://robetta.bakerlab.org) provides automated tools for protein structure prediction and analysis. For structure prediction, sequences submitted to the server are parsed into putative domains and structural models are generated using either comparative modeling or de novo structure prediction methods. If a confident match to a protein of known structure is found using BLAST, PSI-BLAST, FFAS03 or 3D-Jury, it is used as a template for comparative modeling. If no match is found, structure predictions are made using the de novo Rosetta fragment insertion method. Experimental nuclear magnetic resonance (NMR) constraints data can also be submitted with a query sequence for RosettaNMR de novo structure determination. Other current capabilities include the prediction of the effects of mutations on protein-protein interactions using computational interface alanine scanning. The Rosetta protein design and protein-protein docking methodologies will soon be available through the server as well.
Affiliation(s)
- David E Kim
- Structural Genomics of Pathogenic Protozoa, Department of Biochemistry, University of Washington, Seattle WA 98195, USA
64
Plewczynski D, Rychlewski L, Ye Y, Jaroszewski L, Godzik A. Integrated web service for improving alignment quality based on segments comparison. BMC Bioinformatics 2004; 5:98. [PMID: 15271224 PMCID: PMC497040 DOI: 10.1186/1471-2105-5-98] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Received: 03/30/2004] [Accepted: 07/22/2004] [Indexed: 04/30/2023] Open
Abstract
BACKGROUND Defining the blocks that form the global protein structure on the basis of local structural regularity is a very fruitful idea, used extensively in the description and prediction of structure from sequence information alone. Over many years, secondary structure elements were used as the available building blocks with great success. Specially prepared sets of possible structural motifs can be used to describe similarity between very distant, non-homologous proteins. The reason for utilizing structural information in the description of proteins is straightforward: structural comparison is able to detect approximately twice as many distant relationships as sequence comparison at the same error rate. RESULTS Here we provide a new fragment library for Local Structure Segment (LSS) prediction called FRAGlib, which is integrated with a previously described segment alignment algorithm, SEA. A joint FRAGlib/SEA server provides easy access to both algorithms, allowing a one-stop alignment service using a novel approach to protein sequence alignment based on network matching. Used as a secondary structure predictor, FRAGlib achieves only 73% accuracy by the Q3 measure, but when combined with the SEA alignment, it achieves a significant improvement in pairwise sequence alignment quality compared with the previous SEA implementation and other public alignment algorithms. The FRAGlib algorithm takes ~2 min to search the FRAGlib database for a typical query protein of 500 residues. The SEA service aligns two typical proteins within ~5 min. All supplementary materials (detailed results of all the benchmarks, the list of test proteins and the whole fragment library) are available for download online at . CONCLUSIONS The joint FRAGlib/SEA server will be a valuable tool both for molecular biologists working on protein sequence analysis and for bioinformaticians developing computational methods of structure prediction and alignment of proteins.
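The Q3 measure mentioned in this abstract is the standard three-state per-residue accuracy for secondary structure prediction: the fraction of residues whose predicted state (helix, strand, or coil) matches the observed state. A minimal Python sketch, assuming both assignments are given as equal-length strings over `H`, `E`, and `C` (the function name and the example strings are illustrative only):

```python
def q3(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted three-state label matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    if not observed:
        raise ValueError("cannot compute Q3 for an empty sequence")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)

# Toy example: five of six residues are assigned the correct state.
example_q3 = q3("HHHECC", "HHHEEC")
```

On this scale, the 73% figure quoted above corresponds to a value of 0.73 averaged over the benchmark proteins.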
Affiliation(s)
- Dariusz Plewczynski
- Bioinformatics Laboratory, BioInfoBank Institute, Poznan, Poland
- Interdisciplinary Centre for Mathematical and Computational Modeling, University of Warsaw, Poland
- Yuzhen Ye
- The Burnham Institute, La Jolla, USA
- Lukasz Jaroszewski
- Bioinformatics Core JCSG, University of California San Diego, La Jolla, USA
- Adam Godzik
- The Burnham Institute, La Jolla, USA
- Bioinformatics Core JCSG, University of California San Diego, La Jolla, USA
65
Abstract
Structural genomics is the idea of covering protein space so that every protein sequence comes within model building distance of a protein of known structure. Unfortunately, reproducing the structural alignment of distantly related proteins is a difficult challenge to existing sequence alignment and motif search software. We have developed a new transitive alignment algorithm (MaxFlow), which generates accurate alignments between proteins deep in the twilight zone of sequence similarity, below 20% sequence identity. In particular, MaxFlow reliably identifies conserved core motifs between proteins which are only indirect PSI-Blast neighbours. Based on MaxFlow alignments, useful 3D models can be generated for all members of a superfamily from as few as a single structural template--despite hundreds of representatives at 40% sequence identity level and patchy detection of homology by PSI-Blast. We propose novel strategies for target prioritization using MaxFlow scores to predict the optimal templates in a superfamily. Our results support an increase in the granularity of covering protein space that has potentially enormous economic implications for planning the transition to the full production phase of structural genomics.
Affiliation(s)
- A Heger
- Institute of Biotechnology, PO Box 56, 00014 University of Helsinki, Finland
66
Bonneau R, Baliga NS, Deutsch EW, Shannon P, Hood L. Comprehensive de novo structure prediction in a systems-biology context for the archaea Halobacterium sp. NRC-1. Genome Biol 2004; 5:R52. [PMID: 15287974 PMCID: PMC507877 DOI: 10.1186/gb-2004-5-8-r52] [Citation(s) in RCA: 36] [Impact Index Per Article: 1.8] [Received: 03/05/2004] [Revised: 03/07/2004] [Accepted: 06/01/2004] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Large fractions of all fully sequenced genomes code for proteins of unknown function. Annotating these proteins of unknown function remains a critical bottleneck for systems biology and is crucial to understanding the biological relevance of genome-wide changes in mRNA and protein expression, protein-protein and protein-DNA interactions. The work reported here demonstrates that de novo structure prediction is now a viable option for providing general function information for many proteins of unknown function. RESULTS We have used Rosetta de novo structure prediction to predict three-dimensional structures for 1,185 proteins and protein domains (<150 residues in length) found in Halobacterium NRC-1, a widely studied halophilic archaeon. Predicted structures were searched against the Protein Data Bank to identify fold similarities and extrapolate putative functions. They were analyzed in the context of a predicted association network composed of several sources of functional associations such as: predicted protein interactions, predicted operons, phylogenetic profile similarity and domain fusion. To illustrate this approach, we highlight three cases where our combined procedure has provided novel insights into our understanding of chemotaxis, possible prophage remnants in Halobacterium NRC-1 and archaeal transcriptional regulators. CONCLUSIONS Simultaneous analysis of the association network, coordinated mRNA level changes in microarray experiments and genome-wide structure prediction has allowed us to glean significant biological insights into the roles of several Halobacterium NRC-1 proteins of previously unknown function, and significantly reduce the number of proteins encoded in the genome of this haloarchaeon for which no annotation is available.
Affiliation(s)
- Nitin S Baliga
- Institute for Systems Biology, Seattle, WA 98103-8904, USA
- Eric W Deutsch
- Institute for Systems Biology, Seattle, WA 98103-8904, USA
- Paul Shannon
- Institute for Systems Biology, Seattle, WA 98103-8904, USA
- Leroy Hood
- Institute for Systems Biology, Seattle, WA 98103-8904, USA
67
Bjerkan TM, Bender CL, Ertesvåg H, Drabløs F, Fakhr MK, Preston LA, Skjak-Braek G, Valla S. The Pseudomonas syringae Genome Encodes a Combined Mannuronan C-5-epimerase and O-Acetylhydrolase, Which Strongly Enhances the Predicted Gel-forming Properties of Alginates. J Biol Chem 2004; 279:28920-9. [PMID: 15123694 DOI: 10.1074/jbc.m313293200] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.0] [Indexed: 11/06/2022] Open
Abstract
Alginates are industrially important, linear copolymers of β-D-mannuronic acid (M) and its C-5-epimer α-L-guluronic acid (G). The G residues originate from a postpolymerization reaction catalyzed by mannuronan C-5-epimerases (MEs), leading to extensive variability in M/G ratios and distribution patterns. Alginates containing long continuous stretches of G residues (G blocks) can form strong gels, a polymer type not found in alginate-producing bacteria belonging to the genus Pseudomonas. Here we show that the Pseudomonas syringae genome encodes a Ca(2+)-dependent ME (PsmE) that efficiently forms such G blocks in vitro. The deduced PsmE protein consists of 1610 amino acids and is a modular enzyme related to the previously characterized family of Azotobacter vinelandii MEs (AlgE1-7). A- and R-like modules with sequence similarity to those in the AlgE enzymes are found in PsmE, and the A module of PsmE (PsmEA) was found to be sufficient for epimerization. Interestingly, an R module from AlgE4 stimulated PsmEA activity. PsmE contains two regions designated M and RTX, both presumably involved in the binding of Ca(2+). Bacterial alginates are partly acetylated, and such modified residues cannot be epimerized. Based on a detailed computer-assisted analysis and experimental studies, another PsmE region, designated N, was found to encode an acetylhydrolase. By the combined action of N and A, PsmE was capable of redesigning an extensively acetylated alginate low in G from a non-gel-forming to a gel-forming state. Such a property has, to our knowledge, not been previously reported for an enzyme acting on a polysaccharide.
Affiliation(s)
- Tonje M Bjerkan
- Department of Biotechnology, Norwegian University of Science and Technology, N-7491 Trondheim, Norway
68
Abstract
The Robetta server (http://robetta.bakerlab.org) provides automated tools for protein structure prediction and analysis. For structure prediction, sequences submitted to the server are parsed into putative domains and structural models are generated using either comparative modeling or de novo structure prediction methods. If a confident match to a protein of known structure is found using BLAST, PSI-BLAST, FFAS03 or 3D-Jury, it is used as a template for comparative modeling. If no match is found, structure predictions are made using the de novo Rosetta fragment insertion method. Experimental nuclear magnetic resonance (NMR) constraints data can also be submitted with a query sequence for RosettaNMR de novo structure determination. Other current capabilities include the prediction of the effects of mutations on protein-protein interactions using computational interface alanine scanning. The Rosetta protein design and protein-protein docking methodologies will soon be available through the server as well.
Affiliation(s)
- David E Kim
- Structural Genomics of Pathogenic Protozoa, Department of Biochemistry, University of Washington, Seattle WA 98195, USA
69
Sadreyev RI, Baker D, Grishin NV. Profile-profile comparisons by COMPASS predict intricate homologies between protein families. Protein Sci 2004; 12:2262-72. [PMID: 14500884 PMCID: PMC2366929 DOI: 10.1110/ps.03197403] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.6] [Indexed: 11/24/2022]
Abstract
Recently we proposed a novel method of alignment–alignment comparison, COMPASS (the tool for COmparison of Multiple Protein Alignments with Assessment of Statistical Significance). Here we present several examples of the relations between PFAM protein families that were detected by COMPASS and that lead to the predictions of presently unresolved protein structures. We discuss relatively straightforward COMPASS predictions that are new and interesting to us, and that would require a substantial time and effort to justify even for a skilled PSI-BLAST user. All of the presented COMPASS hits are independently confirmed by other methods, including the ab initio structure-prediction method ROSETTA. The tertiary structure predictions made by ROSETTA proved to be useful for improving sequence-derived alignments, because they are based on a reasonable folding of the polypeptide chain rather than on the information from sequence databases. The ability of COMPASS to predict new relations within the PFAM database indicates the high sensitivity of COMPASS searches and substantiates its potential value for the discovery of previously unknown similarities between protein families.
Affiliation(s)
- Ruslan I Sadreyev
- Howard Hughes Medical Institute and Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, Texas 75390-9050, USA
70
Ohlson T, Wallner B, Elofsson A. Profile-profile methods provide improved fold-recognition: A study of different profile-profile alignment methods. Proteins 2004; 57:188-97. [PMID: 15326603 DOI: 10.1002/prot.20184] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.1] [Indexed: 11/05/2022]
Abstract
To improve the detection of related proteins, it is often useful to include evolutionary information for both the query and target proteins. One method to include this information is by the use of profile-profile alignments, where a profile from the query protein is compared with the profiles from the target proteins. Profile-profile alignments can be implemented in several fundamentally different ways. The similarity between two positions can be calculated using a dot-product, a probabilistic model, or an information theoretical measure. Here, we present a large-scale comparison of different profile-profile alignment methods. We show that the profile-profile methods perform at least 30% better than standard sequence-profile methods both in their ability to recognize superfamily-related proteins and in the quality of the obtained alignments. Although the performance of all methods is quite similar, profile-profile methods that use a probabilistic scoring function have an advantage as they can create good alignments and show a good fold recognition capacity using the same gap-penalties, while the other methods need to use different parameters to obtain comparable performances.
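The three ways of scoring a pair of profile positions named in this abstract (dot product, probabilistic, information-theoretic) can be illustrated with a minimal sketch. The function names, the toy four-letter alphabet, and the uniform background are assumptions made for illustration; the log-odds form and the Jensen-Shannon divergence are generic stand-ins for the probabilistic and information-theoretic families, not the specific scoring functions benchmarked in the paper.

```python
import math

def dot_product_score(p, q):
    """Dot product of two profile columns (frequency vectors)."""
    return sum(a * b for a, b in zip(p, q))

def log_odds_score(p, q, background):
    """Assumed probabilistic score: log of sum_i p_i * q_i / b_i.

    Positive when the two columns agree more than expected from the
    background frequencies, negative when they agree less.
    """
    return math.log(sum(a * b / bg for a, b, bg in zip(p, q, background)))

def jensen_shannon_divergence(p, q):
    """Information-theoretic dissimilarity in bits (0 for identical columns)."""
    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Toy columns over a hypothetical four-letter alphabet.
background = [0.25, 0.25, 0.25, 0.25]
col_a = [0.7, 0.1, 0.1, 0.1]
col_b = [0.1, 0.1, 0.1, 0.7]

print(dot_product_score(col_a, col_a), dot_product_score(col_a, col_b))
print(log_odds_score(col_a, col_a, background), log_odds_score(col_a, col_b, background))
print(jensen_shannon_divergence(col_a, col_a), jensen_shannon_divergence(col_a, col_b))
```

In a full profile-profile aligner, a column score of this kind would fill the substitution matrix used by dynamic programming; the gap penalties the abstract mentions are outside the scope of this sketch.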
Affiliation(s)
- Tomas Ohlson
- Stockholm Bioinformatics Center, Stockholm University, Stockholm, Sweden
71
Reinhardt A, Eisenberg D. DPANN: Improved sequence to structure alignments following fold recognition. Proteins 2004; 56:528-38. [PMID: 15229885 DOI: 10.1002/prot.20144] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 11/05/2022]
Abstract
In fold recognition (FR) a protein sequence of unknown structure is assigned to the closest known three-dimensional (3D) fold. Although FR programs can often identify among all possible folds the one a sequence adopts, they frequently fail to align the sequence to the equivalent residue positions in that fold. Such failures frustrate the next step in structure prediction, protein model building. Hence it is desirable to improve the quality of the alignments between the sequence and the identified structure. We have used artificial neural networks (ANN) to derive a substitution matrix to create alignments between a protein sequence and a protein structure through dynamic programming (DPANN: Dynamic Programming meets Artificial Neural Networks). The matrix is based on the amino acid type and the secondary structure state of each residue. In a database of protein pairs that have the same fold but lack sequence similarity, DPANN aligns over 30% of all sequences to the paired structure, resembling closely the structural superposition of the pair. In over half of these cases the DPANN alignment is close to the structural superposition, although the initial alignment from the step of fold recognition is not close. Conversely, the alignment created during fold recognition outperforms DPANN in only 10% of all cases. Thus application of DPANN after fold recognition leads to substantial improvements in alignment accuracy, which in turn provides more useful templates for the modeling of protein structures. In the artificial case of using actual instead of predicted secondary structures for the probe protein, over 50% of the alignments are successful.
72
Constans P. On the functional significance of electron density protein structure alignments. Proteins 2004; 55:646-55. [PMID: 15103628 DOI: 10.1002/prot.20059] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/07/2022]
Abstract
Electron density protein alignments are analyzed in terms of their underlying similarity measure, the density overlap. These alignments are conceptually unrelated to biochemical structural elements and, therefore, are appropriate in structure-only similarity studies. The analysis is focused on the low sequence similarity subset of protein domains. A remarkable association is found between simple, density overlap measures and the expert designed Structural Classification of Proteins (SCOP) for which functional and evolutive analogies prevail. The association found validates the functional significance of electron density alignments.
Affiliation(s)
- Pere Constans
- Department of Chemistry, Rice University, Houston, Texas, USA.
73
Arndt MAE, Krauss J, Schwarzenbacher R, Vu BK, Greene S, Rybak SM. Generation of a highly stable, internalizing anti-CD22 single-chain Fv fragment for targeting non-Hodgkin's lymphoma. Int J Cancer 2004; 107:822-9. [PMID: 14566834 DOI: 10.1002/ijc.11451] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.2] [Indexed: 11/11/2022]
Abstract
The generation of a single chain Fv (scFv) fragment derived from the anti-CD22 monoclonal antibody LL2 resulted in a molecule with good antigen binding but very poor stability properties, thus hampering its clinical applicability. Here we report on the construction of an engineered LL2 scFv fragment by rational mutagenesis. The contribution of uncommon wild-type sequence residues for providing stability to the conserved common core structure of immunoglobulins was examined. Aided by computer homology modeling, 3 destabilizing residues within the core of the wild-type VH domain were identified. Owing to the conserved nature of the buried core structure, mutagenesis of these sites to respective consensus residues markedly stabilized the molecule but did not influence its antigen binding properties: the engineered scFv MJ-7 exhibited exceptional biophysical stability with a half-life not reached after 6 days of incubation in human serum at 37 °C, while fully retaining the epitope specificity of the monoclonal antibody, and antigen binding affinity of the wild-type scFv. Furthermore, both the monoclonal antibody LL2 and the engineered scFv fragment became fully internalized after only 30 min of incubation at 37 °C with CD22+ tumor cells. These properties predict scFv MJ-7 could become a novel powerful tool to selectively deliver cytotoxic agents to malignant CD22+ cells.
74
Wallner B, Fang H, Elofsson A. Automatic consensus-based fold recognition using Pcons, ProQ, and Pmodeller. Proteins 2004; 53 Suppl 6:534-41. [PMID: 14579343 DOI: 10.1002/prot.10536] [Citation(s) in RCA: 92] [Impact Index Per Article: 4.6] [Indexed: 11/10/2022]
Abstract
CASP provides a unique opportunity to compare the performance of automatic fold recognition methods with the performance of manual experts who might use these methods. Here, we show that a novel automatic fold recognition server, Pmodeller, is getting close to the performance of manual experts. Although a small group of experts still perform better, most of the experts participating in CASP5 actually performed worse even though they had full access to all automatic predictions. Pmodeller is based on Pcons (Lundström et al., Protein Sci 2001; 10(11):2354-2365), the first "consensus" predictor that uses predictions from many other servers. Therefore, the success of Pmodeller and other consensus servers should be seen as a tribute to the collective of all developers of fold recognition servers. Furthermore, we show that the inclusion of another novel method, ProQ, to evaluate the quality of the protein models improves the predictions.
Affiliation(s)
- Björn Wallner
- Stockholm Bioinformatics Center, Stockholm University, Stockholm, Sweden
75
Capriotti E, Fariselli P, Rossi I, Casadio R. A Shannon entropy-based filter detects high-quality profile-profile alignments in searches for remote homologues. Proteins 2003; 54:351-60. [PMID: 14696197 DOI: 10.1002/prot.10564] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]
Abstract
Detection of homologous proteins with low-sequence identity to a given target (remote homologues) is routinely performed with alignment algorithms that take advantage of sequence profile. In this article, we investigate the efficacy of different alignment procedures for the task at hand on a set of 185 protein pairs with similar structures but low-sequence similarity. Criteria based on the SCOP label detection and MaxSub scores are adopted to score the results. We investigate the efficacy of alignments based on sequence-sequence, sequence-profile, and profile-profile information. We confirm that with profile-profile alignments the results are better than with other procedures. In addition, we report, and this is novel, that the selection of the results of the profile-profile alignments can be improved by using Shannon entropy, indicating that this parameter is important to recognize good profile-profile alignments among a plethora of meaningless pairs. By this, we enhance the global search accuracy without losing sensitivity and filter out most of the erroneous alignments. We also show that when the entropy filtering is adopted, the quality of the resulting alignments is comparable to that computed for the target and template structures with CE, a structural alignment program.
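For readers unfamiliar with the quantity behind the filter, the sketch below computes the Shannon entropy of a profile column and averages it over the aligned columns of two profiles. The abstract does not say how the entropy is turned into an accept/reject decision, so the averaging step and all names here are assumptions for illustration, not the authors' procedure.

```python
import math

def column_entropy(freqs):
    """Shannon entropy in bits of one profile column (residue frequencies)."""
    return -sum(f * math.log2(f) for f in freqs if f > 0)

def mean_aligned_entropy(profile_a, profile_b, aligned_pairs):
    """Average column entropy over the aligned positions of two profiles.

    profile_a, profile_b: lists of frequency vectors, one per position.
    aligned_pairs: list of (i, j) index pairs matched by the alignment.
    Hypothetical summary statistic; the paper's exact filter is not
    described in the abstract.
    """
    total = 0.0
    for i, j in aligned_pairs:
        total += column_entropy(profile_a[i]) + column_entropy(profile_b[j])
    return total / (2 * len(aligned_pairs))

# Toy profiles over a hypothetical four-letter alphabet.
conserved = [1.0, 0.0, 0.0, 0.0]      # fully conserved column, 0 bits
uniform = [0.25, 0.25, 0.25, 0.25]    # uninformative column, 2 bits
print(mean_aligned_entropy([conserved, uniform], [uniform, uniform], [(0, 0), (1, 1)]))
```

Low-entropy columns are strongly conserved and high-entropy columns carry little positional information, which is why an entropy-derived statistic can plausibly separate informative profile matches from meaningless ones.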
76
Marti‐Renom MA, Madhusudhan M, Eswar N, Pieper U, Shen M, Sali A, Fiser A, Mirkovic N, John B, Stuart A. Modeling Protein Structure from its Sequence. Curr Protoc Bioinformatics 2003. [DOI: 10.1002/0471250953.bi0501s03] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Affiliation(s)
- Marc A. Marti‐Renom
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and The California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, California
- M.S. Madhusudhan
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and The California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, California
- Narayanan Eswar
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and The California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, California
- Ursula Pieper
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and The California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, California
- Min‐yi Shen
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and The California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, California
- Andrej Sali
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and The California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, California
- Andras Fiser
- Department of Biochemistry and Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, Bronx, New York
- Nebojsa Mirkovic
- Laboratory of Molecular Biophysics, The Rockefeller University, New York, New York
- Bino John
- Laboratory of Molecular Biophysics, The Rockefeller University, New York, New York
- Ashley Stuart
- Laboratory of Molecular Biophysics, The Rockefeller University, New York, New York
77
Liu T, Rojas A, Ye Y, Godzik A. Homology modeling provides insights into the binding mode of the PAAD/DAPIN/pyrin domain, a fourth member of the CARD/DD/DED domain family. Protein Sci 2003; 12:1872-81. [PMID: 12930987 PMCID: PMC2323985 DOI: 10.1110/ps.0359603] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.5] [Indexed: 10/27/2022]
Abstract
The PAAD/DAPIN/pyrin domain is the fourth member of the death domain superfamily, but unlike other members of this family, it is involved not only in apoptosis but also in innate immunity and several other processes. We have identified 40 PAAD domain-containing proteins by extensively searching the genomes of higher eukaryotes and viruses. Phylogenetic analyses suggest that there are five categories of PAAD domains that correlate with the domain architecture of the entire proteins. Homology models built on CARD and DD structures identified functionally important residues by studying conservation patterns on the surface of the models. Surface maps of each subfamily show different distributions of these residues, suggesting that domains from different subfamilies do not interact with each other, forming independent regulatory networks. Helix3 of PAAD is predicted to be critical for dimerization. Multiple alignment analysis and modeling suggest that it may be partly disordered, following a new paradigm for interaction proteins that are stabilized by protein-protein interactions.
Affiliation(s)
- Tong Liu
- The Burnham Institute, La Jolla, California 92037, USA
78
Tress ML, Jones D, Valencia A. Predicting reliable regions in protein alignments from sequence profiles. J Mol Biol 2003; 330:705-18. [PMID: 12850141 DOI: 10.1016/s0022-2836(03)00622-3] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.0] [Indexed: 10/27/2022]
Abstract
For applications such as comparative modelling one major issue is the reliability of sequence alignments. Reliable regions in alignments can be predicted using sub-optimal alignments of the same pair of sequences. Here we show that reliable regions in alignments can also be predicted from multiple sequence profile information alone. Alignments were created for a set of remotely related pairs of proteins using five different test methods. Structural alignments were used to assess the quality of the alignments and the aligned positions were scored using information from the observed frequencies of amino acid residues in sequence profiles pre-generated for each template structure. High-scoring regions of these profile-derived alignment scores were a good predictor of reliably aligned regions. These profile-derived alignment scores are easy to obtain and are applicable to any alignment method. They can be used to detect those regions of alignments that are reliably aligned and to help predict the quality of an alignment. For those residues within secondary structure elements, the regions predicted as reliably aligned agreed with the structural alignments for between 92% and 97.4% of the residues. In loop regions just under 92% of the residues predicted to be reliable agreed with the structural alignments. The percentage of residues predicted as reliable ranged from 32.1% for helix residues to 52.8% for strand residues. This information could also be used to help predict conserved binding sites from sequence alignments. Residues in the template that were identified as binding sites, that aligned to an identical amino acid residue and where the sequence alignment agreed with the structural alignment were in highly conserved, high scoring regions over 80% of the time. This suggests that many binding sites that are present in both target and template sequences are in sequence-conserved regions and that there is the possibility of translating reliability to binding site prediction.
Affiliation(s)
- Michael L Tress
- Protein Design Group, Centro Nacional de Biotechnologia, CNB-CSIC, Cantoblanco, 28049 Madrid, Spain.
79
Zhao Y, Hong DH, Pawlyk B, Yue G, Adamian M, Grynberg M, Godzik A, Li T. The retinitis pigmentosa GTPase regulator (RPGR)-interacting protein: subserving RPGR function and participating in disk morphogenesis. Proc Natl Acad Sci U S A 2003; 100:3965-70. [PMID: 12651948 PMCID: PMC153031 DOI: 10.1073/pnas.0637349100] [Citation(s) in RCA: 170] [Impact Index Per Article: 8.1] [Indexed: 11/18/2022]
Abstract
Retinitis pigmentosa is a photoreceptor degenerative disease leading to blindness in adulthood. Leber congenital amaurosis (LCA) describes a more severe condition with visual deficit in early childhood. Defects in the retinitis pigmentosa GTPase regulator (RPGR) and an RPGR-interacting protein (RPGRIP) are known causes of retinitis pigmentosa and LCA, respectively. Both proteins localize in the photoreceptor connecting cilium (CC), a thin bridge linking the cell body and the light-sensing outer segment. We show that RPGR is absent in the CC of photoreceptors lacking RPGRIP, but not vice versa. Mice lacking RPGRIP elaborate grossly oversized outer segment disks resembling a cytochalasin D-induced defect and have a more severe disease than mice lacking RPGR. Mice lacking both proteins are phenotypically indistinguishable from mice lacking RPGRIP alone. In vitro, RPGRIP forms homodimers and elongated filaments via interactions involving its coiled-coil and C-terminal domains. We conclude that RPGRIP is a stable polymer in the CC where it tethers RPGR and that RPGR depends on RPGRIP for subcellular localization and normal function. Our data suggest that RPGRIP is also required for disk morphogenesis, putatively by regulating actin cytoskeleton dynamics. The latter hypothesis may be consistent with a distant homology between the C-terminal domain of RPGRIP and an actin-fragmin kinase, predicted by fold recognition algorithms. A defect in RPGRIP encompasses loss of both functions, hence the more severe clinical manifestation as LCA.
Affiliation(s)
- Yun Zhao
- The Berman-Gund Laboratory for the Study of Retinal Degenerations, Harvard Medical School, Massachusetts Eye and Ear Infirmary, Boston, MA 02114, USA
80
Affiliation(s)
- András Fiser
- Department of Biochemistry and Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, Bronx, New York 10461, USA
81
Nair R, Rost B. Sequence conserved for subcellular localization. Protein Sci 2002; 11:2836-47. [PMID: 12441382 PMCID: PMC2373743 DOI: 10.1110/ps.0207402] [Citation(s) in RCA: 131] [Impact Index Per Article: 6.0] [Received: 05/11/2002] [Revised: 09/05/2002] [Accepted: 09/10/2002] [Indexed: 10/27/2022]
Abstract
The more proteins diverged in sequence, the more difficult it becomes for bioinformatics to infer similarities of protein function and structure from sequence. The precise thresholds used in automated genome annotations depend on the particular aspect of protein function transferred by homology. Here, we presented the first large-scale analysis of the relation between sequence similarity and identity in subcellular localization. Three results stood out: (1) The subcellular compartment is generally more conserved than what might have been expected given that short sequence motifs like nuclear localization signals can alter the native compartment; (2) the sequence conservation of localization is similar between different compartments; and (3) it is similar to the conservation of structure and enzymatic activity. In particular, we found the transition between the regions of conserved and nonconserved localization to be very sharp, although the thresholds for conservation were less well defined than for structure and enzymatic activity. We found that a simple measure for sequence similarity accounting for pairwise sequence identity and alignment length, the HSSP distance, distinguished accurately between protein pairs of identical and different localizations. In fact, BLAST expectation values outperformed the HSSP distance only for alignments in the subtwilight zone. We succeeded in slightly improving the accuracy of inferring localization through homology by fine tuning the thresholds. Finally, we applied our results to the entire SWISS-PROT database and five entirely sequenced eukaryotes.
Affiliation(s)
- Rajesh Nair
- Columbia University Bioinformatics Center (CUBIC), Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
82
Constans P. Linear scaling approaches to quantum macromolecular similarity: evaluating the similarity function. J Comput Chem 2002; 23:1305-13. [PMID: 12214313 DOI: 10.1002/jcc.10140] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/08/2022]
Abstract
The evaluation of the electron density based similarity function scales quadratically with respect to the size of the molecules for simplified, atomic shell densities. Due to the exponential decay of the function's atom-atom terms most interatomic contributions are numerically negligible on large systems. An improved algorithm for the evaluation of the Quantum Molecular Similarity function is presented. This procedure identifies all non-negligible terms without computing unnecessary interatomic squared distances, thus effectively turning to linear scaling the similarity evaluation. Presented also is a minimalist dynamic electron density model. Approximate, single shell densities together with the proposed algorithm facilitate fast electron density based alignments on macromolecules.
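The idea of skipping numerically negligible atom pairs can be sketched with a generic cell-list search. This is a standard spatial-binning technique offered as an illustration and is not the algorithm of the paper; the single-Gaussian pair term `exp(-alpha * d^2)`, the parameter names, and the tolerance are assumptions.

```python
import math
from collections import defaultdict
from itertools import product

def close_pairs(coords_a, coords_b, cutoff):
    """Yield (i, j, squared_distance) for atom pairs within `cutoff`.

    Atoms of molecule B are binned into cubic cells of edge `cutoff`, so each
    atom of A is compared only with atoms in its own and neighbouring cells.
    """
    cells = defaultdict(list)
    for j, point in enumerate(coords_b):
        cells[tuple(math.floor(v / cutoff) for v in point)].append(j)
    cutoff_sq = cutoff * cutoff
    for i, (x, y, z) in enumerate(coords_a):
        cx, cy, cz = (math.floor(v / cutoff) for v in (x, y, z))
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in cells.get((cx + dx, cy + dy, cz + dz), ()):
                bx, by, bz = coords_b[j]
                d_sq = (x - bx) ** 2 + (y - by) ** 2 + (z - bz) ** 2
                if d_sq <= cutoff_sq:
                    yield i, j, d_sq

def overlap_similarity(coords_a, coords_b, alpha=0.5, tol=1e-8):
    """Sum assumed Gaussian pair terms, ignoring those smaller than `tol`."""
    # exp(-alpha * d^2) < tol  exactly when  d^2 > ln(1 / tol) / alpha
    cutoff = math.sqrt(math.log(1.0 / tol) / alpha)
    return sum(math.exp(-alpha * d_sq) for _, _, d_sq in close_pairs(coords_a, coords_b, cutoff))

mol_a = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
mol_b = [(0.5, 0.0, 0.0), (100.0, 100.0, 100.0)]
print(overlap_similarity(mol_a, mol_b))
```

For atoms spread at roughly constant density, each atom sees a bounded number of neighbours, so the work grows about linearly with molecule size instead of quadratically.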
Affiliation(s)
- Pere Constans
- Department of Chemistry, Rice University, Houston, Texas 77005-1892, USA.
83
Lesley SA, Kuhn P, Godzik A, Deacon AM, Mathews I, Kreusch A, Spraggon G, Klock HE, McMullan D, Shin T, Vincent J, Robb A, Brinen LS, Miller MD, McPhillips TM, Miller MA, Scheibe D, Canaves JM, Guda C, Jaroszewski L, Selby TL, Elsliger MA, Wooley J, Taylor SS, Hodgson KO, Wilson IA, Schultz PG, Stevens RC. Structural genomics of the Thermotoga maritima proteome implemented in a high-throughput structure determination pipeline. Proc Natl Acad Sci U S A 2002; 99:11664-9. [PMID: 12193646 PMCID: PMC129326 DOI: 10.1073/pnas.142413399] [Citation(s) in RCA: 357] [Impact Index Per Article: 16.2] [Accepted: 07/11/2002] [Indexed: 11/18/2022]
Abstract
Structural genomics is emerging as a principal approach to define protein structure-function relationships. To apply this approach on a genomic scale, novel methods and technologies must be developed to determine large numbers of structures. We describe the design and implementation of a high-throughput structural genomics pipeline and its application to the proteome of the thermophilic bacterium Thermotoga maritima. By using this pipeline, we successfully cloned and attempted expression of 1,376 of the predicted 1,877 genes (73%) and have identified crystallization conditions for 432 proteins, comprising 23% of the T. maritima proteome. Representative structures from TM0423 glycerol dehydrogenase and TM0449 thymidylate synthase-complementing protein are presented as examples of final outputs from the pipeline.
Affiliation(s)
- Scott A Lesley
- Joint Center for Structural Genomics, Genomics Institute of the Novartis Research Foundation, 10675 John Jay Hopkins Drive, San Diego, CA 92121, USA.
84
Abstract
A major bottleneck in comparative modeling is the alignment quality; this is especially true for proteins whose distant relationships could be reliably recognized only by recent advances in fold recognition. The best algorithms excel in recognizing distant homologs but often produce incorrect alignments for over 50% of protein pairs in large fold-prediction benchmarks. The alignments obtained by sequence-sequence or sequence-structure matching algorithms differ significantly from the structural alignments. To study this problem, we developed a simplified method to explicitly enumerate all possible alignments for a pair of proteins. This allowed us to estimate the number of significantly different alignments for a given scoring method that score better than the structural alignment. Using several examples of distantly related proteins, we show that for standard sequence-sequence alignment methods, the number of significantly different alignments is usually large, often about 10^10 alternatives. This number decreases when the alignment method is improved, but it is still too large for the brute force enumeration approach. More effective strategies were needed, so we evaluated and compared two well-known approaches for searching the space of suboptimal alignments. We combined their best features and produced a hybrid method, which yielded alignments that surpassed the original alignments for about 50% of protein pairs with minimal computational effort.
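To give a feel for why brute-force enumeration breaks down, the sketch below counts every possible global alignment of two sequences with the standard three-way recurrence (match, gap in the first sequence, gap in the second). This is a textbook counting argument, not the authors' simplified enumeration method, and the total it returns is a different quantity from the roughly 10^10 better-scoring alternatives reported in the abstract; the function name is hypothetical.

```python
def count_global_alignments(m, n):
    """Number of distinct global alignments of sequences of lengths m and n.

    Each alignment column either pairs two residues, or places a residue of
    one sequence against a gap, which gives the recurrence
    f(i, j) = f(i-1, j) + f(i, j-1) + f(i-1, j-1), with f(i, 0) = f(0, j) = 1.
    """
    table = [[1] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            table[i][j] = table[i - 1][j] + table[i][j - 1] + table[i - 1][j - 1]
    return table[m][n]

for length in (2, 3, 10, 50):
    print(length, count_global_alignments(length, length))
```

The count grows exponentially with sequence length, which is consistent with the abstract's point that exhaustive search is impractical and that targeted searches of the suboptimal-alignment space are needed.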
Affiliation(s)
- Lukasz Jaroszewski
- Program in Bioinformatics and Biological Complexity, The Burnham Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
85
Abstract
One of the most central methods in bioinformatics is the alignment of two protein or DNA sequences. However, so far large-scale benchmarks examining the quality of these alignments are scarce. On the other hand, recently several large-scale studies of the capacity of different methods to identify related sequences have led to new insights about the performance of fold recognition methods. To increase our understanding about fold recognition methods, we present a large-scale benchmark of alignment quality. We compare alignments from several different alignment methods, including sequence alignments, hidden Markov models, PSI-BLAST, CLUSTALW, and threading methods. For most methods, the alignment quality increases significantly at about 20% sequence identity. The difference in alignment quality between different methods is quite small, and the main difference can be seen at the exact positioning of the sharp rise in alignment quality, that is, around 15-20% sequence identity. The alignments are improved by using structural information. In general, the best alignments are obtained by methods that use predicted secondary structure information and sequence profiles obtained from PSI-BLAST. One interesting observation is that for different pairs many different methods create the best alignments. This finding implies that if a method that could select the best alignment method for each pair existed, a significant improvement of the alignment quality could be gained.
Affiliation(s)
- Arne Elofsson
- Stockholm Bioinformatics Center, Stockholm University, SE-10691, Stockholm, Sweden.
86
Abstract
In the post-genomic era, the new discipline of functional genomics is now facing the challenge of associating a function (as well as estimating its relevance to industrial applications) to about 100,000 microbial, plant or animal genes of known sequence but unknown function. Besides the design of databases, computational methods are increasingly becoming intimately linked with the various experimental approaches. Consequently, bioinformatics is rapidly evolving into independent fields addressing the specific problems of interpreting i) genomic sequences, ii) protein sequences and 3D-structures, as well as iii) transcriptome and macromolecular interaction data. It is thus increasingly difficult for the biologist to choose the computational approaches that perform best in these various areas. This paper attempts to review the most useful developments of the last 2 years.
Affiliation(s)
- J M Claverie
- Structural and Genetic Information Laboratory, UMR 1889 CNRS-AVENTIS, 31 Chemin Joseph Aiguier, 13402 Marseille Cedex 20, France.
|
87
|
Abstract
Following the complete genome sequencing of an increasing number of organisms, structural biology is engaging in a systematic approach to high-throughput structure determination, called structural genomics, to create a complete inventory of protein folds/structures that will help predict functions for all proteins. First results show that structural genomics will be highly effective in finding functional annotations for proteins of unknown function.
Affiliation(s)
- P R Mittl
- Institute of Biochemistry, University of Zürich, Winterthurerstrasse 190, 8057 Zürich, Switzerland
|
88
|
Abstract
Fold assignments for newly sequenced genomes are among the most important and interesting applications of the booming field of protein structure prediction. We present a brief survey and a discussion of such assignments completed to date, using as an example several fold assignment projects for proteins from the Escherichia coli genome. This review focuses on the steps that are necessary to go beyond simple assignment projects and into the development of tools that extend our understanding of the functions of proteins in newly sequenced genomes. This paper also discusses several problems seldom addressed in the literature, such as domain prediction, complementary predictions (e.g., transmembrane regions and flexible regions), and cross-correlation of predictions from different servers. The influence of sequence and structure database growth on prediction success is also addressed. Finally, we discuss the perspectives of the field in the context of massive sequence and structure determination projects, as well as the development of novel prediction methods.
Affiliation(s)
- K Pawlowski
- AstraZeneca R&D Lund, Lund, S-221 87, Sweden
|