1
Homology Modeling Using GPCRM Web Service. Methods Mol Biol 2021; 2268:305-321. [PMID: 34085277 DOI: 10.1007/978-1-0716-1221-7_20] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/18/2023]
Abstract
Homology modeling methods are commonly used for quick and precise construction of a desired protein or its mutant using protein templates, which were determined by crystallography, cryo-EM, or NMR. Due to the increasing number of such structures, the obtained models are precise even in the case of small similarity between sequences of template and modeled proteins. The reason for that is a high evolutionary conservation in the structure regions responsible for keeping the function of proteins. This is also the case for G protein-coupled receptors (GPCRs), which constitute the largest family of membrane receptors with nearly 800 proteins. The GPCRM web service (https://gpcrm.biomodellab.eu/) was set up for the nearly automatic generation of high-quality structures of modeled GPCRs. The three possible paths, "High similarity," "Quick path," and "Long path," allow the user to choose between a fast but less reliable path, up to more reliable but longer procedures. In the Advanced mode the service allows for user modifications including selection of template(s) and a manual adjustment of the sequence alignment.
2
Runthala A. Probabilistic divergence of a template-based modelling methodology from the ideal protocol. J Mol Model 2021; 27:25. [PMID: 33411019 DOI: 10.1007/s00894-020-04640-w] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/19/2020] [Accepted: 12/09/2020] [Indexed: 12/27/2022]
Abstract
Protein structural information is essential for the detailed mapping of a functional protein network. For a higher modelling accuracy and quicker implementation, template-based algorithms have been extensively deployed and redefined. The methods only assess the predicted structure against its native state/template and do not estimate the accuracy for each modelling step. A divergence measure is therefore postulated to estimate the modelling accuracy against its theoretical optimal benchmark. By freezing the domain boundaries, the divergence measures are predicted for the most crucial steps of a modelling algorithm. To precisely refine the score using weighting constants, big data analysis could further be deployed.
Affiliation(s)
- Ashish Runthala
- Department of Biotechnology, Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur, Andhra Pradesh, 522502, India.
3
Mulnaes D, Porta N, Clemens R, Apanasenko I, Reiners J, Gremer L, Neudecker P, Smits SHJ, Gohlke H. TopModel: Template-Based Protein Structure Prediction at Low Sequence Identity Using Top-Down Consensus and Deep Neural Networks. J Chem Theory Comput 2020; 16:1953-1967. [DOI: 10.1021/acs.jctc.9b00825] [Citation(s) in RCA: 25] [Impact Index Per Article: 6.3] [Indexed: 12/12/2022]
Affiliation(s)
- Daniel Mulnaes
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Nicola Porta
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Rebecca Clemens
- Institut für Biochemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Irina Apanasenko
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & JuStruct, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- Jens Reiners
- Institut für Biochemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Center for Structural Studies, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Lothar Gremer
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & JuStruct, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- Philipp Neudecker
- Institut für Physikalische Biologie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & JuStruct, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- Sander H. J. Smits
- Institut für Biochemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Center for Structural Studies, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Holger Gohlke
- Institut für Pharmazeutische und Medizinische Chemie, Heinrich-Heine-Universität Düsseldorf, 40225 Düsseldorf, Germany
- Institute of Biological Information Processing (IBI-7: Structural Biochemistry) & JuStruct, Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
- John von Neumann Institute for Computing (NIC) & Jülich Supercomputing Centre (JSC), Forschungszentrum Jülich GmbH, 52425 Jülich, Germany
4
Abstract
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Benjamin Webb
- University of California at San Francisco, San Francisco, California
- Andrej Sali
- University of California at San Francisco, San Francisco, California
5
Fidler DR, Murphy SE, Courtis K, Antonoudiou P, El-Tohamy R, Ient J, Levine TP. Using HHsearch to tackle proteins of unknown function: A pilot study with PH domains. Traffic 2016; 17:1214-1226. [PMID: 27601190 PMCID: PMC5091641 DOI: 10.1111/tra.12432] [Citation(s) in RCA: 41] [Impact Index Per Article: 5.1] [Received: 04/27/2015] [Revised: 08/30/2016] [Accepted: 08/30/2016] [Indexed: 01/08/2023]
Abstract
Advances in membrane cell biology are hampered by the relatively high proportion of proteins with no known function. Such proteins are largely or entirely devoid of structurally significant domain annotations. Structural bioinformaticians have developed profile‐profile tools such as HHsearch (online version called HHpred), which can detect remote homologies that are missed by tools used to annotate databases. Here we have applied HHsearch to study a single structural fold in a single model organism as proof of principle. In the entire clan of protein domains sharing the pleckstrin homology domain fold in yeast, systematic application of HHsearch accurately identified known PH‐like domains. It also predicted 16 new domains in 13 yeast proteins many of which are implicated in intracellular traffic. One of these was Vps13p, where we confirmed the functional importance of the predicted PH‐like domain. Even though such predictions require considerable work to be corroborated, they are useful first steps. HHsearch should be applied more widely, particularly across entire proteomes of model organisms, to significantly improve database annotations.
Affiliation(s)
- David R Fidler
- Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Sarah E Murphy
- Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Katherine Courtis
- Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Rana El-Tohamy
- Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Jonathan Ient
- Department of Cell Biology, UCL Institute of Ophthalmology, London, UK
- Timothy P Levine
- Department of Cell Biology, UCL Institute of Ophthalmology, London, UK.
6
Webb B, Sali A. Comparative Protein Structure Modeling Using MODELLER. Current Protocols in Bioinformatics 2016; 54:5.6.1-5.6.37. [PMID: 27322406 PMCID: PMC5031415 DOI: 10.1002/cpbi.3] [Citation(s) in RCA: 1845] [Impact Index Per Article: 230.6] [Indexed: 12/22/2022]
Abstract
Comparative protein structure modeling predicts the three-dimensional structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and how to use the ModBase database of such models, and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described. © 2016 by John Wiley & Sons, Inc.
Affiliation(s)
- Benjamin Webb
- University of California at San Francisco, San Francisco, California
- Andrej Sali
- University of California at San Francisco, San Francisco, California
7
Ghouzam Y, Postic G, Guerin PE, de Brevern AG, Gelly JC. ORION: a web server for protein fold recognition and structure prediction using evolutionary hybrid profiles. Sci Rep 2016; 6:28268. [PMID: 27319297 PMCID: PMC4913311 DOI: 10.1038/srep28268] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 03/05/2016] [Accepted: 06/01/2016] [Indexed: 11/09/2022]
Abstract
Protein structure prediction based on comparative modeling is the most efficient way to produce structural models when it can be performed. ORION is a dedicated web server based on a new strategy that performs this task. ORION identifies suitable templates using an original profile-profile approach that combines sequence and structure evolution information. Structure evolution information is encoded into profiles using structural features, such as solvent accessibility and local conformation (described with Protein Blocks), which give an accurate description of the local protein structure. ORION has recently been improved, increasing the quality of its results by 5%. The ORION web server accepts a single protein sequence as input and searches for homologous protein structures within minutes. Various databases such as PDB, SCOP and HOMSTRAD can be mined to find an appropriate structural template. For the modeling step, a protein 3D structure can be directly obtained from the selected template by MODELLER and displayed with global and local model quality estimation measures. The sequence and the predicted structure of 4 examples from the CAMEO server and a recent CASP11 target from the 'Hard' category (T0818-D1) are shown as pertinent examples. Our web server is accessible at http://www.dsimb.inserm.fr/ORION/.
Affiliation(s)
- Yassine Ghouzam
- INSERM, U 1134, DSIMB, F-75739 Paris, France
- Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
- Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
- Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Guillaume Postic
- INSERM, U 1134, DSIMB, F-75739 Paris, France
- Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
- Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
- Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Pierre-Edouard Guerin
- INSERM, U 1134, DSIMB, F-75739 Paris, France
- Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
- Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
- Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Alexandre G. de Brevern
- INSERM, U 1134, DSIMB, F-75739 Paris, France
- Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
- Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
- Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
- Jean-Christophe Gelly
- INSERM, U 1134, DSIMB, F-75739 Paris, France
- Univ. Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, F-75739 Paris, France
- Institut National de la Transfusion Sanguine (INTS), F-75739 Paris, France
- Laboratoire d’Excellence GR-Ex, F-75739 Paris, France
8
Ghouzam Y, Postic G, de Brevern AG, Gelly JC. Improving protein fold recognition with hybrid profiles combining sequence and structure evolution. Bioinformatics 2015; 31:3782-9. [PMID: 26254434 DOI: 10.1093/bioinformatics/btv462] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 05/04/2015] [Accepted: 08/02/2015] [Indexed: 11/13/2022]
Abstract
MOTIVATION Template-based modeling, the most successful approach for predicting protein 3D structure, often requires detecting distant evolutionary relationships between the target sequence and proteins of known structure. Developed for this purpose, fold recognition methods use elaborate strategies to exploit evolutionary information, mainly by encoding amino acid sequence into profiles. Since protein structure is more conserved than sequence, the inclusion of structural information can improve the detection of remote homology. RESULTS Here, we present ORION, a new fold recognition method based on the pairwise comparison of hybrid profiles that contain evolutionary information from both protein sequence and structure. Our method uses the 16-state structural alphabet Protein Blocks, which provides an accurate 1D description of protein structure local conformations. ORION systematically outperforms PSI-BLAST and HHsearch on several benchmarks, including target sequences from the modeling competitions CASP8, 9 and 10, and detects ∼10% more templates at fold and superfamily SCOP levels. AVAILABILITY Software freely available for download at http://www.dsimb.inserm.fr/orion/. CONTACT jean-christophe.gelly@univ-paris-diderot.fr. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yassine Ghouzam
- Inserm U1134, Paris, France, Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France, Institut National de la Transfusion Sanguine, Paris, France and Laboratory of Excellence GR-Ex, Paris, France
- Guillaume Postic
- Inserm U1134, Paris, France, Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France, Institut National de la Transfusion Sanguine, Paris, France and Laboratory of Excellence GR-Ex, Paris, France
- Alexandre G de Brevern
- Inserm U1134, Paris, France, Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France, Institut National de la Transfusion Sanguine, Paris, France and Laboratory of Excellence GR-Ex, Paris, France
- Jean-Christophe Gelly
- Inserm U1134, Paris, France, Université Paris Diderot, Sorbonne Paris Cité, UMR_S 1134, Paris, France, Institut National de la Transfusion Sanguine, Paris, France and Laboratory of Excellence GR-Ex, Paris, France
9
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Benjamin Webb
- University of California at San Francisco, San Francisco, California
10
Joseph AP, de Brevern AG. From local structure to a global framework: recognition of protein folds. J R Soc Interface 2014; 11:20131147. [PMID: 24740960 DOI: 10.1098/rsif.2013.1147] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Indexed: 12/21/2022]
Abstract
Protein folding has been a major area of research for many years. Nonetheless, the mechanisms leading to the formation of an active biological fold are still not fully apprehended. The huge amount of available sequence and structural information provides hints to identify the putative fold for a given sequence. Indeed, protein structures prefer a limited number of local backbone conformations, some being characterized by preferences for certain amino acids. These preferences largely depend on the local structural environment. The prediction of local backbone conformations has become an important factor to correctly identifying the global protein fold. Here, we review the developments in the field of local structure prediction and especially their implication in protein fold recognition.
Affiliation(s)
- Agnel Praveen Joseph
- Science and Technology Facilities Council, Rutherford Appleton Laboratory, Harwell Oxford, Didcot OX11 0QX, UK
11
Webb B, Eswar N, Fan H, Khuri N, Pieper U, Dong G, Sali A. Comparative Modeling of Drug Target Proteins. Reference Module in Chemistry, Molecular Sciences and Chemical Engineering 2014. [PMCID: PMC7157477 DOI: 10.1016/b978-0-12-409547-2.11133-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/26/2022]
Abstract
In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state-of-the-art by a number of specific examples.
12
Xu D, Jaroszewski L, Li Z, Godzik A. FFAS-3D: improving fold recognition by including optimized structural features and template re-ranking. Bioinformatics 2013; 30:660-7. [PMID: 24130308 DOI: 10.1093/bioinformatics/btt578] [Citation(s) in RCA: 80] [Impact Index Per Article: 7.3] [Indexed: 11/14/2022]
Abstract
MOTIVATION Homology detection enables grouping proteins into families and prediction of their structure and function. The range of application of homology-based predictions can be significantly extended by using sequence profiles and incorporation of local structural features. However, incorporation of the latter terms varies a lot between existing methods, and together with many examples of distant relations not recognized even by the best methods, suggests that further improvements are still possible. RESULTS Here we describe recent improvements to the fold and function assignment system (FFAS) method, including adding optimized structural features (experimental or predicted), 'symmetrical' Z-score calculation and re-ranking the templates with a neural network. The alignment accuracy in the new FFAS-3D is now 11% higher than the original and comparable with the most accurate template-based structure prediction algorithms. At the same time, FFAS-3D has high success rate at the Structural Classification of Proteins (SCOP) family, superfamily and fold levels. Importantly, FFAS-3D results are not highly correlated with other programs suggesting that it may significantly improve meta-predictions. FFAS-3D does not require 3D structures of the templates, as using predicted features instead of structure-derived does not lead to the decrease of accuracy. Because of that, FFAS-3D can be used for databases other than Protein Data Bank (PDB) such as Protein families database or Clusters of orthologous groups thus extending its applications to functional annotations of genomes and protein families. AVAILABILITY AND IMPLEMENTATION FFAS-3D is available at http://ffas.godziklab.org.
Affiliation(s)
- Dong Xu
- Bioinformatics and Systems Biology Program, Sanford-Burnham Medical Research Institute, 10901 North Torrey Pines Road, La Jolla, CA 92037, Center for Research in Biological Systems, University of California, San Diego, 9500 Gilman Dr. La Jolla, CA 92093-0446, USA and Center of Excellence in Genomic Medicine Research (CEGMR), King Fahad Medical Research Center, King Abdulaziz University, P.O. Box 80216, Jeddah 21589, Kingdom of Saudi Arabia
13
Latek D, Pasznik P, Carlomagno T, Filipek S. Towards improved quality of GPCR models by usage of multiple templates and profile-profile comparison. PLoS One 2013; 8:e56742. [PMID: 23468878 PMCID: PMC3585245 DOI: 10.1371/journal.pone.0056742] [Citation(s) in RCA: 47] [Impact Index Per Article: 4.3] [Received: 10/30/2012] [Accepted: 01/14/2013] [Indexed: 11/19/2022]
Abstract
G-protein coupled receptors (GPCRs) are targets of nearly one third of the drugs on the current pharmaceutical market. Despite their importance in many cellular processes, crystal structures are available for fewer than 20 unique GPCRs of the Rhodopsin-like class. Fortunately, even though involved in different signaling cascades, this large group of membrane proteins has preserved a uniform structure comprising seven transmembrane helices that allows quite reliable comparative modeling. Nevertheless, low sequence similarity between the GPCR family members is still a serious obstacle not only in template selection but also in providing theoretical models of acceptable quality. An additional level of difficulty is the prediction of kinks and bulges in transmembrane helices. Usage of multiple templates and generation of alignments based on sequence profiles may increase the rate of success in difficult cases of comparative modeling in which the sequence similarity between GPCRs is exceptionally low. Here, we present GPCRM, a novel method for fast and accurate generation of GPCR models using averaging of multiple template structures and profile-profile comparison. In particular, GPCRM is the first GPCR structure predictor incorporating two distinct loop modeling techniques, Modeller and Rosetta, together with the filtering of models based on the Z-coordinate. We tested our approach on all unique GPCR structures determined to date and report its performance in comparison with other computational methods targeting the Rhodopsin-like class. We also provide a database of precomputed GPCR models of the human receptors from that class. AVAILABILITY GPCRM server and database: http://gpcrm.biomodellab.eu.
Affiliation(s)
- Dorota Latek
- International Institute of Molecular and Cell Biology, Warsaw, Poland
- Pawel Pasznik
- International Institute of Molecular and Cell Biology, Warsaw, Poland
- Teresa Carlomagno
- EMBL, Structural and Computational Biology Unit, Heidelberg, Germany
- Slawomir Filipek
- Faculty of Chemistry, University of Warsaw, Warsaw, Poland
14
Xu D, Zhang Y. Toward optimal fragment generations for ab initio protein structure assembly. Proteins 2012; 81:229-39. [PMID: 22972754 DOI: 10.1002/prot.24179] [Citation(s) in RCA: 170] [Impact Index Per Article: 14.2] [Received: 04/16/2012] [Revised: 08/06/2012] [Accepted: 09/03/2012] [Indexed: 01/03/2023]
Abstract
Fragment assembly using structural motifs excised from other solved proteins has been shown to be an efficient method for ab initio protein-structure prediction. However, how to construct accurate fragments, how to derive optimal restraints from fragments, and what the best fragment length is are basic issues yet to be systematically examined. In this work, we developed a gapless-threading method to generate position-specific structure fragments. Distance profiles and torsion angle pairs are then derived from the fragments by statistical consistency analysis, which achieved comparable accuracy with the machine-learning-based methods although the fragments were taken from unrelated proteins. When measured by the accuracies of both the derived distance profiles and the torsion angle pairs, we come to a consistent conclusion that the optimal fragment length for structural assembly is around 10, and at least 100 fragments at each location are needed to achieve optimal structure assembly. The distance profiles and torsion angle pairs derived from the fragments have been successfully used in QUARK for ab initio protein structure assembly and are provided by the QUARK online server at http://zhanglab.ccmb.med.umich.edu/QUARK/.
Affiliation(s)
- Dong Xu
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
15
Gniewek P, Kolinski A, Gront D. Optimization of profile-to-profile alignment parameters for one-dimensional threading. J Comput Biol 2012; 19:879-86. [PMID: 22731622 DOI: 10.1089/cmb.2011.0307] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 12/22/2022]
Abstract
The development of automatic approaches for the comparison of protein sequences has become increasingly important. Methods that compare profiles allow for the use of information about whole protein families, resulting in more sensitive and accurate detection of distantly related sequences. In this contribution, we describe a thorough optimization and tests of a profile-to-profile alignment method. A number of different scoring schemes has been implemented and compared on the basis of their ability to identify a template protein from the same SCOP family as a query. In addition to sequence profiles, secondary structure profiles were used to increase the rate of successful detection. Our results show that a properly tuned one-dimensional threading method can recognize a correct template from the same SCOP family nearly as well as structural alignment. Our benchmark set, which might be useful in other similar studies, as well as the fold-recognition software we developed may be downloaded (www.bioshell.pl/profile-alignments).
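To make the idea of profile-to-profile alignment concrete, the following is a minimal Python sketch of a global (Needleman-Wunsch style) alignment over profile columns. It is not the scoring scheme or software described in the abstract: the dot-product column score, the `shift` constant, the linear `gap` penalty, the function names, and the two-letter toy alphabet are all assumptions chosen for illustration.

```python
from typing import List, Sequence

Column = Sequence[float]  # residue frequencies at one profile position


def column_score(a: Column, b: Column, shift: float = 0.2) -> float:
    """Hypothetical column-column score: dot product minus a constant shift.

    The shift makes dissimilar columns score negatively, so that aligning
    them is not always preferred over opening a gap.
    """
    return sum(x * y for x, y in zip(a, b)) - shift


def align_profiles(p: List[Column], q: List[Column], gap: float = 0.5) -> float:
    """Return the optimal global alignment score of two profiles.

    Uses standard dynamic programming with a linear gap penalty.
    """
    n, m = len(p), len(q)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = -gap * i  # leading gaps in q
    for j in range(1, m + 1):
        dp[0][j] = -gap * j  # leading gaps in p
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + column_score(p[i - 1], q[j - 1]),  # align columns
                dp[i - 1][j] - gap,  # column of p against a gap
                dp[i][j - 1] - gap,  # column of q against a gap
            )
    return dp[n][m]


if __name__ == "__main__":
    # Toy profiles over a two-letter alphabet.
    query = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    template = [[1.0, 0.0], [1.0, 0.0]]
    print(align_profiles(query, template))
```

In a template search, a score of this kind would be computed for the query profile against each candidate template profile and the candidates ranked by it; how the score is normalized and how secondary structure profiles enter the sum are exactly the parameters that such an optimization study would tune.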
Affiliation(s)
- Pawel Gniewek
- Faculty of Chemistry, Warsaw University, Warsaw, Poland
16
Tomii K, Sawada Y, Honda S. Convergent evolution in structural elements of proteins investigated using cross profile analysis. BMC Bioinformatics 2012; 13:11. [PMID: 22244085 PMCID: PMC3398312 DOI: 10.1186/1471-2105-13-11] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 06/20/2011] [Accepted: 01/16/2012] [Indexed: 11/10/2022]
Abstract
Background Evolutionary relations of similar segments shared by different protein folds remain controversial, even though many examples of such segments have been found. To date, several methods such as those based on the results of structure comparisons, sequence-based classifications, and sequence-based profile-profile comparisons have been applied to identify such protein segments that possess local similarities in both sequence and structure across protein folds. However, to capture more precise sequence-structure relations, no method reported to date combines structure-based profiles, and sequence-based profiles based on evolutionary information. The former are generally regarded as representing the amino acid preferences at each position of a specific conformation of protein segment. They might reflect the nature of ancient short peptide ancestors, using the results of structural classifications of protein segments. Results This report describes the development and use of "Cross Profile Analysis" to compare sequence-based profiles and structure-based profiles based on amino acid occurrences at each position within a protein segment cluster. Using systematic cross profile analysis, we found structural clusters of 9-residue and 15-residue segments showing remarkably strong correlation with particular sequence profiles. These correlations reflect structural similarities among constituent segments of both sequence-based and structure-based profiles. We also report previously undetectable sequence-structure patterns that transcend protein family and fold boundaries, and present results of the conformational analysis of the deduced peptide of a segment cluster. These results suggest the existence of ancient short-peptide ancestors. Conclusions Cross profile analysis reveals the polyphyletic and convergent evolution of β-hairpin-like structures, which were verified both experimentally and computationally. The results presented here give us new insights into the evolution of short protein segments.
17
Ye X, Wang G, Altschul SF. An assessment of substitution scores for protein profile-profile comparison. Bioinformatics 2011; 27:3356-63. [PMID: 21998158 PMCID: PMC3232366 DOI: 10.1093/bioinformatics/btr565] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 06/03/2011] [Revised: 09/22/2011] [Accepted: 10/06/2011] [Indexed: 11/14/2022]
Abstract
MOTIVATION Pairwise protein sequence alignments are generally evaluated using scores defined as the sum of substitution scores for aligning amino acids to one another, and gap scores for aligning runs of amino acids in one sequence to null characters inserted into the other. Protein profiles may be abstracted from multiple alignments of protein sequences, and substitution and gap scores have been generalized to the alignment of such profiles either to single sequences or to other profiles. Although there is widespread agreement on the general form substitution scores should take for profile-sequence alignment, little consensus has been reached on how best to construct profile-profile substitution scores, and a large number of these scoring systems have been proposed. Here, we assess a variety of such substitution scores. For this evaluation, given a gold standard set of multiple alignments, we calculate the probability that a profile column yields a higher substitution score when aligned to a related than to an unrelated column. We also generalize this measure to sets of two or three adjacent columns. This simple approach has the advantages that it does not depend primarily upon the gold-standard alignment columns with the weakest empirical support, and that it does not need to fit gap and offset costs for use with each substitution score studied. RESULTS A simple symmetrization of mean profile-sequence scores usually performed the best. These were followed closely by several specific scoring systems constructed using a variety of rationales. CONTACT altschul@ncbi.nlm.nih.gov SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
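The evaluation measure described above can be sketched in a few lines of Python: for a set of (column, related column, unrelated column) triples, estimate how often the related column receives the higher substitution score. The scoring function below is only one plausible reading of "symmetrization of mean profile-sequence scores" and is an assumption, not the authors' definition; the function names, the three-letter toy alphabet, and the uniform background frequencies are likewise hypothetical. Columns are assumed to contain no zero frequencies (for example, after pseudocounts).

```python
import math
from typing import List, Sequence, Tuple

Column = Sequence[float]  # residue frequencies at one profile position


def log_odds(column: Column, background: Column) -> List[float]:
    """Per-residue log-odds scores implied by one profile column."""
    return [math.log2(p / b) for p, b in zip(column, background)]


def symmetrized_score(a: Column, b: Column, background: Column) -> float:
    """Average of the two mean profile-sequence scores.

    Scores derived from column a are averaged over the residue frequencies
    of column b, and vice versa; the two means are then averaged.
    """
    s_a = log_odds(a, background)
    s_b = log_odds(b, background)
    a_scores_b = sum(f * s for f, s in zip(b, s_a))
    b_scores_a = sum(f * s for f, s in zip(a, s_b))
    return 0.5 * (a_scores_b + b_scores_a)


def prob_related_scores_higher(
    triples: Sequence[Tuple[Column, Column, Column]], background: Column
) -> float:
    """Fraction of triples in which the related column outscores the unrelated one.

    Ties count as half a win.
    """
    wins = 0.0
    for column, related, unrelated in triples:
        r = symmetrized_score(column, related, background)
        u = symmetrized_score(column, unrelated, background)
        if r > u:
            wins += 1.0
        elif r == u:
            wins += 0.5
    return wins / len(triples)


if __name__ == "__main__":
    bg = [1 / 3, 1 / 3, 1 / 3]
    triples = [([0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8])]
    print(prob_related_scores_higher(triples, bg))
```

Because the measure compares scores column by column, it needs no gap or offset costs, which matches the advantage the abstract attributes to this style of evaluation.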
Affiliation(s)
- Xugang Ye
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
18
|
Lin HN, Notredame C, Chang JM, Sung TY, Hsu WL. Improving the alignment quality of consistency based aligners with an evaluation function using synonymous protein words. PLoS One 2011; 6:e27872. [PMID: 22163274 PMCID: PMC3229492 DOI: 10.1371/journal.pone.0027872] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 08/30/2011] [Accepted: 10/27/2011] [Indexed: 11/18/2022]
Abstract
Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or, more recently, consistency-based approaches. In this paper, we present a flexible similarity measure for residue pairs to improve the quality of protein sequence alignment. Our approach, called SymAlign, relies on the identification of conserved words found across a sizeable fraction of the considered dataset, and supported by evolutionary analysis. These words are then used to define a position specific substitution matrix that better reflects the biological significance of local similarity. The experimental results show that the SymAlign scoring scheme can be incorporated within T-Coffee to improve sequence alignment accuracy. We also demonstrate that SymAlign is less sensitive to the presence of structurally non-similar proteins. In the analysis of the relationship between sequence identity and structure similarity, SymAlign can better differentiate structurally similar proteins from non-similar proteins. We show that protein sequence alignments can be significantly improved using a similarity estimation based on weighted n-grams. In our analysis of the alignments thus produced, sequence conservation becomes a better indicator of structural similarity.
SymAlign also provides alignment visualization that can display sub-optimal alignments on dot-matrices. The visualization makes it easy to identify well-supported alternative alignments that may not have been identified by dynamic programming. SymAlign is available at http://bio-cluster.iis.sinica.edu.tw/SymAlign/.
Affiliation(s)
- Hsin-Nan Lin
- Bioinformatics Lab, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Jia-Ming Chang
- Centre for Genomic Regulation (CRG), UPF, Barcelona, Spain
- Ting-Yi Sung
- Bioinformatics Lab, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- Wen-Lian Hsu
- Bioinformatics Lab, Institute of Information Science, Academia Sinica, Taipei, Taiwan
- * Corresponding authors: TYS, WLH
|
19
|
Cai XH, Jaroszewski L, Wooley J, Godzik A. Internal organization of large protein families: relationship between the sequence, structure, and function-based clustering. Proteins 2011; 79:2389-402. [PMID: 21671455 DOI: 10.1002/prot.23049] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 10/26/2010] [Revised: 02/12/2011] [Accepted: 03/13/2011] [Indexed: 12/14/2022]
Abstract
The protein universe can be organized in families that group proteins sharing common ancestry. Such families display variable levels of structural and functional divergence, from homogeneous families, where all members have the same function and very similar structure, to very divergent families, where large variations in function and structure are observed. For practical purposes of structure and function prediction, it would be beneficial to identify sub-groups of proteins with highly similar structures (iso-structural) and/or functions (iso-functional) within divergent protein families. We compared three algorithms in their ability to cluster large protein families and discuss whether any of these methods could reliably identify such iso-structural or iso-functional groups. We show that clustering using profile-sequence and profile-profile comparison methods closely reproduces clusters based on similarities between 3D structures or clusters of proteins with similar biological functions. In contrast, the still commonly used sequence-based methods with fixed thresholds result in vast overestimates of structural and functional diversity in protein families. As a result, these methods also overestimate the number of protein structures that have to be determined to fully characterize the structural space of such families. The fact that one can build reliable models based on apparently distantly related templates is crucial for extracting the maximal amount of information from new sequencing projects.
Affiliation(s)
- Xiao-Hui Cai
- Joint Center for Structural Genomics, Center for Research in Biological Systems, University of California, San Diego, California 92093-0446, USA
|
20
|
Systematic assessment of accuracy of comparative model of proteins belonging to different structural fold classes. J Mol Model 2011; 17:2831-7. [PMID: 21301906 DOI: 10.1007/s00894-011-0976-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/08/2010] [Accepted: 01/17/2011] [Indexed: 10/18/2022]
Abstract
In the absence of experimental structures, comparative modeling continues to be the chosen method for retrieving structural information on target proteins. However, models lack the accuracy of experimental structures. Alignment error and structural divergence (between target and template) influence model accuracy the most. Here, we examine the potential additional impact of backbone geometry, as our previous studies have suggested that the structural class (all-α, αβ, all-β) of a protein may influence the accuracy of its model. In the twilight zone (sequence identity ≤ 30%) and at a similar level of target-template divergence, the accuracy of protein models does indeed follow the trend all-α > αβ > all-β. This is mainly because the alignment accuracy follows the same trend (all-α > αβ > all-β), with backbone geometry playing only a minor role. Differences in the diversity of sequences belonging to different structural classes lead to the observed accuracy differences, thus enabling the accuracy of alignments/models to be estimated a priori in a class-dependent manner. This study provides a systematic description of and quantifies the structural class-dependent effect in comparative modeling. The study also suggests that datasets for large-scale sequence/structure analyses should have equal representations of different structural classes to avoid class-dependent bias.
|
21
|
Goncearenco A, Berezovsky IN. Prototypes of elementary functional loops unravel evolutionary connections between protein functions. Bioinformatics 2010; 26:i497-503. [PMID: 20823313 PMCID: PMC2935408 DOI: 10.1093/bioinformatics/btq374] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.1] [Indexed: 11/12/2022]
Abstract
Motivation: Earlier studies of protein structure revealed closed loops with a characteristic size of 25–30 residues and ring-like shape as a basic universal structural element of globular proteins. Elementary functional loops (EFLs) have specific signatures and provide functional residues important for binding/activation and principal chemical transformation steps of the enzymatic reaction. The goal of this work is to show how these functional loops evolved from pre-domain peptides and to find a set of prototypes from which the EFLs of contemporary proteins originated. Results: This article describes a computational method for deriving prototypes of EFLs based on the sequences of complete genomes. The procedure comprises the iterative derivation of sequence profiles followed by their hierarchical clustering. The scoring function takes into account information content on profile positions, thus preserving the signature. The statistical significance of scores is evaluated from the empirical distribution of scores of the background model. A set of prototypes of EFLs from archaeal proteomes is derived. This set delineates evolutionary connections between major functions and illuminates how folds and functions emerged in pre-domain evolution as a combination of prototypes. Contact: Igor.Berezovsky@uni.no
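The abstract mentions that the scoring function accounts for the information content of profile positions. One common way to quantify this is the relative entropy of a column against background frequencies; the sketch below uses that definition as an assumption, since the paper's own formula is not reproduced in the abstract. The function name `column_information` and the uniform default background are hypothetical choices for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_information(freqs, background=None):
    """Relative entropy (in bits) of one profile column versus a background.

    freqs maps amino-acid letters to counts or frequencies and is normalized
    here. The default background is uniform over the 20 standard amino acids,
    which is a simplifying assumption.
    """
    if background is None:
        background = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
    total = sum(freqs.values())
    if total <= 0:
        raise ValueError("column has no observations")
    info = 0.0
    for aa, value in freqs.items():
        p = value / total
        if p > 0:
            info += p * math.log2(p / background[aa])
    return info


# A fully conserved column carries more information than a mixed one.
conserved = column_information({"G": 1.0})
mixed = column_information({"L": 0.5, "I": 0.5})
print(f"conserved: {conserved:.2f} bits, mixed: {mixed:.2f} bits")
```

Weighting positions by a quantity like this is one way a score could favor the conserved signature residues over variable positions.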
Affiliation(s)
- Alexander Goncearenco
- Bergen Center for Computational Science and Department of Informatics, University of Bergen, Bergen, Norway
|
22
|
Altschul SF, Wootton JC, Zaslavsky E, Yu YK. The construction and use of log-odds substitution scores for multiple sequence alignment. PLoS Comput Biol 2010; 6:e1000852. [PMID: 20657661 PMCID: PMC2904766 DOI: 10.1371/journal.pcbi.1000852] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Received: 10/30/2009] [Accepted: 06/03/2010] [Indexed: 01/18/2023]
Abstract
Most pairwise and multiple sequence alignment programs seek alignments with optimal scores. Central to defining such scores is selecting a set of substitution scores for aligned amino acids or nucleotides. For local pairwise alignment, substitution scores are implicitly of log-odds form. We now extend the log-odds formalism to multiple alignments, using Bayesian methods to construct "BILD" ("Bayesian Integral Log-odds") substitution scores from prior distributions describing columns of related letters. This approach has been used previously only to define scores for aligning individual sequences to sequence profiles, but it has much broader applicability. We describe how to calculate BILD scores efficiently, and illustrate their uses in Gibbs sampling optimization procedures, gapped alignment, and the construction of hidden Markov model profiles. BILD scores enable automated selection of optimal motif and domain model widths, and can inform the decision of whether to include a sequence in a multiple alignment, and the selection of insertion and deletion locations. Other applications include the classification of related sequences into subfamilies, and the definition of profile-profile alignment scores. Although a fully realized multiple alignment program must rely upon more than substitution scores, many existing multiple alignment programs can be modified to employ BILD scores. We illustrate how simple BILD score based strategies can enhance the recognition of DNA binding domains, including the Api-AP2 domain in Toxoplasma gondii and Plasmodium falciparum.
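For orientation, the pairwise log-odds form that the abstract takes as its starting point can be written as s(a, b) = log2(q_ab / (p_a p_b)), where q_ab is the target frequency of the aligned pair and p_a, p_b are background frequencies. The sketch below computes such scores for a toy two-letter alphabet. It illustrates only the standard pairwise case, not the BILD scores themselves; the function name and the frequencies are hypothetical.

```python
import math


def log_odds_scores(pair_freqs):
    """Return log-odds scores (in bits) from symmetric target pair frequencies.

    pair_freqs maps ordered pairs (a, b) to q_ab and must contain both
    (a, b) and (b, a); the values should sum to 1.
    """
    # Background frequency of each letter is the marginal of the pair table.
    background = {}
    for (a, _b), q in pair_freqs.items():
        background[a] = background.get(a, 0.0) + q
    return {
        (a, b): math.log2(q / (background[a] * background[b]))
        for (a, b), q in pair_freqs.items()
    }


# Toy two-letter alphabet with hypothetical target frequencies.
toy_freqs = {
    ("A", "A"): 0.4, ("A", "B"): 0.1,
    ("B", "A"): 0.1, ("B", "B"): 0.4,
}
scores = log_odds_scores(toy_freqs)
for pair, score in sorted(scores.items()):
    print(pair, round(score, 3))
```

Pairs seen more often than expected by chance get positive scores and pairs seen less often get negative ones, which is the property the multiple-alignment extension builds on.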
Affiliation(s)
- Stephen F Altschul
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, United States of America.
|
23
|
Yan RX, Si JN, Wang C, Zhang Z. DescFold: a web server for protein fold recognition. BMC Bioinformatics 2009; 10:416. [PMID: 20003426 PMCID: PMC2803855 DOI: 10.1186/1471-2105-10-416] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 07/09/2009] [Accepted: 12/14/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Machine learning-based methods have been proven to be powerful in developing new fold recognition tools. In our previous work [Zhang, Kochhar and Grigorov (2005) Protein Science, 14: 431-444], a machine learning-based method called DescFold was established by using Support Vector Machines (SVMs) to combine the following four descriptors: a profile-sequence-alignment-based descriptor using Psi-blast e-values and bit scores, a sequence-profile-alignment-based descriptor using Rps-blast e-values and bit scores, a descriptor based on secondary structure element alignment (SSEA), and a descriptor based on the occurrence of PROSITE functional motifs. In this work, we focus on the improvement of DescFold by incorporating more powerful descriptors and setting up a user-friendly web server. RESULTS In seeking more powerful descriptors, the profile-profile alignment score generated from the COMPASS algorithm was first considered as a new descriptor (i.e., PPA). When considering a profile-profile alignment between two proteins in the context of fold recognition, one protein is regarded as a template (i.e., its 3D structure is known). Instead of a sequence profile derived from a Psi-blast search, a structure-seeded profile for the template protein was generated by searching its structural neighbors with the assistance of the TM-align structural alignment algorithm. Moreover, the COMPASS algorithm was used again to derive a profile-structural-profile-alignment-based descriptor (i.e., PSPA). We trained and tested the new DescFold on a total of 1,835 highly diverse proteins extracted from the SCOP 1.73 version. When the PPA and PSPA descriptors were introduced, the new DescFold boosted the performance of fold recognition substantially. Using the SCOP_1.73_40% dataset as the fold library, the DescFold web server based on the trained SVM models was further constructed.
To provide a large-scale test for the new DescFold, a stringent test set of 1,866 proteins was selected from the SCOP 1.75 version. With the false positive rate controlled at less than 5%, the new DescFold is able to correctly recognize structural homologs at the fold level for nearly 46% of test proteins. We also benchmarked the DescFold method against several well-established fold recognition algorithms through the LiveBench targets and Lindahl dataset. CONCLUSIONS In intensive benchmarking, the new DescFold method showed very competitive performance compared with some well-established fold recognition methods, suggesting that it can serve as a useful tool to assist in template-based protein structure prediction. The DescFold server is freely accessible at http://202.112.170.199/DescFold/index.html.
Affiliation(s)
- Ren-Xiang Yan
- State Key Laboratory of Agrobiotechnology, College of Biological Sciences, China Agricultural University, Beijing 100193, China.
|
24
|
Abstract
MOTIVATION Homologous protein families share highly conserved sequence and structure regions that are frequent targets for comparative analysis of related proteins and families. Many protein families, such as the curated domain families in the Conserved Domain Database (CDD), exhibit similar structural cores. To improve accuracy in aligning such protein families, we propose a profile-profile method CORAL that aligns individual core regions as gap-free units. RESULTS CORAL computes optimal local alignment of two profiles with heuristics to preserve continuity within core regions. We benchmarked its performance on curated domains in CDD, which have pre-defined core regions, against COMPASS, HHalign and PSI-BLAST, using structure superpositions and comprehensive curator-optimized alignments as standards of truth. CORAL improves alignment accuracy on core regions over general profile methods, returning a balanced score of 0.57 for over 80% of all domain families in CDD, compared with the highest balanced score of 0.45 from other methods. Further, CORAL provides E-values to aid in detecting homologous protein families and, by respecting block boundaries, produces alignments with improved 'readability' that facilitate manual refinement. AVAILABILITY CORAL will be included in future versions of the NCBI Cn3D/CDTree software, which can be downloaded at http://www.ncbi.nlm.nih.gov/Structure/cdtree/cdtree.shtml. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jessica H Fong
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|
25
|
Lobley A, Sadowski MI, Jones DT. pGenTHREADER and pDomTHREADER: new methods for improved protein fold recognition and superfamily discrimination. Bioinformatics 2009; 25:1761-7. [PMID: 19429599 DOI: 10.1093/bioinformatics/btp302] [Citation(s) in RCA: 213] [Impact Index Per Article: 14.2] [Indexed: 11/14/2022]
Abstract
MOTIVATION Generation of structural models and recognition of homologous relationships for unannotated protein sequences are fundamental problems in bioinformatics. Improving the sensitivity and selectivity of methods designed for these two tasks therefore has downstream benefits for many other bioinformatics applications. RESULTS We describe the latest implementation of the GenTHREADER method for structure prediction on a genomic scale. The method combines profile-profile alignments with secondary-structure specific gap-penalties, classic pair- and solvation potentials using a linear combination optimized with a regression SVM model. We find this combination significantly improves both detection of useful templates and accuracy of sequence-structure alignments relative to other competitive approaches. We further present a second implementation of the protocol designed for the task of discriminating superfamilies from one another. This method, pDomTHREADER, is the first to incorporate both sequence and structural data directly in this task and improves sensitivity and selectivity over the standard version of pGenTHREADER and three other standard methods for remote homology detection.
Affiliation(s)
- Anna Lobley
- Department of Computer Science, University College London, UK
|
26
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using MODELLER. Curr Protoc Protein Sci 2008; Chapter 2:Unit 2.9. [PMID: 18429317 DOI: 10.1002/0471140864.ps0209s50] [Citation(s) in RCA: 750] [Impact Index Per Article: 46.9] [Indexed: 12/25/2022]
Abstract
Functional characterization of a protein sequence is a common goal in biology, and is usually facilitated by having an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco, San Francisco, California, USA
|
27
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics 2008; Chapter 5:Unit-5.6. [PMID: 18428767 DOI: 10.1002/0471250953.bi0506s15] [Citation(s) in RCA: 1758] [Impact Index Per Article: 109.9] [Indexed: 12/28/2022]
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco, San Francisco, California
- Ben Webb
- University of California at San Francisco, San Francisco, California
- M S Madhusudhan
- University of California at San Francisco, San Francisco, California
- David Eramian
- University of California at San Francisco, San Francisco, California
- Min-Yi Shen
- University of California at San Francisco, San Francisco, California
- Ursula Pieper
- University of California at San Francisco, San Francisco, California
- Andrej Sali
- University of California at San Francisco, San Francisco, California
|
28
|
Bennett-Lovsey RM, Herbert AD, Sternberg MJE, Kelley LA. Exploring the extremes of sequence/structure space with ensemble fold recognition in the program Phyre. Proteins 2008; 70:611-25. [PMID: 17876813 DOI: 10.1002/prot.21688] [Citation(s) in RCA: 348] [Impact Index Per Article: 21.8] [Indexed: 11/10/2022]
Abstract
Structural and functional annotation of the large and growing database of genomic sequences is a major problem in modern biology. Protein structure prediction by detecting remote homology to known structures is a well-established and successful annotation technique. However, the broad spectrum of evolutionary change that accompanies the divergence of close homologues to become remote homologues cannot easily be captured with a single algorithm. Recent advances to tackle this problem have involved the use of multiple predictive algorithms available on the Internet. Here we demonstrate how such ensembles of predictors can be designed in-house under controlled conditions and permit significant improvements in recognition by using a concept taken from protein loop energetics and applying it to the general problem of 3D clustering. We have developed a stringent test that simulates the situation where a protein sequence of interest is submitted to multiple different algorithms and not one of these algorithms can make a confident (95%) correct assignment. A method of meta-server prediction (Phyre) that exploits the benefits of a controlled environment for the component methods was implemented. At 95% precision or higher, Phyre identified 64.0% of all correct homologous query-template relationships, and 84.0% of the individual test query proteins could be accurately annotated. In comparison to the improvement that the single best fold recognition algorithm (according to training) has over PSI-Blast, this represents a 29.6% increase in the number of correct homologous query-template relationships, and a 46.2% increase in the number of accurately annotated queries. It has been well recognised in fold prediction, other bioinformatics applications, and in many other areas, that ensemble predictions generally are superior in accuracy to any of the component individual methods. 
However, there is a paucity of information as to why the ensemble methods are superior, and indeed this has never been systematically addressed in fold recognition. Here we show that the source of ensemble power stems from noise reduction in filtering out false positive matches. The results indicate greater coverage of sequence space and improved model quality, which can consequently lead to a reduction in the experimental workload of structural genomics initiatives.
Affiliation(s)
- Riccardo M Bennett-Lovsey
- Structural Bioinformatics Group, Division of Molecular Biosciences, Imperial College London, London SW7 2AY, United Kingdom
|
29
|
Tcheremenskaia O, Giuliani A, Tomasi M. PROFALIGN algorithm identifies the regions containing folding determinants by scoring pairs of hydrophobic profiles of remotely related proteins. J Comput Biol 2008; 15:445-55. [PMID: 18386966 DOI: 10.1089/cmb.2007.0100] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/12/2022]
Abstract
Profile comparison methods have been shown to be very powerful in creating accurate alignments of protein sequences, especially in the case of remotely related proteins (RRP). These methods take advantage of the observation that hydrophobic profiles are more conserved than the corresponding amino acid sequences. Here, we present the PROFALIGN algorithm, which allows one to perform a detailed comparative analysis, at both local and global levels, of two protein sequence profiles. The user can either choose among four different hydrophobic scales (Miyazawa-Jernigan, Eisenberg, Engelman-Steitz, and Kyte-Doolittle) or can add a personal scale. The interface is designed for a wide range of users, including those who are not involved in protein research. It allows one to vary the alignment parameters (such as gap penalties, embedding, and profile smoothness). Secondary structure propensity is added as an optional alignment filter. Similar segments of two proteins are singled out on the basis of score. We have tested the algorithm with different Src homology 3 (SH3) domain fragments sharing low sequence homology but very similar three-dimensional (3D) structures. By using the Miyazawa-Jernigan hydrophobic scale, PROFALIGN was able to detect the strong correlation between the regions that are known to be crucial for SH3 transition state topology. PROFALIGN seems able to identify most of the mutual alignment of structures on the basis of their hydrophobic profiles, delimiting the regions containing the key determinants of folding. Therefore, the present methodology may be useful for the detection of the most structurally relevant positions inside remotely related proteins.
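To make the idea of a hydrophobic profile concrete, the sketch below converts a sequence into a smoothed hydropathy profile using the Kyte-Doolittle scale, one of the four scales named in the abstract. The sliding-window average, the window size, the function name `hydropathy_profile`, and the example fragment are assumptions for illustration; PROFALIGN's own smoothing and alignment procedure is not described in enough detail here to reproduce.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=5):
    """Return the sliding-window mean hydropathy along a protein sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence.upper()]
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the sequence length")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Hypothetical 12-residue fragment, used only to exercise the function.
profile = hydropathy_profile("MKTLLILAVVDE", window=5)
print([round(v, 2) for v in profile])
```

Two such profiles could then be compared position by position, which is the kind of input a profile alignment method works with; a larger window gives a smoother profile at the cost of positional detail.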
Affiliation(s)
- Olga Tcheremenskaia
- Department of Cell Biology and Neurosciences, Istituto Superiore di Sanità, Rome, Italy.
|
30
|
Sterner B, Singh R, Berger B. Predicting and annotating catalytic residues: an information theoretic approach. J Comput Biol 2007; 14:1058-73. [PMID: 17887954 DOI: 10.1089/cmb.2007.0042] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023]
Abstract
We introduce a computational method to predict and annotate the catalytic residues of a protein using only its sequence information, so that we describe both the residues' sequence locations (prediction) and their specific biochemical roles in the catalyzed reaction (annotation). While knowing the chemistry of an enzyme's catalytic residues is essential to understanding its function, the challenges of prediction and annotation have remained difficult, especially when only the enzyme's sequence and no homologous structures are available. Our sequence-based approach follows the guiding principle that catalytic residues performing the same biochemical function should have similar chemical environments; it detects specific conservation patterns near in sequence to known catalytic residues and accordingly constrains what combination of amino acids can be present near a predicted catalytic residue. We associate with each catalytic residue a short sequence profile and define a Kullback-Leibler (KL) distance measure between these profiles, which, as we show, effectively captures even subtle biochemical variations. We apply the method to the class of glycohydrolase enzymes. This class includes proteins from 96 families with very different sequences and folds, many of which perform important functions. In a cross-validation test, our approach correctly predicts the location of the enzymes' catalytic residues with a sensitivity of 80% at a specificity of 99.4%, and in a separate cross-validation we also correctly annotate the biochemical role of 80% of the catalytic residues. Our results compare favorably to existing methods. Moreover, our method is more broadly applicable because it relies on sequence and not structure information; it may, furthermore, be used in conjunction with structure-based methods.
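A Kullback-Leibler distance between short sequence profiles, as mentioned in this abstract, might be sketched as follows. The symmetrization, the pseudocount smoothing, the summation over positions, and all names are assumptions made for illustration; the abstract does not specify how the authors define or normalize their measure.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def smooth(column, pseudocount=0.01):
    """Turn raw counts into probabilities, adding a small pseudocount."""
    total = sum(column.get(aa, 0.0) for aa in AMINO_ACIDS)
    total += pseudocount * len(AMINO_ACIDS)
    return {aa: (column.get(aa, 0.0) + pseudocount) / total for aa in AMINO_ACIDS}


def kl_divergence(p, q):
    """KL divergence D(p || q) in bits for two smoothed distributions."""
    return sum(p[aa] * math.log2(p[aa] / q[aa]) for aa in AMINO_ACIDS)


def profile_distance(profile_a, profile_b, pseudocount=0.01):
    """Sum of symmetrized KL divergences over aligned profile positions."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    distance = 0.0
    for col_a, col_b in zip(profile_a, profile_b):
        p = smooth(col_a, pseudocount)
        q = smooth(col_b, pseudocount)
        distance += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return distance


# Hypothetical two-position profiles given as amino-acid counts.
profile_1 = [{"D": 8, "E": 2}, {"G": 10}]
profile_2 = [{"D": 2, "E": 8}, {"G": 10}]
print(round(profile_distance(profile_1, profile_2), 3))
```

Plain KL divergence is asymmetric and undefined when a probability is zero, which is why this sketch symmetrizes it and smooths the counts before comparing columns.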
Affiliation(s)
- Beckett Sterner
- Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, Massachusetts, USA
|
31
|
Fu X, Apgar JR, Keating AE. Modeling backbone flexibility to achieve sequence diversity: the design of novel alpha-helical ligands for Bcl-xL. J Mol Biol 2007; 371:1099-117. [PMID: 17597151 PMCID: PMC1994813 DOI: 10.1016/j.jmb.2007.04.069] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.9] [Received: 01/05/2007] [Revised: 04/26/2007] [Accepted: 04/27/2007] [Indexed: 11/27/2022]
Abstract
Computational protein design can be used to select sequences that are compatible with a fixed-backbone template. This strategy has been used in numerous instances to engineer novel proteins. However, the fixed-backbone assumption severely restricts the sequence space that is accessible via design. For challenging problems, such as the design of functional proteins, this may not be acceptable. Here, we present a method for introducing backbone flexibility into protein design calculations and apply it to the design of diverse helical BH3 ligands that bind to the anti-apoptotic protein Bcl-xL, a member of the Bcl-2 protein family. We demonstrate how normal mode analysis can be used to sample different BH3 backbones, and show that this leads to a larger and more diverse set of low-energy solutions than can be achieved using a native high-resolution Bcl-xL complex crystal structure as a template. We tested several of the designed solutions experimentally and found that this approach worked well when normal mode calculations were used to deform a native BH3 helix structure, but less well when they were used to deform an idealized helix. A subsequent round of design and testing identified a likely source of the problem as inadequate sampling of the helix pitch. In all, we tested 17 designed BH3 peptide sequences, including several point mutants. Of these, eight bound well to Bcl-xL and four others showed weak but detectable binding. The successful designs showed a diversity of sequences that would have been difficult or impossible to achieve using only a fixed backbone. Thus, introducing backbone flexibility via normal mode analysis effectively broadened the set of sequences identified by computational design, and provided insight into positions important for binding Bcl-xL.
Affiliation(s)
- Xiaoran Fu
- MIT Department of Biology, 77 Massachusetts Ave, Cambridge, MA 02139, USA
|
32
|
Abstract
In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state of the art by a number of specific examples.
|
33
|
Söding J, Biegert A, Lupas AN. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res 2005; 33:W244-8. [PMID: 15980461 PMCID: PMC1160169 DOI: 10.1093/nar/gki408] [Citation(s) in RCA: 2803] [Impact Index Per Article: 147.5] [Indexed: 11/13/2022] Open
Abstract
HHpred is a fast server for remote protein homology detection and structure prediction and is the first to implement pairwise comparison of profile hidden Markov models (HMMs). It allows searching a wide choice of databases, such as the PDB, SCOP, Pfam, SMART, COGs and CDD. It accepts a single query sequence or a multiple alignment as input. Within only a few minutes it returns the search results in a user-friendly format similar to that of PSI-BLAST. Search options include local or global alignment and scoring secondary structure similarity. HHpred can produce pairwise query-template alignments, multiple alignments of the query with a set of templates selected from the search results, as well as 3D structural models that are calculated by the MODELLER software from these alignments. A detailed help facility is available. As a demonstration, we analyze the sequence of SpoVT, a transcriptional regulator from Bacillus subtilis. HHpred can be accessed at http://protevo.eb.tuebingen.mpg.de/hhpred.
Affiliation(s)
- Johannes Söding
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Spemannstrasse 35, 72076 Tübingen, Germany.
|
34
|
Frenkel-Morgenstern M, Singer A, Bronfeld H, Pietrokovski S. One-Block CYRCA: an automated procedure for identifying multiple-block alignments from single block queries. Nucleic Acids Res 2005; 33:W281-3. [PMID: 15980470 PMCID: PMC1160248 DOI: 10.1093/nar/gki488] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 11/30/2022] Open
Abstract
One-Block CYRCA is an automated procedure for identifying multiple-block alignments from single block queries. It is based on the LAMA and CYRCA block-to-block alignment methods. The procedure identifies whether the query blocks can form new multiple-block alignments (block sets) with blocks from a database or join pre-existing database block sets. Using pre-computed LAMA block alignments and CYRCA sets from the Blocks database reduces the computation time. LAMA and CYRCA are highly sensitive and selective methods that can augment many other sequence analysis approaches.
Affiliation(s)
- Alice Singer
- Bioinformatics undergraduate program, Bar-Ilan University, Ramat-Gan 52900, Israel
- Hagit Bronfeld
- Bioinformatics undergraduate program, Bar-Ilan University, Ramat-Gan 52900, Israel
- Shmuel Pietrokovski
- To whom correspondence should be addressed. Tel: +972 8 934 2747; Fax: +972 8 934 4108
|
35
|
Frenkel-Morgenstern M, Voet H, Pietrokovski S. Enhanced statistics for local alignment of multiple alignments improves prediction of protein function and structure. Bioinformatics 2005; 21:2950-6. [PMID: 15870168 DOI: 10.1093/bioinformatics/bti462] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: Improved comparisons of multiple sequence alignments (profiles) with other profiles can identify subtle relationships between protein families and motifs significantly beyond the resolution of sequence-based comparisons. RESULTS: The local alignment of multiple alignments (LAMA) method was modified to estimate alignment score significance by applying a new measure based on Fisher's combining method. To verify the new procedure, we used known protein structures, sequence annotations and cyclical relations consistency analysis (CYRCA) sets of consistently aligned blocks. Using the new significance measure improved the sensitivity of LAMA without altering its selectivity. The program performed better than other profile-to-profile methods (COMPASS and Prof_sim) and a sequence-to-profile method (PSI-BLAST). The testing was large scale and used several parameters, including pseudo-counts profile calculations and local ungapped blocks or more extended gapped profiles. This comparison provides guidelines to the relative advantages of each method for different cases. We demonstrate and discuss the unique advantages of using block multiple alignments of protein motifs.
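For orientation, the classical Fisher combining method that the abstract refers to can be sketched as follows. This is only the textbook procedure, not the modified LAMA significance measure, whose details the abstract does not give; the function name `fisher_combined_p` is a hypothetical choice, and the p-values are assumed to be independent.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with the classical Fisher method.

    Illustrative only: this is not the LAMA measure from the paper.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    # Fisher's statistic X = -2 * sum(ln p_i) is chi-square distributed
    # with 2k degrees of freedom under the null hypothesis.
    half_x = -sum(math.log(p) for p in p_values)
    # Closed-form chi-square survival function for even df = 2k:
    # P(X > x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the formula.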
|
36
|
Ginalski K, Grishin NV, Godzik A, Rychlewski L. Practical lessons from protein structure prediction. Nucleic Acids Res 2005; 33:1874-91. [PMID: 15805122 PMCID: PMC1074308 DOI: 10.1093/nar/gki327] [Citation(s) in RCA: 99] [Impact Index Per Article: 5.2] [Indexed: 11/25/2022] Open
Abstract
Despite recent efforts to develop automated protein structure determination protocols, structural genomics projects are slow in generating fold assignments for complete proteomes, and spatial structures remain unknown for many protein families. Alternative cheap and fast methods to assign folds using prediction algorithms continue to provide valuable structural information for many proteins. The development of high-quality prediction methods has been boosted in the last years by objective community-wide assessment experiments. This paper gives an overview of the currently available practical approaches to protein structure prediction capable of generating accurate fold assignment. Recent advances in assessment of the prediction quality are also discussed.
Affiliation(s)
- Krzysztof Ginalski
- BioInfoBank Institute, ul. Limanowskiego 24A, 60-744 Poznań, Poland
- Interdisciplinary Centre for Mathematical and Computational Modelling, Warsaw University, Pawińskiego 5a, 02-106 Warsaw, Poland
- Department of Biochemistry, University of Texas, Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9038, USA
- Nick V. Grishin
- Department of Biochemistry, University of Texas, Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9038, USA
- Howard Hughes Medical Institute, University of Texas, Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9050, USA
- Adam Godzik
- The Burnham Institute, 10901 N. Torrey Pines Road, La Jolla, CA 92037, USA
- Leszek Rychlewski
- BioInfoBank Institute, ul. Limanowskiego 24A, 60-744 Poznań, Poland
- To whom correspondence should be addressed. Tel: +48 604 628805; Fax: +48 61 8643350
|
37
|
Abstract
Sequence alignment profiles have been shown to be very powerful in creating accurate sequence alignments. Profiles are often used to search a sequence database with a local alignment algorithm. More accurate and longer alignments have been obtained with profile-to-profile comparison. There are several steps that must be performed in creating profile-profile alignments, and each involves choices in parameters and algorithms. These steps include (1) what sequences to include in a multiple alignment used to build each profile, (2) how to weight similar sequences in the multiple alignment and how to determine amino acid frequencies from the weighted alignment, (3) how to score a column from one profile aligned to a column of the other profile, (4) how to score gaps in the profile-profile alignment, and (5) how to include structural information. Large-scale benchmarks consisting of pairs of homologous proteins with structurally determined sequence alignments are necessary for evaluating the efficacy of each scoring scheme. With such a benchmark, we have investigated the properties of profile-profile alignments and found that (1) with optimized gap penalties, most column-column scoring functions behave similarly to one another in alignment accuracy; (2) some functions, however, have much higher search sensitivity and specificity; (3) position-specific weighting schemes in determining amino acid counts in columns of multiple sequence alignments are better than sequence-specific schemes; (4) removing positions in the profile with gaps in the query sequence results in better alignments; and (5) adding predicted and known secondary structure information improves alignments.
Affiliation(s)
- Guoli Wang
- Institute for Cancer Research, Fox Chase Cancer Center, 333 Cottman Avenue, Philadelphia, PA 19111, USA
|
38
|
Analysis of superfamily specific profile-profile recognition accuracy. BMC Bioinformatics 2004; 5:200. [PMID: 15603591 PMCID: PMC543460 DOI: 10.1186/1471-2105-5-200] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Received: 08/09/2004] [Accepted: 12/16/2004] [Indexed: 11/10/2022] Open
Abstract
Background: Annotation of sequences that share little similarity to sequences of known function remains a major obstacle in genome annotation. Some of the best methods of detecting remote relationships between protein sequences are based on matching sequence profiles. We analyse the superfamily specific performance of sequence profile-profile matching. Our benchmark consists of a set of 16 protein superfamilies that are highly diverse at the sequence level. We relate the performance to the number of sequences in the profiles, the profile diversity and the extent of structural conservation in the superfamily. Results: The performance varies greatly between superfamilies with the truncated receiver operating characteristic, ROC10, varying from 0.95 down to 0.01. These large differences persist even when the profiles are trimmed to approximately the same level of diversity. Conclusions: Although the number of sequences in the profile (profile width) and degree of sequence variation within positions in the profile (profile diversity) contribute to accurate detection, there are other superfamily specific factors.
|
39
|
Abstract
MOTIVATION: Protein homology detection and sequence alignment are at the basis of protein structure prediction, function prediction and evolution. RESULTS: We have generalized the alignment of protein sequences with a profile hidden Markov model (HMM) to the case of pairwise alignment of profile HMMs. We present a method for detecting distant homologous relationships between proteins based on this approach. The method (HHsearch) is benchmarked together with BLAST, PSI-BLAST, HMMER and the profile-profile comparison tools PROF_SIM and COMPASS, in an all-against-all comparison of a database of 3691 protein domains from SCOP 1.63 with pairwise sequence identities below 20%. Sensitivity: When the predicted secondary structure is included in the HMMs, HHsearch is able to detect between 2.7 and 4.2 times more homologs than PSI-BLAST or HMMER and between 1.44 and 1.9 times more than COMPASS or PROF_SIM for a rate of false positives of 10%. Approximately half of the improvement over the profile-profile comparison methods is attributable to the use of profile HMMs in place of simple profiles. Alignment quality: Higher sensitivity is mirrored by an increased alignment quality. HHsearch produced 1.2, 1.7 and 3.3 times more good alignments ('balanced' score >0.3) than the next best method (COMPASS), and 1.6, 2.9 and 9.4 times more than PSI-BLAST, at the family, superfamily and fold level, respectively. Speed: HHsearch scans a query of 200 residues against 3691 domains in 33 s on an AMD64 2 GHz PC. This is 10 times faster than PROF_SIM and 17 times faster than COMPASS.
Affiliation(s)
- Johannes Söding
- Department of Protein Evolution, Max-Planck-Institute for Developmental Biology, Spemannstrasse 35, D-72076 Tübingen, Germany.
|
40
|
Gront D, Kolinski A. A new approach to prediction of short-range conformational propensities in proteins. Bioinformatics 2004; 21:981-7. [PMID: 15509604 DOI: 10.1093/bioinformatics/bti080] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.2] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Knowledge-based potentials are valuable tools for protein structure modeling and evaluation of the quality of the structure prediction obtained by a variety of methods. Potentials of such type could be significantly enhanced by a proper exploitation of the evolutionary information encoded in related protein sequences. The new potentials could be valuable components of threading algorithms, ab-initio protein structure prediction, comparative modeling and structure modeling based on fragmentary experimental data. RESULTS: A new potential for scoring local protein geometry is designed and evaluated. The approach is based on the similarity of short protein fragments measured by an alignment of their sequence profiles. Sequence specificity of the resulting energy function has been compared with the specificity of simpler potentials using gapless threading and the ability to predict specific geometry of protein fragments. Significant improvement in threading sensitivity and in the ability to generate sequence-specific protein-like conformations has been achieved.
Affiliation(s)
- Dominik Gront
- Faculty of Chemistry, Warsaw University, Pasteura 1, 02-093 Warsaw, Poland.
|
41
|
Abstract
The accuracy of an alignment between two protein sequences can be improved by including other detectably related sequences in the comparison. We optimize and benchmark such an approach that relies on aligning two multiple sequence alignments, each one including one of the two protein sequences. Thirteen different protocols for creating and comparing profiles corresponding to the multiple sequence alignments are implemented in the SALIGN command of MODELLER. A test set of 200 pairwise, structure-based alignments with sequence identities below 40% is used to benchmark the 13 protocols as well as a number of previously described sequence alignment methods, including heuristic pairwise sequence alignment by BLAST, pairwise sequence alignment by global dynamic programming with an affine gap penalty function by the ALIGN command of MODELLER, sequence-profile alignment by PSI-BLAST, Hidden Markov Model methods implemented in SAM and LOBSTER, pairwise sequence alignment relying on predicted local structure by SEA, and multiple sequence alignment by CLUSTALW and COMPASS. The alignment accuracies of the best new protocols were significantly better than those of the other tested methods. For example, the fraction of the correctly aligned residues relative to the structure-based alignment by the best protocol is 56%, which can be compared with the accuracies of 26%, 42%, 43%, 48%, 50%, 49%, 43%, and 43% for the other methods, respectively. The new method is currently applied to large-scale comparative protein structure modeling of all known sequences.
Affiliation(s)
- Marc A Marti-Renom
- Mission Bay Genentech Hall, University of California, San Francisco, San Francisco, CA 94143, USA.
|
42
|
Ginalski K, von Grotthuss M, Grishin NV, Rychlewski L. Detecting distant homology with Meta-BASIC. Nucleic Acids Res 2004; 32:W576-81. [PMID: 15215454 PMCID: PMC441508 DOI: 10.1093/nar/gkh370] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.1] [Indexed: 11/13/2022] Open
Abstract
Meta-BASIC (http://basic.bioinfo.pl) is a novel sensitive approach for recognition of distant similarity between proteins based on consensus alignments of meta profiles. Specifically, Meta-BASIC compares sequence profiles combined with predicted secondary structure by utilizing several scoring systems and alignment algorithms. In our benchmarking tests, Meta-BASIC outperforms many individual servers, including fold recognition servers, and it can compete with meta predictors that base their strength on the structural comparison of models. In addition, Meta-BASIC, which enables detection of very distant relationships even if the tertiary structure for the reference protein is not known, has a high-throughput capability. This new method is applied to 860 PfamA protein families with unknown function (DUF) and provides many novel structure-functional assignments available on-line at http://basic.bioinfo.pl/duf.pl. Detailed discussion is provided for two of the most interesting assignments. DUF271 and DUF431 are predicted to be a nucleotide-diphospho-sugar transferase and an alpha/beta-knot SAM-dependent RNA methyltransferase, respectively.
Affiliation(s)
- Krzysztof Ginalski
- Department of Biochemistry, University of Texas, Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, Texas 75390-9038, USA.
|
43
|
Goonesekere NCW, Lee B. Frequency of gaps observed in a structurally aligned protein pair database suggests a simple gap penalty function. Nucleic Acids Res 2004; 32:2838-43. [PMID: 15155852 PMCID: PMC419611 DOI: 10.1093/nar/gkh610] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/14/2022] Open
Abstract
Gap penalty is an important component of the scoring scheme that is needed when searching for homologous proteins and for accurate alignment of protein sequences. Most homology search and sequence alignment algorithms employ a heuristic 'affine gap penalty' scheme q + r × n, in which q is the penalty for opening a gap, r the penalty for extending it and n the gap length. In order to devise a more rational scoring scheme, we examined the pattern of gaps that occur in a database of structurally aligned protein domain pairs. We find that the logarithm of the frequency of gaps varies linearly with the length of the gap, but with a break at a gap of length 3, and is well approximated by two linear regression lines with R² values of 1.0 and 0.99. The bilinear behavior is retained when gaps are categorized by secondary structures of the two residues flanking the gap. Similar results were obtained when another, totally independent, structurally aligned protein pair database was used. These results suggest a modification of the affine gap penalty function.
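The affine scheme described in the abstract can be written down directly. The sketch below is illustrative only: the function name `affine_gap_penalty` is hypothetical, the zero penalty for a gap of length 0 is an assumption, and the modified bilinear function the paper proposes is not reproduced because the abstract does not state its parameters.

```python
def affine_gap_penalty(n, q, r):
    """Affine gap penalty q + r * n for a gap of length n.

    q: penalty for opening a gap
    r: penalty for extending it by one position
    Assumption: a gap of length 0 (no gap) costs nothing.
    """
    if n < 0:
        raise ValueError("gap length must be non-negative")
    if n == 0:
        return 0
    return q + r * n
```

For example, with q = 10 and r = 1 a gap of length 3 costs 13, so one long gap is cheaper than three separate single-residue gaps (33 in total).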
Affiliation(s)
- Nalin C W Goonesekere
- Laboratory of Molecular Biology, Center for Cancer Research, National Cancer Institute, National Institutes of Health, Building 37, Room 5120, 37 Convent Drive MSC 4264, Bethesda, MD 20892-4264, USA
|
44
|
Capriotti E, Fariselli P, Rossi I, Casadio R. A Shannon entropy-based filter detects high-quality profile-profile alignments in searches for remote homologues. Proteins 2003; 54:351-60. [PMID: 14696197 DOI: 10.1002/prot.10564] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.3] [Indexed: 11/05/2022]
Abstract
Detection of homologous proteins with low-sequence identity to a given target (remote homologues) is routinely performed with alignment algorithms that take advantage of sequence profile. In this article, we investigate the efficacy of different alignment procedures for the task at hand on a set of 185 protein pairs with similar structures but low-sequence similarity. Criteria based on the SCOP label detection and MaxSub scores are adopted to score the results. We investigate the efficacy of alignments based on sequence-sequence, sequence-profile, and profile-profile information. We confirm that with profile-profile alignments the results are better than with other procedures. In addition, we report, and this is novel, that the selection of the results of the profile-profile alignments can be improved by using Shannon entropy, indicating that this parameter is important to recognize good profile-profile alignments among a plethora of meaningless pairs. By this, we enhance the global search accuracy without losing sensitivity and filter out most of the erroneous alignments. We also show that when the entropy filtering is adopted, the quality of the resulting alignments is comparable to that computed for the target and template structures with CE, a structural alignment program.
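As a reminder of the quantity involved, the Shannon entropy of a single profile column can be computed as below. This is a generic sketch: the function name `column_entropy` is hypothetical, and how the paper aggregates or thresholds entropy to filter alignments is not stated in the abstract and is not modelled here.

```python
import math

def column_entropy(frequencies):
    """Shannon entropy (in bits) of one profile column.

    frequencies: non-negative residue counts or frequencies for the column;
    they are normalised here, so they need not sum to 1.
    """
    total = sum(frequencies)
    if total <= 0:
        raise ValueError("column must contain at least one observation")
    entropy = 0.0
    for f in frequencies:
        if f > 0:  # terms with zero frequency contribute nothing
            p = f / total
            entropy -= p * math.log2(p)
    return entropy
```

A fully conserved column has entropy 0, while a column with four equally frequent residues has entropy 2 bits.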
|
45
|
Ginalski K, Pas J, Wyrwicz LS, von Grotthuss M, Bujnicki JM, Rychlewski L. ORFeus: Detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res 2003; 31:3804-7. [PMID: 12824423 PMCID: PMC168911 DOI: 10.1093/nar/gkg504] [Citation(s) in RCA: 107] [Impact Index Per Article: 5.1] [Indexed: 11/15/2022] Open
Abstract
ORFeus is a fully automated, sensitive protein sequence similarity search server available to the academic community via the Structure Prediction Meta Server (http://BioInfo.PL/Meta/). The goal of the development of ORFeus was to increase the sensitivity of the detection of distantly related protein families. Predicted secondary structure information was added to the information about sequence conservation and variability, a technique known from hybrid threading approaches. The accuracy of the meta profiles created this way is compared with profiles containing only sequence information and with the standard approach of aligning a single sequence with a profile. Additionally, the alignment of meta profiles is more sensitive in detecting remote homology between protein families than aligning two sequence-only profiles or aligning a profile with a sequence. The specificity of the alignment score is improved in the lower specificity range compared with the robust sequence-only profiles.
Affiliation(s)
- Krzysztof Ginalski
- Bioinformatics Laboratory, BioInfoBank Institute, ul. Limanowskiego 24A, 60-744 Poznan, Poland
|