1
Ismer J, Rose AS, Tiemann JKS, Goede A, Preissner R, Hildebrand PW. SL2: an interactive webtool for modeling of missing segments in proteins. Nucleic Acids Res 2016; 44:W390-4. [PMID: 27105847 PMCID: PMC4987885 DOI: 10.1093/nar/gkw297] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 01/29/2016] [Accepted: 04/11/2016] [Indexed: 11/22/2022] Open
Abstract
SuperLooper2 (SL2) (http://proteinformatics.charite.de/sl2) is the updated version of our previous web-server SuperLooper, a fragment based tool for the prediction and interactive placement of loop structures into globular and helical membrane proteins. In comparison to our previous version, SL2 benefits from both a considerably enlarged database of fragments derived from high-resolution 3D protein structures of globular and helical membrane proteins, and the integration of a new protein viewer. The database, now with double the content, significantly improved the coverage of fragment conformations and prediction quality. The employment of the NGL viewer for visualization of the protein under investigation and interactive selection of appropriate loops makes SL2 independent of third-party plug-ins and additional installations.
Affiliation(s)
- Jochen Ismer
- Institute of Medical Physics and Biophysics, University Medicine, Berlin, 10117 Berlin, Germany
- Alexander S Rose
- Institute of Medical Physics and Biophysics, University Medicine, Berlin, 10117 Berlin, Germany
- Johanna K S Tiemann
- Institute of Medical Physics and Biophysics, University Medicine, Berlin, 10117 Berlin, Germany
- Andrean Goede
- Institute of Physiology & Experimental Clinical Research Center, University Medicine, Berlin, 13125, Germany
- Robert Preissner
- Institute of Physiology & Experimental Clinical Research Center, University Medicine, Berlin, 13125, Germany
- Peter W Hildebrand
- Institute of Medical Physics and Biophysics, University Medicine, Berlin, 10117 Berlin, Germany
2
Messih MA, Lepore R, Tramontano A. LoopIng: a template-based tool for predicting the structure of protein loops. Bioinformatics 2015; 31:3767-72. [PMID: 26249814 PMCID: PMC4653384 DOI: 10.1093/bioinformatics/btv438] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 04/07/2015] [Accepted: 07/21/2015] [Indexed: 12/31/2022] Open
Abstract
Motivation: Predicting the structure of protein loops is very challenging, mainly because they are not necessarily subject to strong evolutionary pressure. This implies that, unlike the rest of the protein, standard homology modeling techniques are not very effective in modeling their structure. However, loops are often involved in protein function, hence inferring their structure is important for predicting protein structure as well as function. Results: We describe a method, LoopIng, based on the Random Forest automated learning technique, which, given a target loop, selects a structural template for it from a database of loop candidates. Compared to the most recently available methods, LoopIng is able to achieve similar accuracy for short loops (4–10 residues) and significant enhancements for long loops (11–20 residues). The quality of the predictions is robust to errors that unavoidably affect the stem regions when these are modeled. The method returns a confidence score for the predicted template loops and has the advantage of being very fast (on average: 1 min/loop). Availability and implementation: www.biocomputing.it/looping Contact: anna.tramontano@uniroma1.it Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Rosalba Lepore
- Department of Physics, Sapienza University, 00185 Rome, Italy
- Anna Tramontano
- Department of Physics, Sapienza University, 00185 Rome, Italy and Istituto Pasteur-Fondazione Cenci Bolognetti, Viale Regina Elena 291, 00161 Rome, Italy
3
Rocha L. Toward a better understanding of structural divergences in proteins using different secondary structure assignment methods. J Mol Struct 2014. [DOI: 10.1016/j.molstruc.2014.01.060] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 10/25/2022]
4
Abstract
Structural proteomics aims to understand the structural basis of protein interactions and functions. A prerequisite for this is the availability of 3D structures of the proteins that mediate the biochemical interactions. The explosion in the number of available gene sequences set the stage for the next step in genome-scale projects: obtaining 3D structures for each protein. To achieve this ambitious goal, the slow and costly structure determination experiments are supplemented with theoretical approaches. The current state and recent advances in structure modeling approaches are reviewed here, with special emphasis on comparative protein structure modeling techniques.
Affiliation(s)
- András Fiser
- Department of Biochemistry, Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, 1300 Morris Park Ave., Bronx, NY 10461, USA.
5
Kelm S, Vangone A, Choi Y, Ebejer JP, Shi J, Deane CM. Fragment-based modeling of membrane protein loops: successes, failures, and prospects for the future. Proteins 2013; 82:175-86. [PMID: 23589399 DOI: 10.1002/prot.24299] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 11/08/2012] [Revised: 02/22/2013] [Accepted: 03/26/2013] [Indexed: 11/12/2022]
Abstract
Membrane proteins (MPs) have become a major focus in structure prediction, due to their medical importance. There is, however, a lack of fast and reliable methods that specialize in the modeling of MP loops. Often methods designed for soluble proteins (SPs) are applied directly to MPs. In this article, we investigate the validity of such an approach in the realm of fragment-based methods. We also examined the differences in membrane and soluble protein loops that might affect accuracy. We test our ability to predict soluble and MP loops with the previously published method FREAD. We show that it is possible to predict accurately the structure of MP loops using a database of MP fragments (0.5-1 Å median root-mean-square deviation). The presence of homologous proteins in the database helps prediction accuracy. However, even when homologues are removed better results are still achieved using fragments of MPs (0.8-1.6 Å) rather than SPs (1-4 Å) to model MP loops. We find that many fragments of SPs have shapes similar to their MP counterparts but have very different sequences; however, they do not appear to differ in their substitution patterns. Our findings may allow further improvements to fragment-based loop modeling algorithms for MPs. The current version of our proof-of-concept loop modeling protocol produces high-accuracy loop models for MPs and is available as a web server at http://medeller.info/fread.
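The accuracy figures quoted in this abstract are median root-mean-square deviations (RMSD) over a set of loops. As a minimal sketch of how such a summary can be computed, the snippet below assumes that predicted and native loop coordinates are already superposed in the same frame and matched atom by atom. The function name `loop_rmsd` and the example values are hypothetical and are not taken from FREAD.

```python
import math
import statistics

def loop_rmsd(predicted, native):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes both loops are already in the same reference frame;
    no superposition is performed here.
    """
    if len(predicted) != len(native):
        raise ValueError("coordinate lists must have the same length")
    squared = [math.dist(p, n) ** 2 for p, n in zip(predicted, native)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical example: every atom is displaced by 1 Å along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
single = loop_rmsd(predicted, native)

# Median over a hypothetical benchmark of per-loop RMSD values (in Å).
median_rmsd = statistics.median([0.5, 1.0, 0.8])
```

The median is less sensitive than the mean to a few badly predicted loops, which may be why it is the summary reported here.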
Affiliation(s)
- Sebastian Kelm
- Department of Statistics, University of Oxford, Oxford, United Kingdom
6
Abstract
The prediction of loop structures is considered one of the main challenges in the protein folding problem. Regardless of the dependence of the overall algorithm on the protein data bank, the flexibility of loop regions dictates the need for special attention to their structures. In this article, we present algorithms for loop structure prediction with fixed stem and flexible stem geometry. In the flexible stem geometry problem, only the secondary structure of three stem residues on either side of the loop is known. In the fixed stem geometry problem, the structure of the three stem residues on either side of the loop is also known. Initial loop structures are generated using a probability database for the flexible stem geometry problem, and using torsion angle dynamics for the fixed stem geometry problem. Three rotamer optimization algorithms are introduced to alleviate steric clashes between the generated backbone structures and the side chain rotamers. The structures are optimized by energy minimization using an all-atom force field. The optimized structures are clustered using a traveling salesman problem-based clustering algorithm. The structures in the densest clusters are then utilized to refine dihedral angle bounds on all amino acids in the loop. The entire procedure is carried out for a number of iterations, leading to improved structure prediction and refined dihedral angle bounds. The algorithms presented in this article have been tested on 3190 loops from the PDBSelect25 data set and on targets from the recently concluded CASP9 community-wide experiment.
Affiliation(s)
- A. Subramani
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- C. A. Floudas
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
7
Abstract
MOTIVATION: Template-based modelling can approximate the unknown structure of a target protein using a homologous template structure. The core of the resulting prediction then comprises the structural regions conserved between template and target. Target prediction could be improved by rigidly repositioning the structurally conserved fragment regions of such a single template. The purpose of this article is to quantify the extent to which such improvements are possible and to relate this extent to properties of the target, the template and their alignment. RESULTS: The improvement in accuracy achievable when rigid fragments from a single template are optimally positioned was calculated using structure pairs from the HOMSTRAD database, as well as CASP7 and CASP8 target/best template pairs. Over the union of the structurally conserved regions, improvements of 0.7 Å in root mean squared deviation (RMSD) and 6% in GDT_HA were commonly observed. A generalized linear model revealed that the extent to which a template can be improved can be predicted using four variables. Templates with the greatest scope for improvement tend to have relatively more fragments, shorter fragments, a higher percentage of helical secondary structure and lower sequence identity. Optimal positioning of the template fragments offers the potential for improving loop modelling. These results demonstrate that substantial improvement could be made on many templates if the conserved fragments were to be optimally positioned. They also provide a basis for identifying templates for which modification of fragment positions may yield such improvements.
Affiliation(s)
- Braddon K Lance
- Department of Statistics, Macquarie University, North Ryde, Australia.
8
Choi Y, Deane CM. FREAD revisited: Accurate loop structure prediction using a database search algorithm. Proteins 2010; 78:1431-40. [PMID: 20034110 DOI: 10.1002/prot.22658] [Citation(s) in RCA: 121] [Impact Index Per Article: 8.6] [Indexed: 12/25/2022]
Abstract
Loops are the most variable regions of protein structure and are, in general, the least accurately predicted. Their prediction has been approached in two ways, ab initio and database search. In recent years, it has been thought that ab initio methods are more powerful. In light of the continued rapid expansion in the number of known protein structures, we have re-evaluated FREAD, a database search method, and demonstrate that the power of database search methods may have been underestimated. We found that sequence similarity, as quantified by environment-specific substitution scores, can be used to significantly improve prediction. In fact, FREAD performs appreciably better for an identifiable subset of loops (two thirds of shorter loops and half of the longer loops tested) than the ab initio methods of MODELLER, PLOP, and RAPPER. Within this subset, FREAD's predictive ability is length independent, in general producing results within 2 Å RMSD, compared to an average of over 10 Å for loop length 20 for any of the other tested methods. We also benchmarked the prediction protocols on a set of 212 loops from the model structures in CASP 7 and 8. An extended version of FREAD is able to make predictions for 127 of these, and it gives the best prediction of the methods tested in 61 of these cases. In examining FREAD's ability to predict in the model environment, we found that whole structure quality did not affect the quality of loop predictions.
Affiliation(s)
- Yoonjoo Choi
- Department of Statistics, Oxford University, United Kingdom.
9
Tyagi M, Bornot A, Offmann B, de Brevern AG. Analysis of loop boundaries using different local structure assignment methods. Protein Sci 2009; 18:1869-81. [PMID: 19606500 DOI: 10.1002/pro.198] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/09/2022]
Abstract
Loops connect regular secondary structures. In many instances, they are known to play important biological roles. Analysis and prediction of loop conformations depend directly on the definition of repetitive structures. Nonetheless, secondary structure assignment methods (SSAMs) often lead to divergent assignments. In this study, we analyzed, from both the structure and the sequence points of view, how the divergence between different SSAMs affects the boundary definitions of loops connecting regular secondary structures. The analysis underlines that no clear consensus between the different SSAMs can easily be found. Because the SSAMs greatly influence the loop boundary definitions, important variations are indeed observed, that is, capping positions are shifted between different SSAMs. On the other hand, our results show that the sequence information in these capping regions is more stable than expected, and classical and equivalent sequence patterns were found for most of the SSAMs. This is, to our knowledge, the most exhaustive survey in this field, as (i) various databanks have been used, leading to similar results without implication of protein redundancy, and (ii) it is the first time various SSAMs have been used. This work hence gives new insights into the difficult question of the assignment of repetitive structures and addresses the issue of loop boundary definition. Although SSAMs give very different local structure assignments, capping sequence patterns remain stable.
Affiliation(s)
- Manoj Tyagi
- Laboratoire de Biochimie et Génétique Moléculaire, Université de La Réunion, BP 7151, 15 avenue René Cassin, 97715 Saint Denis Messag Cedex 09, La Réunion, France
10
Liu P, Zhu F, Rassokhin DN, Agrafiotis DK. A self-organizing algorithm for modeling protein loops. PLoS Comput Biol 2009; 5:e1000478. [PMID: 19696883 PMCID: PMC2719875 DOI: 10.1371/journal.pcbi.1000478] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 03/18/2009] [Accepted: 07/20/2009] [Indexed: 11/19/2022] Open
Abstract
Protein loops, the flexible short segments connecting two stable secondary
structural units in proteins, play a critical role in protein structure and
function. Constructing chemically sensible conformations of protein loops that
seamlessly bridge the gap between the anchor points without introducing any
steric collisions remains an open challenge. A variety of algorithms have been
developed to tackle the loop closure problem, ranging from inverse kinematics to
knowledge-based approaches that utilize pre-existing fragments extracted from
known protein structures. However, many of these approaches focus on the
generation of conformations that mainly satisfy the fixed end point condition,
leaving the steric constraints to be resolved in subsequent post-processing
steps. In the present work, we describe a simple solution that simultaneously
satisfies not only the end point and steric conditions, but also chirality and
planarity constraints. Starting from random initial atomic coordinates, each
individual conformation is generated independently by using a simple alternating
scheme of pairwise distance adjustments of randomly chosen atoms, followed by
fast geometric matching of the conformationally rigid components of the
constituent amino acids. The method is conceptually simple, numerically stable
and computationally efficient. Very importantly, additional constraints, such as
those derived from NMR experiments, hydrogen bonds or salt bridges, can be
incorporated into the algorithm in a straightforward and inexpensive way, making
the method ideal for solving more complex multi-loop problems. The remarkable
performance and robustness of the algorithm are demonstrated on a set of protein
loops of length 4, 8, and 12 that have been used in previous studies.

Protein loops play an important role in protein function, such as ligand binding,
recognition, and allosteric regulation. However, due to their flexibility, it is
notoriously difficult to determine their 3D structures using traditional
experimental techniques. As a result, one can often find protein structures with
missing loops in the Protein Data Bank. Their sequence variability also presents
a particular challenge for homology modeling methods, which can only yield good
overall structures given sufficient sequence identity and good experimental
reference structures. Despite extensive research, the construction of protein
loop 3D structures remains an open problem, since a sensible conformation should
seamlessly bridge the anchor points without introducing steric clashes within
the loop itself or between the loop and its surrounding environment. Here, we
present a conceptually simple, mathematically straightforward, numerically
robust and computationally efficient approach for building protein loop
conformations that simultaneously satisfy end-point, steric, planar and chiral
constraints. More importantly, additional constraints derived from experimental
sources can be incorporated in a straightforward manner, allowing the processing
of more complex structures involving multiple interlocking loops.
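The pairwise distance adjustment step described above can be illustrated with a small toy example. This is only a sketch under stated assumptions: it repeatedly picks a random distance constraint and moves the two atoms along their connecting line toward the target distance, and it omits the geometric matching of rigid components as well as the steric, chirality and planarity handling. The function name `adjust_pairwise`, the `rate` parameter and the triangle example are hypothetical and are not the authors' implementation.

```python
import math
import random

def adjust_pairwise(coords, constraints, steps=5000, rate=0.5, seed=0):
    """Iteratively nudge pairs of points toward target distances.

    coords:      list of (x, y, z) starting positions
    constraints: list of (i, j, target_distance) tuples
    """
    rng = random.Random(seed)
    pts = [list(p) for p in coords]
    for _ in range(steps):
        i, j, target = rng.choice(constraints)
        delta = [a - b for a, b in zip(pts[i], pts[j])]
        dist = math.sqrt(sum(d * d for d in delta)) or 1e-9
        # Each point takes half of the (damped) correction along the i-j axis.
        scale = rate * 0.5 * (target - dist) / dist
        for k in range(3):
            pts[i][k] += scale * delta[k]
            pts[j][k] -= scale * delta[k]
    return pts

# Hypothetical example: three points pulled into an equilateral triangle
# with 3.8 Å sides (roughly a consecutive C-alpha spacing).
pairs = [(0, 1), (1, 2), (0, 2)]
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
result = adjust_pairwise(start, [(i, j, 3.8) for i, j in pairs])
errors = [abs(math.dist(result[i], result[j]) - 3.8) for i, j in pairs]
```

Because every update touches only two points, further restraints (for example from NMR) could in principle be added as extra entries in `constraints`, which mirrors the extensibility the abstract emphasizes.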
Affiliation(s)
- Pu Liu
- Johnson & Johnson Pharmaceutical Research and Development, Exton, Pennsylvania, United States of America
- * E-mail: (PL); (DKA)
- Fangqiang Zhu
- Johnson & Johnson Pharmaceutical Research and Development, Exton, Pennsylvania, United States of America
- Dmitrii N. Rassokhin
- Johnson & Johnson Pharmaceutical Research and Development, Exton, Pennsylvania, United States of America
- Dimitris K. Agrafiotis
- Johnson & Johnson Pharmaceutical Research and Development, Exton, Pennsylvania, United States of America
- * E-mail: (PL); (DKA)
11
Abstract
We describe a fast and accurate protocol, LoopBuilder, for the prediction of loop conformations in proteins. The procedure includes extensive sampling of backbone conformations, side chain addition, the use of a statistical potential to select a subset of these conformations, and, finally, an energy minimization and ranking with an all-atom force field. We find that the Direct Tweak algorithm used in the previously developed LOOPY program is successful in generating an ensemble of conformations that on average are closer to the native conformation than those generated by other methods. An important feature of Direct Tweak is that it checks for interactions between the loop and the rest of the protein during the loop closure process. DFIRE is found to be a particularly effective statistical potential that can bias conformation space toward conformations that are close to the native structure. Its application as a filter prior to a full molecular mechanics energy minimization both improves prediction accuracy and offers significant savings in computer time. Final scoring is based on the OPLS/SBG-NP force field implemented in the PLOP program. The approach is also shown to be quite successful in predicting loop conformations for cases where the native side chain conformations are assumed to be unknown, suggesting that it will prove effective in real homology modeling applications. Proteins 2008. © 2007 Wiley-Liss, Inc.
Affiliation(s)
- Cinque S Soto
- Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
12
Peng HP, Yang AS. Modeling protein loops with knowledge-based prediction of sequence-structure alignment. Bioinformatics 2007; 23:2836-42. [PMID: 17827204 DOI: 10.1093/bioinformatics/btm456] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: As protein structure databases expand, protein loop modeling remains an important and yet challenging problem. Knowledge-based protein loop prediction methods have met with two challenges in methodology development: (1) loop boundaries in protein structures are frequently problematic in constructing length-dependent loop databases for protein loop predictions; (2) knowledge-based modeling of loops of unknown structure requires both aligning a query loop sequence to loop templates and ranking the loop sequence-template matches. RESULTS: We developed a knowledge-based loop prediction method that circumvents the need to construct hierarchically clustered length-dependent loop libraries. The method first predicts local structural fragments of a query loop sequence and then structurally aligns the predicted structural fragments to a set of non-redundant loop structural templates regardless of the loop length. The sequence-template alignments are then quantitatively evaluated with an artificial neural network model trained on a set of predictions with known outcomes. Prediction accuracy benchmarks indicated that the novel procedure provided an alternative approach overcoming the challenges of knowledge-based loop prediction. AVAILABILITY: http://cmb.genomics.sinica.edu.tw
Affiliation(s)
- Hung-Pin Peng
- Genomics Research Center, Academia Sinica. 128 Academia Road, Section 2, Nankang District, Taipei 115, Taiwan, ROC
13
Mehler EL, Hassan SA, Kortagere S, Weinstein H. Ab initio computational modeling of loops in G-protein-coupled receptors: lessons from the crystal structure of rhodopsin. Proteins 2006; 64:673-90. [PMID: 16729264 DOI: 10.1002/prot.21022] [Citation(s) in RCA: 35] [Impact Index Per Article: 1.9] [Indexed: 12/25/2022]
Abstract
With the help of the crystal structure of rhodopsin an ab initio method has been developed to calculate the three-dimensional structure of the loops that connect the transmembrane helices (TMHs). The goal of this procedure is to calculate the loop structures in other G-protein coupled receptors (GPCRs) for which only model coordinates of the TMHs are available. To mimic this situation a construct of rhodopsin was used that only includes the experimental coordinates of the TMHs while the rest of the structure, including the terminal domains, has been removed. To calculate the structure of the loops a method was designed based on Monte Carlo (MC) simulations which use a temperature annealing protocol, and a scaled collective variables (SCV) technique with proper structural constraints. Because only part of the protein is used in the calculations the usual approach of modeling loops, which consists of finding a single, lowest energy conformation of the system, is abandoned because such a single structure may not be a representative member of the native ensemble. Instead, the method was designed to generate structural ensembles from which the single lowest free energy ensemble is identified as representative of the native folding of the loop. To find the native ensemble a successive series of SCV-MC simulations are carried out to allow the loops to undergo structural changes in a controlled manner. To increase the chances of finding the native funnel for the loop, some of the SCV-MC simulations are carried out at elevated temperatures. The native ensemble can be identified by an MC search starting from any conformation already in the native funnel. The hypothesis is that native structures are trapped in the conformational space because of the high-energy barriers that surround the native funnel. 
The existence of such ensembles is demonstrated by generating multiple copies of the loops from their crystal structures in rhodopsin and carrying out an extended SCV-MC search. For the extracellular loops e1 and e3, and the intracellular loop i1 that were used in this work, the procedure resulted in dense clusters of structures with Calpha-RMSD approximately 0.5 angstroms. To test the predictive power of the method the crystal structure of each loop was replaced by its extended conformations. For e1 and i1 the procedure identifies native clusters with Calpha-RMSD approximately 0.5 angstroms and good structural overlap of the side chains; for e3, two clusters were found with Calpha-RMSD approximately 1.1 angstroms each, but with poor overlap of the side chains. Further searching led to a single cluster with lower Calpha-RMSD but higher energy than the two previous clusters. This discrepancy was found to be due to the missing elements in the constructs available from experiment for use in the calculations. Because this problem will likely appear whenever parts of the structural information are missing, possible solutions are discussed.
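The idea of Monte Carlo sampling with temperature annealing, followed by analysis of an ensemble rather than a single lowest-energy structure, can be sketched on a toy problem. The snippet below is an illustration only: it uses a hypothetical one-dimensional energy function in place of a loop force field and does not implement the scaled collective variables technique. All names (`energy`, `accept`, `anneal`) and parameter values are assumptions made for this sketch.

```python
import math
import random

def energy(x):
    # Hypothetical toy landscape: a deeper well near x = +2
    # and a shallower well near x = -2, separated by a barrier.
    return (x * x - 4.0) ** 2 / 4.0 - x

def accept(delta_e, temperature, u):
    # Metropolis criterion: downhill moves are always accepted,
    # uphill moves with probability exp(-delta_e / temperature).
    return delta_e <= 0.0 or u < math.exp(-delta_e / temperature)

def anneal(start, temperatures, steps_per_temp=2000, max_step=0.5, seed=0):
    """Run Metropolis sampling along a cooling schedule.

    Returns the samples visited at the last temperature, which serve
    as the ensemble to be clustered or analysed afterwards.
    """
    rng = random.Random(seed)
    x, e = start, energy(start)
    ensemble = []
    for t in temperatures:
        ensemble = []  # keep only samples from the current temperature
        for _ in range(steps_per_temp):
            candidate = x + rng.uniform(-max_step, max_step)
            e_candidate = energy(candidate)
            if accept(e_candidate - e, t, rng.random()):
                x, e = candidate, e_candidate
            ensemble.append(x)
    return ensemble

# Start in the shallower well and cool from a high temperature.
samples = anneal(-2.0, [4.0, 2.0, 1.0, 0.5, 0.1], steps_per_temp=500)
```

The elevated starting temperature plays the role described in the abstract: it gives the walker a chance to cross barriers before cooling confines it to one funnel.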
Affiliation(s)
- Ernest L Mehler
- Department of Physiology and Biophysics, Weill Medical College of Cornell University, New York, New York 10021, USA.
14
Kortagere S, Roy A, Mehler EL. Ab initio computational modeling of long loops in G-protein coupled receptors. J Comput Aided Mol Des 2006; 20:427-36. [PMID: 16972169 DOI: 10.1007/s10822-006-9056-0] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 06/08/2006] [Accepted: 07/11/2006] [Indexed: 12/27/2022]
Abstract
A newly developed approach for predicting the structure of segments that connect known elements of secondary structure in proteins has been applied to some of the longer loops in the G-protein coupled receptors (GPCRs) rhodopsin and the dopamine receptor D2R. The algorithm uses Monte Carlo (MC) simulation in a temperature annealing protocol combined with a scaled collective variables (SCV) technique to search conformation space for loop structures that could belong to the native ensemble. Except for rhodopsin, structural information is only available for the transmembrane helices (TMHs), and therefore the usual approach of finding a single conformation of lowest energy has to be abandoned. Instead the MC search aims to find the ensemble located at the absolute minimum free energy, i.e., the native ensemble. It is assumed that structures in the native ensemble can be found by an MC search starting from any conformation in the native funnel. The hypothesis is that native structures are trapped in this part of conformational space because of the high-energy barriers that surround the native funnel. In this work it is shown that the crystal structure of the second extracellular loop (e2) of rhodopsin is a member of this loop's native ensemble. In contrast, the crystal structure of the third intracellular loop is quite different in the different crystal structures that have been reported. Our calculations indicate that, of the three crystal structures examined, two show features characteristic of native ensembles while the other one does not. Finally, the protocol is used to calculate the structure of the e2 loop in D2R. Here, the crystal structure is not known, but it is shown that several side chains that are involved in interaction with a class of substituted benzamides assume conformations that point into the active site. Thus, they are poised to interact with the incoming ligand.
Affiliation(s)
- Sandhya Kortagere
- Department of Physiology and Biophysics, Weill-Cornell Medical College, 1300 York Avenue, New York, NY 10021, USA
15
Fernandez-Fuentes N, Querol E, Aviles FX, Sternberg MJE, Oliva B. Prediction of the conformation and geometry of loops in globular proteins: testing ArchDB, a structural classification of loops. Proteins 2006; 60:746-57. [PMID: 16021623 DOI: 10.1002/prot.20516] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
Abstract
In protein structure prediction, a central problem is defining the structure of a loop connecting 2 secondary structures. This problem frequently occurs in homology modeling, fold recognition, and in several strategies in ab initio structure prediction. In our previous work, we developed a classification database of structural motifs, ArchDB. The database contains 12,665 clustered loops in 451 structural classes with information about phi-psi angles in the loops and 1492 structural subclasses with the relative locations of the bracing secondary structures. Here we evaluate the extent to which sequence information in the loop database can be used to predict loop structure. Two sequence profiles were used, a HMM profile and a PSSM derived from PSI-BLAST. A jack-knife test was made removing homologous loops using SCOP superfamily definition and predicting afterwards against recalculated profiles that only take into account the sequence information. Two scenarios were considered: (1) prediction of structural class with application in comparative modeling and (2) prediction of structural subclass with application in fold recognition and ab initio. For the first scenario, structural class prediction was made directly over loops with X-ray secondary structure assignment, and if we consider the top 20 classes out of 451 possible classes, the best accuracy of prediction is 78.5%. In the second scenario, structural subclass prediction was made over loops using PSI-PRED (Jones, J Mol Biol 1999;292:195-202) secondary structure prediction to define loop boundaries, and if we take into account the top 20 subclasses out of 1492, the best accuracy is 46.7%. Accuracy of loop prediction was also evaluated by means of RMSD calculations.
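The accuracies quoted above count a prediction as correct when the true class appears among the top 20 ranked candidates. A minimal sketch of that top-k measure follows, assuming each prediction is an ordered list of class labels with the best candidate first. The function name `top_k_accuracy` and the class labels are hypothetical and do not come from ArchDB.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=20):
    """Fraction of cases whose true label is among the first k ranked candidates."""
    if len(ranked_predictions) != len(true_labels):
        raise ValueError("need one ranked list per true label")
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_labels)
        if truth in ranked[:k]
    )
    return hits / len(true_labels)

# Hypothetical example with k = 2: the first and third loops are hits.
ranked = [
    ["class_A", "class_B", "class_C"],
    ["class_C", "class_A", "class_B"],
    ["class_B", "class_C", "class_A"],
]
truth = ["class_B", "class_B", "class_B"]
accuracy = top_k_accuracy(ranked, truth, k=2)
```

With many possible classes, a top-k score is far more forgiving than requiring the first-ranked class to be correct, so figures of this kind should be read together with the value of k.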
Affiliation(s)
- Narcis Fernandez-Fuentes
- Institute of Biomedicine and Biotechnology, Universitat Autonoma de Barcelona, Bellaterra, Barcelona, Spain
16
Reichelt J, Dieterich G, Kvesic M, Schomburg D, Heinz DW. BRAGI: linking and visualization of database information in a 3D viewer and modeling tool. Bioinformatics 2004; 21:1291-3. [PMID: 15546941 DOI: 10.1093/bioinformatics/bti138] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022] Open
Abstract
BRAGI is a well-established package for viewing and modeling of three-dimensional (3D) structures of biological macromolecules. A new version of BRAGI has been developed that is supported on Windows, Linux and SGI. The user interface has been rewritten to give the standard 'look and feel' of the chosen operating system and to provide a more intuitive, easier usage. A large number of new features have been added. Information from public databases such as SWISS-PROT, InterPro, DALI and OMIM can be displayed in the 3D viewer. Structures can be searched for homologous sequences using the NCBI BLAST server.
Affiliation(s)
- Joachim Reichelt
- Division of Structural Biology, German Research Centre for Biotechnology (GBF) Mascheroder Weg 1, D-38124, Braunschweig, Germany.
17
Heuser P, Wohlfahrt G, Schomburg D. Efficient methods for filtering and ranking fragments for the prediction of structurally variable regions in proteins. Proteins 2004; 54:583-95. [PMID: 14748005 DOI: 10.1002/prot.10603] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 01/02/2023]
Abstract
The prediction of protein 3D structures close to insertions and deletions or, more generally, loop prediction, is still one of the major challenges in homology modeling projects. In this article, we developed ranking criteria and selection filters to improve knowledge-based loop predictions. These criteria were developed and optimized for a test data set containing 678 insertions and deletions. The examples are, in principle, predictable from the used loop database with an RMSD < 1 Å and represent realistic modeling situations. Four noncorrelated criteria for the selection of fragments are evaluated. A fast prefilter compares the distance between the anchor groups in the template protein with the stems of the fragments. The RMSD of the anchor groups is used for fitting and ranking of the selected loop candidates. After fitting, repulsive close contacts of loop candidates with the template protein are used for filtering, and fragments with backbone torsion angles, which are unfavorable according to a knowledge-based potential, are eliminated. By the combined application of these filter criteria to the test set, it was possible to increase the percentage of predictions with a global RMSD < 1 Å to over 50% among the first five ranks, with average global RMSD values for the first rank candidate that are between 1.3 and 2.2 Å for different loop lengths. Compared to other examples described in the literature, our large numbers of test cases are not self-predictions, where loops are placed in a protein after a peptide loop has been cut out, but are attempts to predict structural changes that occur in evolution when a protein is affected by insertions and deletions.
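The first of the four criteria in this abstract, the fast distance prefilter, can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the published implementation: each anchor or stem is reduced to a single 3D point (for example a Cα position), and the function name `prefilter_fragments`, the tuple layout of `fragments`, and the 1.0 Å `tolerance` are all hypothetical. The later steps (fitting and ranking by anchor RMSD, close-contact filtering, torsion-angle filtering) are not shown.

```python
import math


def prefilter_fragments(gap_start, gap_end, fragments, tolerance=1.0):
    """Keep fragments whose stem-to-stem distance matches the gap in the template.

    gap_start, gap_end -- 3D points for the two anchor groups in the template
    fragments          -- iterable of (name, stem_start, stem_end) tuples
    tolerance          -- allowed distance mismatch in Angstrom (assumed value)
    """
    target = math.dist(gap_start, gap_end)
    kept = []
    for name, stem_start, stem_end in fragments:
        # Compare the fragment's stem separation with the anchor separation.
        deviation = abs(math.dist(stem_start, stem_end) - target)
        if deviation <= tolerance:
            kept.append(name)
    return kept


# Toy example: a 10 Angstrom gap and three candidate fragments.
candidates = [
    ("frag_a", (0.0, 0.0, 0.0), (0.0, 10.5, 0.0)),  # 10.5 A apart, kept
    ("frag_b", (1.0, 1.0, 1.0), (1.0, 1.0, 14.0)),  # 13.0 A apart, rejected
    ("frag_c", (0.0, 0.0, 0.0), (6.0, 8.0, 0.0)),   # 10.0 A apart, kept
]
print(prefilter_fragments((0.0, 0.0, 0.0), (10.0, 0.0, 0.0), candidates))
```

A distance comparison of this kind needs no superposition, so it is cheap enough to run over a whole fragment database before any fitting is attempted.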
Affiliation(s)
- Philipp Heuser
- University of Cologne, Institute of Biochemistry, Köln, Germany
18
Wohlfahrt G, Hangoc V, Schomburg D. Positioning of anchor groups in protein loop prediction: the importance of solvent accessibility and secondary structure elements. Proteins 2002; 47:370-8. [PMID: 11948790 DOI: 10.1002/prot.10098] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.4] [Indexed: 11/06/2022]
Abstract
The prediction of loop regions in the process of protein structure prediction by homology is still an unsolved problem. In an earlier publication, we showed that the correct placement of the amino acids serving as anchor groups to be connected by a loop fragment with a predicted geometry is a highly important step and an essential requirement within the process (Lessel and Schomburg, Proteins 1999; 37:56-64). In this article, we present an analysis of the quality of possible loop predictions with respect to gap length, fragment length, amino acid type, secondary structure, and solvent accessibility. For 550 insertions and 544 deletions, we test all possible positions for anchor groups with an inserted loop of a length between 3 and 12 amino acids. We show that approximately 80% of the indel regions could be predicted within 1.5 Å RMSD from a knowledge-based loop database if criteria for the correct localization of anchor groups could be found and the loops could be sorted correctly. From our analysis, several conclusions regarding the optimal placement of anchor groups become obvious: (1) the correct placement of anchor groups is even more important for longer gap lengths, (2) medium-length fragments (length 5-8) perform better than short or long ones, (3) the placement of anchor groups at hydrophobic amino acids gives a higher chance to include the best possible loop, (4) anchor groups within secondary structure elements, in particular beta-sheets, are suitable, (5) amino acids with lower solvent accessibility are better anchor groups. A preliminary test using a combination of the anchor group positioning criteria deduced from our analysis shows very promising results.
Affiliation(s)
- Gerd Wohlfahrt
- University of Cologne, Institute of Biochemistry, Köln, Germany.
19
Abstract
This study presents different procedures for ab initio modeling of peptide loops of different sizes in proteins. Small loops (up to 8-12 residues) were generated by a straightforward procedure with subsequent "averaging" over all the low-energy conformers obtained. The averaged conformer fairly represents the entire set of low-energy conformers, root mean square deviation (RMSD) values being from 1.01 Å for a 4-residue loop to 1.94 Å for an 8-residue loop. Three-dimensional (3D) structures for several medium loops (20-30 residues) and for two large loops (54 and 61 residues) were predicted using residue-residue contact matrices divided into variable parts corresponding to the loops, and into a constant part corresponding to the known core of the protein. For each medium loop, a very limited number of sterically reasonable Cα traces (from 1 to 3) was found; RMSD values ranged from 2.4 to 5.9 Å. Single Cα traces predicted for each of the large loops possessed RMSD values of 4.5 Å. Generally, ab initio loop modeling presented in this work combines elements of computational procedures developed both for protein folding and for peptide conformational analysis.
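RMSD is the quality measure quoted throughout these abstracts, so a small worked sketch may help. The code below computes the RMSD between two equally sized coordinate sets that are assumed to be already superposed in a common frame (no fitting is performed), and builds a mean conformer by plain per-atom coordinate averaging. That averaging scheme is only one simple reading of the "averaging" mentioned in the abstract; the source does not specify how it was done, and the function names `rmsd` and `mean_conformer` are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two pre-superposed coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def mean_conformer(conformers):
    """Per-atom arithmetic mean over conformers (assumed averaging scheme)."""
    n = len(conformers)
    # zip(*conformers) groups the positions of the same atom across conformers;
    # zip(*atoms) then groups the x, y and z components of those positions.
    return [tuple(sum(axis) / n for axis in zip(*atoms)) for atoms in zip(*conformers)]


# Toy data: two two-atom conformers offset by 2 Angstrom along y.
conformers = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 2.0, 0.0), (1.0, 2.0, 0.0)],
]
average = mean_conformer(conformers)
print(average)
print([rmsd(c, average) for c in conformers])
```

In practice structures are usually superposed (for example with the Kabsch algorithm) before RMSD is computed; that step is omitted here to keep the sketch short.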
Affiliation(s)
- S Galaktionov
- Department of Biochemistry and Molecular Biophysics, Washington University, Campus Box 8036, St. Louis, MO 63110, USA
20
D'Alfonso G, Tramontano A, Lahm A. Structural conservation in single-domain proteins: implications for homology modeling. J Struct Biol 2001; 134:246-56. [PMID: 11551183 DOI: 10.1006/jsbi.2001.4351] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.5] [Indexed: 11/22/2022]
Abstract
Large-scale sequencing projects are widening the gap between the known protein universe and the fraction for which structural information has been experimentally obtained. Through the application of homology (comparative) modeling and more general structure prediction techniques, this gap can, however, be narrowed, providing indirect structural information for a considerable number of proteins. Moreover, the estimated number of existing protein folds seems to be limited and many of these yet unknown folds should be discovered by dedicated large-scale structural genomics projects. Within this perspective, homology (comparative) modeling will gain in importance, as will the use of models derived by this technique. Here we discuss how well a sequence alignment, the most common starting point for generating a model, reflects the structural conservation between homologous proteins and we show that sequence information is able to direct construction of acceptable models as far as the structural core is concerned. We also show here that the regions surrounding insertions and deletions are much less conserved than the core and discuss the implications of this observation for loop modeling.
21
Abstract
The prediction of protein structure, based primarily on sequence and structure homology, has become an increasingly important activity. Homology models have become more accurate and their range of applicability has increased. Progress has come, in part, from the flood of sequence and structure information that has appeared over the past few years, and also from improvements in analysis tools. These include profile methods for sequence searches, the use of three-dimensional structure information in sequence alignment and new homology modeling tools, specifically in the prediction of loop and side-chain conformations. There have also been important advances in understanding the physical chemical basis of protein stability and the corresponding use of physical chemical potential functions to identify correctly folded from incorrectly folded protein conformations.
Affiliation(s)
- B Al-Lazikani
- Department of Biochemistry and Molecular Biophysics, Howard Hughes Medical Institute, Columbia University, 630 West 168th Street, New York, NY 10032, USA