51
|
Zou D, He Z, He J. Beta-hairpin prediction with quadratic discriminant analysis using diversity measure. J Comput Chem 2009; 30:2277-84. [PMID: 19263434 DOI: 10.1002/jcc.21229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
On the basis of the features of protein sequential patterns, we used the method of increment of diversity combined with quadratic discriminant analysis (IDQD) to predict beta-hairpin motifs in protein sequences. Three rules are used to extract the raw beta-beta motif sequential patterns of fixed length. Basic amino acid compositions, dipeptide components, and amino acid composition distribution are combined to represent the compositional features. Eighteen feature variables on a sequential pattern to be predicted are defined in terms of ID. They are integrated in a single formal framework given by IDQD. The method is trained and tested on the ArchDB40 dataset containing 3088 proteins. The overall accuracy of prediction and Matthews correlation coefficient for the independent testing dataset are 81.7% and 0.60, respectively. In addition, a higher accuracy of 84.5% and a Matthews correlation coefficient of 0.68 for the independent testing dataset are obtained on a dataset previously used by Kumar et al. (Nucleic Acids Res 2005, 33, 154), which contains 2088 proteins. For a fair assessment of our method, the performance is also evaluated on all 63 proteins used in CASP6. The overall accuracy of prediction is 74.2% for the independent testing dataset.
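The abstract does not spell out how the increment of diversity (ID) is computed. The sketch below uses the definition commonly given for this measure, D(X) = N ln N - sum(n_i ln n_i) and ID(X, Y) = D(X + Y) - D(X) - D(Y); that formula, the function names, and the 20-letter composition helper are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # assumed 20-letter alphabet


def diversity(counts):
    """Diversity D(X) = N*ln(N) - sum(n_i*ln(n_i)), skipping zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return total * math.log(total) - sum(n * math.log(n) for n in counts if n > 0)


def increment_of_diversity(x, y):
    """ID(X, Y) = D(X + Y) - D(X) - D(Y) for two count vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("count vectors must have the same length")
    merged = [a + b for a, b in zip(x, y)]
    return diversity(merged) - diversity(x) - diversity(y)


def composition(sequence):
    """Amino acid counts of a sequence, in the fixed order of AMINO_ACIDS."""
    counts = Counter(sequence)
    return [counts.get(aa, 0) for aa in AMINO_ACIDS]
```

Under this definition, a small ID means the two count vectors have similar composition, so a query pattern could be compared against a hairpin profile and a non-hairpin profile to produce feature values for a discriminant function.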
Affiliation(s)
- Dongsheng Zou
- College of Computer Science, Chongqing University, Chongqing 400044, China.
|
52
|
|
53
|
Hvidsten TR, Kryshtafovych A, Fidelis K. Local descriptors of protein structure: a systematic analysis of the sequence-structure relationship in proteins using short- and long-range interactions. Proteins 2009; 75:870-84. [PMID: 19025980 DOI: 10.1002/prot.22296] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Local protein structure representations that incorporate long-range contacts between residues are often considered in protein structure comparison but have found relatively little use in structure prediction where assembly from single backbone fragments dominates. Here, we introduce the concept of local descriptors of protein structure to characterize local neighborhoods of amino acids including short- and long-range interactions. We build a library of recurring local descriptors and show that this library is general enough to allow assembly of unseen protein structures. The library could on average re-assemble 83% of 119 unseen structures, and showed little or no performance decrease between homologous targets and targets with folds not represented among domains used to build it. We then systematically evaluate the descriptor library to establish the level of the sequence signal in sets of protein fragments of similar geometrical conformation. In particular, we test whether that signal is strong enough to facilitate correct assignment and alignment of these local geometries to new sequences. We use the signal to assign descriptors to a test set of 479 sequences with less than 40% sequence identity to any domain used to build the library, and show that on average more than 50% of the backbone fragments constituting descriptors can be correctly aligned. We also use the assigned descriptors to infer SCOP folds, and show that correct predictions can be made in many of the 151 cases where PSI-BLAST was unable to detect significant sequence similarity to proteins in the library. Although the combinatorial problem of simultaneously aligning several fragments to sequence is a major bottleneck compared with single fragment methods, the advantage of the current approach is that correct alignments imply correct long range distance constraints. 
The lack of these constraints is most likely the major reason why structure prediction methods fail to consistently produce adequate models when good templates are unavailable or undetectable. Thus, we believe that the current study offers new and valuable insight into the prediction of sequence-structure relationships in proteins.
|
54
|
Liu P, Zhu F, Rassokhin DN, Agrafiotis DK. A self-organizing algorithm for modeling protein loops. PLoS Comput Biol 2009; 5:e1000478. [PMID: 19696883 PMCID: PMC2719875 DOI: 10.1371/journal.pcbi.1000478] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 03/18/2009] [Accepted: 07/20/2009] [Indexed: 11/19/2022] Open
Abstract
Protein loops, the flexible short segments connecting two stable secondary
structural units in proteins, play a critical role in protein structure and
function. Constructing chemically sensible conformations of protein loops that
seamlessly bridge the gap between the anchor points without introducing any
steric collisions remains an open challenge. A variety of algorithms have been
developed to tackle the loop closure problem, ranging from inverse kinematics to
knowledge-based approaches that utilize pre-existing fragments extracted from
known protein structures. However, many of these approaches focus on the
generation of conformations that mainly satisfy the fixed end point condition,
leaving the steric constraints to be resolved in subsequent post-processing
steps. In the present work, we describe a simple solution that simultaneously
satisfies not only the end point and steric conditions, but also chirality and
planarity constraints. Starting from random initial atomic coordinates, each
individual conformation is generated independently by using a simple alternating
scheme of pairwise distance adjustments of randomly chosen atoms, followed by
fast geometric matching of the conformationally rigid components of the
constituent amino acids. The method is conceptually simple, numerically stable
and computationally efficient. Very importantly, additional constraints, such as
those derived from NMR experiments, hydrogen bonds or salt bridges, can be
incorporated into the algorithm in a straightforward and inexpensive way, making
the method ideal for solving more complex multi-loop problems. The remarkable
performance and robustness of the algorithm are demonstrated on a set of protein
loops of length 4, 8, and 12 that have been used in previous studies.

Protein loops play an important role in protein function, such as ligand binding,
recognition, and allosteric regulation. However, due to their flexibility, it is
notoriously difficult to determine their 3D structures using traditional
experimental techniques. As a result, one can often find protein structures with
missing loops in the Protein Data Bank. Their sequence variability also presents
a particular challenge for homology modeling methods, which can only yield good
overall structures given sufficient sequence identity and good experimental
reference structures. Despite extensive research, the construction of protein
loop 3D structures remains an open problem, since a sensible conformation should
seamlessly bridge the anchor points without introducing steric clashes within
the loop itself or between the loop and its surrounding environment. Here, we
present a conceptually simple, mathematically straightforward, numerically
robust and computationally efficient approach for building protein loop
conformations that simultaneously satisfy end-point, steric, planar and chiral
constraints. More importantly, additional constraints derived from experimental
sources can be incorporated in a straightforward manner, allowing the processing
of more complex structures involving multiple interlocking loops.
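The alternating scheme of pairwise distance adjustments described above can be illustrated with a minimal sketch. It covers only the distance-adjustment half of the method: a randomly chosen atom pair is moved along its connecting line until its distance falls within given bounds. The geometric matching of rigid components, the chirality and planarity handling, and all names and parameters here (`adjust_pair`, `refine`, `rate`, `bounds`) are assumptions for illustration, not the authors' implementation.

```python
import math
import random


def adjust_pair(coords, i, j, lower, upper, rate=1.0, eps=1e-9):
    """Move atoms i and j symmetrically so their distance approaches [lower, upper]."""
    xi, xj = coords[i], coords[j]
    delta = [a - b for a, b in zip(xi, xj)]
    dist = math.sqrt(sum(d * d for d in delta))
    if lower <= dist <= upper:
        return  # constraint already satisfied, nothing to do
    target = lower if dist < lower else upper
    # Each atom takes half of the correction; eps avoids division by zero.
    scale = rate * 0.5 * (target - dist) / (dist + eps)
    for k in range(3):
        xi[k] += scale * delta[k]
        xj[k] -= scale * delta[k]


def refine(coords, bounds, steps=1000, rate=1.0, seed=0):
    """Repeatedly pick a random constrained pair and adjust its distance in place."""
    rng = random.Random(seed)
    pairs = list(bounds)
    for _ in range(steps):
        i, j = rng.choice(pairs)
        lower, upper = bounds[(i, j)]
        adjust_pair(coords, i, j, lower, upper, rate)
    return coords
```

In this sketch, `coords` is a list of mutable `[x, y, z]` lists and `bounds` maps index pairs to `(lower, upper)` distances; anchor-point, steric, or experimentally derived restraints would all enter as additional entries in `bounds`.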
Affiliation(s)
- Pu Liu
- Johnson & Johnson Pharmaceutical Research and Development, Exton,
Pennsylvania, United States of America
- * E-mail: (PL); (DKA)
- Fangqiang Zhu
- Johnson & Johnson Pharmaceutical Research and Development, Exton,
Pennsylvania, United States of America
- Dmitrii N. Rassokhin
- Johnson & Johnson Pharmaceutical Research and Development, Exton,
Pennsylvania, United States of America
- Dimitris K. Agrafiotis
- Johnson & Johnson Pharmaceutical Research and Development, Exton,
Pennsylvania, United States of America
- * E-mail: (PL); (DKA)
|
55
|
Recognition of β-hairpin motifs in proteins by using the composite vector. Amino Acids 2009; 38:915-21. [DOI: 10.1007/s00726-009-0299-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 12/10/2008] [Accepted: 04/20/2009] [Indexed: 10/20/2022]
|
56
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using MODELLER. Curr Protoc Protein Sci 2008; Chapter 2:Unit 2.9. [PMID: 18429317 DOI: 10.1002/0471140864.ps0209s50] [Citation(s) in RCA: 754] [Impact Index Per Article: 47.1] [Indexed: 12/25/2022]
Abstract
Functional characterization of a protein sequence is a common goal in biology, and is usually facilitated by having an accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco, San Francisco, California, USA
|
57
|
Olson MA, Feig M, Brooks CL. Prediction of protein loop conformations using multiscale modeling methods with physical energy scoring functions. J Comput Chem 2008; 29:820-31. [PMID: 17876760 DOI: 10.1002/jcc.20827] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.2] [Indexed: 12/25/2022]
Abstract
This article examines ab initio methods for the prediction of protein loops by a computational strategy of multiscale conformational sampling and physical energy scoring functions. Our approach consists of initial sampling of loop conformations from lattice-based low-resolution models followed by refinement using all-atom simulations. To allow enhanced conformational sampling, the replica exchange method was implemented. Physical energy functions based on CHARMM19 and CHARMM22 parameterizations with generalized Born (GB) solvent models were applied in scoring loop conformations extracted from the lattice simulations and, in the case of all-atom simulations, the ensemble of conformations was generated and scored with these models. Predictions are reported for 25 loop segments, each eight residues long and taken from a diverse set of 22 protein structures. We find that the simulations generally sampled conformations with low global root-mean-square deviation (RMSD) for loop backbone coordinates from the known structures, whereas clustering conformations in RMSD space and scoring detected less favorable loop structures. Specifically, the lattice simulations sampled basins that exhibited an average global RMSD of 2.21 ± 1.42 Å, whereas clustering and scoring the loop conformations determined an RMSD of 3.72 ± 1.91 Å. Using CHARMM19/GB to refine the lattice conformations improved the sampling RMSD to 1.57 ± 0.98 Å and detection to 2.58 ± 1.48 Å. We found that further improvement could be gained from extending the upper temperature in the all-atom refinement from 400 to 800 K, where the results typically yield a reduction of approximately 1 Å or greater in the RMSD of the detected loop.
Overall, CHARMM19 with a simple pairwise GB solvent model is more efficient at sampling low-RMSD loop basins than CHARMM22 with a higher-resolution modified analytical GB model; however, the latter simulation method provides a more accurate description of the all-atom energy surface, yet demands a much greater computational cost.
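The RMSD values quoted above compare predicted and known loop backbone coordinates. As a minimal sketch, the root-mean-square deviation of two equally sized coordinate sets can be computed as below; this version assumes the two structures are already in the same frame and performs no superposition, and the function name `rmsd` is chosen here for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumes both lists describe the same atoms in the same order and that
    the structures are already superimposed on a common frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for p, q in zip(coords_a, coords_b):
        squared += sum((a - b) ** 2 for a, b in zip(p, q))
    return math.sqrt(squared / len(coords_a))
```

For loop prediction, keeping the rest of the protein as the reference frame means that errors in loop placement, and not only in loop shape, contribute to the reported value.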
Affiliation(s)
- Mark A Olson
- Department of Cell Biology and Biochemistry, U.S. Army Medical Research Institute of Infectious Diseases, Frederick, Maryland 21702, USA.
|
58
|
Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, Shen MY, Pieper U, Sali A. Comparative protein structure modeling using Modeller. Curr Protoc Bioinformatics 2008; Chapter 5:Unit 5.6. [PMID: 18428767 DOI: 10.1002/0471250953.bi0506s15] [Citation(s) in RCA: 1766] [Impact Index Per Article: 110.4] [Indexed: 12/28/2022]
Abstract
Functional characterization of a protein sequence is one of the most frequent problems in biology. This task is usually facilitated by accurate three-dimensional (3-D) structure of the studied protein. In the absence of an experimentally determined structure, comparative or homology modeling can sometimes provide a useful 3-D model for a protein that is related to at least one known protein structure. Comparative modeling predicts the 3-D structure of a given protein sequence (target) based primarily on its alignment to one or more proteins of known structure (templates). The prediction process consists of fold assignment, target-template alignment, model building, and model evaluation. This unit describes how to calculate comparative models using the program MODELLER and discusses all four steps of comparative modeling, frequently observed errors, and some applications. Modeling lactate dehydrogenase from Trichomonas vaginalis (TvLDH) is described as an example. The download and installation of the MODELLER software is also described.
Affiliation(s)
- Narayanan Eswar
- University of California at San Francisco, San Francisco, California
- Ben Webb
- University of California at San Francisco, San Francisco, California
- M S Madhusudhan
- University of California at San Francisco, San Francisco, California
- David Eramian
- University of California at San Francisco, San Francisco, California
- Min-Yi Shen
- University of California at San Francisco, San Francisco, California
- Ursula Pieper
- University of California at San Francisco, San Francisco, California
- Andrej Sali
- University of California at San Francisco, San Francisco, California
|
59
|
Hermoso A, Espadaler J, Querol E, Aviles FX, Sternberg MJ, Oliva B, Fernandez-Fuentes N. Including Functional Annotations and Extending the Collection of Structural Classifications of Protein Loops (ArchDB). Bioinform Biol Insights 2008. [DOI: 10.1177/117793220700100004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022] Open
Abstract
Loops represent an important part of protein structures. The study of loops is critical for two main reasons: First, loops are often involved in protein function, stability and folding. Second, despite improvements in experimental and computational structure prediction methods, modeling the conformation of loops remains problematic. Here, we present a structural classification of loops, ArchDB, a mine of information with application in both mentioned fields: loop structure prediction and function prediction. ArchDB ( http://sbi.imim.es/archdb ) is a database of classified protein loop motifs. The current database provides four different classification sets tailored for different purposes. ArchDB-40, a loop classification derived from SCOP40, is well suited for modeling common loop motifs. Since features relevant to loop structure or function can be more easily determined on well-populated clusters, we have developed ArchDB-95, a loop classification derived from SCOP95. This new classification set shows a ~40% increase in the number of subclasses, and a large 7-fold increase in the number of putative structure/function-related subclasses. We also present ArchDB-EC, a classification of loop motifs from enzymes, and ArchDB-KI, a manually annotated classification of loop motifs from kinases. Information about ligand contacts and PDB sites has been included in all classification sets. Improvements in our classification scheme are described, as well as several new database features, such as the ability to query by conserved annotations, sequence similarity, or uploading 3D coordinates of a protein. The lengths of classified loops range from 0 to 36 residues. ArchDB offers an exhaustive sampling of loop structures. Functional information about loops and links with related biological databases are also provided.
All this information, together with the possibility to browse and query the database through a web server, makes ArchDB a useful tool for the comparative study of loops, the analysis of loops involved in protein function, and obtaining templates for loop modeling.
Affiliation(s)
- Antoni Hermoso
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Jordi Espadaler
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Laboratori de Bioinformàtica Estructural (GRIB), Universitat Pompeu Fabra/IMIM, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Catalonia, Spain
- Enrique Querol
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Francesc X. Aviles
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Michael J.E. Sternberg
- Structural Bioinformatics Group, Department of Biological Sciences, Imperial College, London SW7 2AZ, U.K.
- Baldomero Oliva
- Laboratori de Bioinformàtica Estructural (GRIB), Universitat Pompeu Fabra/IMIM, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Catalonia, Spain
- Narcis Fernandez-Fuentes
- Leeds Institute of Molecular Medicine, Section of Experimental Therapeutics, St. James University Hospital, Leeds LS7 9TF, U.K.
|
60
|
|
61
|
Prakash T, Sandhu KS, Singh NK, Bhasin Y, Ramakrishnan C, Brahmachari SK. Structural assessment of glycyl mutations in invariantly conserved motifs. Proteins 2007; 69:617-32. [PMID: 17623846 DOI: 10.1002/prot.21488] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/09/2022]
Abstract
Motifs that are evolutionarily conserved in proteins are crucial to their structure and function. In one of our earlier studies, we demonstrated that the conserved motifs occurring invariantly across several organisms could act as structural determinants of the proteins. We observed the abundance of glycyl residues in these invariantly conserved motifs. The role of glycyl residues in highly conserved motifs has not been studied extensively. Thus, it would be interesting to examine the structural perturbations induced by mutation in these conserved glycyl sites. In this work, we selected a representative set of invariant signature (IS) peptides for which both the PDB structure and mutation information was available. We thoroughly analyzed the conformational features of the glycyl sites and their local interactions with the surrounding residues. Using Ramachandran angles, we showed that the glycyl residues occurring in these IS peptides, which have undergone mutation, occurred more often in the L-disallowed as compared with the L-allowed region of the Ramachandran plot. Short range contacts around the mutation site were analyzed to study the steric effects. With the results obtained from our analysis, we hypothesize that any change of activity arising because of such mutations must be attributed to the long-range interaction(s) of the new residue if the glycyl residue in the IS peptide occurred in the L-allowed region of the Ramachandran plot. However, the mutation of those conserved glycyl residues that occurred in the L-disallowed region of the Ramachandran plot might lead to an altered activity of the protein as a result of an altered conformation of the backbone in the immediate vicinity of the glycyl residue, in addition to long range effects arising from the long side chains of the new residue. 
Thus, the loss of activity because of mutation in the conserved glycyl site might either relate to long range interactions or to local perturbations around the site depending upon the conformational preference of the glycyl residue.
Affiliation(s)
- Tulika Prakash
- G. N. Ramachandran Knowledge Center for Genome Informatics, Institute of Genomics and Integrative Biology, Delhi 110007, India
|
62
|
De Brevern AG, Etchebest C, Benros C, Hazout S. "Pinning strategy": a novel approach for predicting the backbone structure in terms of protein blocks from sequence. J Biosci 2007; 32:51-70. [PMID: 17426380 DOI: 10.1007/s12038-007-0006-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
Abstract
The description of protein 3D structures can be performed through a library of 3D fragments, named a structural alphabet. Our structural alphabet is composed of 16 small protein fragments of 5 C alpha in length, called protein blocks (PBs). It allows an efficient approximation of the 3D protein structures and a correct prediction of the local structure. The 72 most frequent series of 5 consecutive PBs, called structural words (SWs), are able to cover more than 90% of the 3D structures. PBs are highly conditioned by the presence of a limited number of transitions between them. In this study, we propose a new method called "pinning strategy" that uses this specific feature to predict long protein fragments. Its goal is to define highly probable successions of PBs. It starts from the most probable SW and is then extended with overlapping SWs. Starting from an initial prediction rate of 34.4%, the use of the SWs instead of the PBs allows a gain of 4.5%. The pinning strategy simply applied to the SWs increases the prediction accuracy to 39.9%. In a second step, the sequence-structure relationship is optimized, and the prediction accuracy reaches 43.6%.
Affiliation(s)
- A G De Brevern
- 1 INSERM, U726, Equipe de Bioinformatique Genomique et Moleculaire (EBGM), Universite Paris 7,case 7113, 2, place Jussieu, 75251 Paris Cedex 05, France.
|
63
|
Peng HP, Yang AS. Modeling protein loops with knowledge-based prediction of sequence-structure alignment. Bioinformatics 2007; 23:2836-42. [PMID: 17827204 DOI: 10.1093/bioinformatics/btm456] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION As the protein structure database expands, protein loop modeling remains an important and yet challenging problem. Knowledge-based protein loop prediction methods have met with two challenges in methodology development: (1) loop boundaries in protein structures are frequently problematic in constructing length-dependent loop databases for protein loop predictions; (2) knowledge-based modeling of loops of unknown structure requires both aligning a query loop sequence to loop templates and ranking the loop sequence-template matches. RESULTS We developed a knowledge-based loop prediction method that circumvents the need for constructing hierarchically clustered length-dependent loop libraries. The method first predicts local structural fragments of a query loop sequence and then structurally aligns the predicted structural fragments to a set of non-redundant loop structural templates regardless of the loop length. The sequence-template alignments are then quantitatively evaluated with an artificial neural network model trained on a set of predictions with known outcomes. Prediction accuracy benchmarks indicated that the novel procedure provided an alternative approach overcoming the challenges of knowledge-based loop prediction. AVAILABILITY http://cmb.genomics.sinica.edu.tw
Affiliation(s)
- Hung-Pin Peng
- Genomics Research Center, Academia Sinica. 128 Academia Road, Section 2, Nankang District, Taipei 115, Taiwan, ROC
|
64
|
Kanagasabai V, Arunachalam J, Prasad PA, Gautham N. Exploring the conformational space of protein loops using a mean field technique with MOLS sampling. Proteins 2007; 67:908-21. [PMID: 17357159 DOI: 10.1002/prot.21333] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.5] [Indexed: 12/27/2022]
Abstract
We have recently developed a computational technique that uses mutually orthogonal Latin square sampling to explore the conformational space of oligopeptides in an exhaustive manner. In this article, we report its use to analyze the conformational spaces of 120 protein loop sequences, culled from the PDB, with lengths ranging from 5 to 10 residues. The force field used did not have any information regarding the sequences or structures that flanked the loop. The results of the analyses show that the native structure of the loop, as found in the PDB, falls at one of the low energy points in the conformational landscape of the sequences. Thus, a large portion of the structural determinants of the loop may be considered intrinsic to the sequence, regardless of either adjacent sequences or structures, or the interactions that the atoms of the loop make with other residues in the protein or in neighboring proteins.
Affiliation(s)
- V Kanagasabai
- Department of Crystallography and Biophysics, University of Madras, Chennai 600 025, India
|
65
|
Abstract
In this perspective, we begin by describing the comparative protein structure modeling technique and the accuracy of the corresponding models. We then discuss the significant role that comparative prediction plays in drug discovery. We focus on virtual ligand screening against comparative models and illustrate the state of the art by a number of specific examples.
|
66
|
Abstract
The ArchPRED server (http://www.fiserlab.org/servers/archpred) implements a novel fragment-search based method for predicting loop conformations. The inputs to the server are the atomic coordinates of the query protein and the position of the loop. The algorithm selects candidate loop fragments from a regularly updated loop library (Search Space) by matching the length, the types of bracing secondary structures of the query and by satisfying the geometrical restraints imposed by the stem residues. Subsequently, candidate loops are inserted in the query protein framework where their side chains are rebuilt and their fit is assessed by the root mean square deviation (r.m.s.d.) of stem regions and by the number of rigid body clashes with the environment. In the final step remaining candidate loops are ranked by a Z-score that combines information on sequence similarity and fit of predicted and observed ϕ/ψ main chain dihedral angle propensities. The final loop conformation is built in the protein structure and annealed in the environment using conjugate gradient minimization. The prediction method was benchmarked on artificially prepared search datasets where all trivial sequence similarities on the SCOP superfamily level were removed. Under these conditions it was possible to predict loops of length 4, 8 and 12 with coverage of 98, 78 and 28% and an accuracy of at least 0.22, 1.38 and 2.47 Å r.m.s.d., respectively. In a head-to-head comparison on loops extracted from freshly deposited new protein folds, the current method outperformed an earlier database search method by a ratio of approximately 5:1.
Affiliation(s)
- András Fiser
- To whom correspondence should be addressed. Tel: +1 718 430 3233; Fax: +1 718 430 856;
|
67
|
Espadaler J, Querol E, Aviles FX, Oliva B. Identification of function-associated loop motifs and application to protein function prediction. Bioinformatics 2006; 22:2237-43. [PMID: 16870939 DOI: 10.1093/bioinformatics/btl382] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION The detection of function-related local 3D-motifs in protein structures can provide insights into protein function in the absence of sequence or fold similarity. Protein loops are known to play important roles in protein function and several loop classifications have been described, but the automated identification of putative functional 3D-motifs in such classifications has not yet been addressed. This identification can be used on sequence annotations. RESULTS We evaluated three different scoring methods for their ability to identify known motifs from the PROSITE database in ArchDB. More than 500 new putative function-related motifs not reported in PROSITE were identified. Sequence patterns derived from these motifs were especially useful at predicting precise annotations. The number of reliable sequence annotations could be increased up to 100% with respect to standard BLAST. CONTACT boliva@imim.es SUPPLEMENTARY INFORMATION Supplementary Data are available at Bioinformatics online.
Affiliation(s)
- Jordi Espadaler
- Group de Bioinformàtica Estructural (GRIB-IMIM), Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra 08003 Barcelona, Catalonia, Spain
|
68
|
Hennetin J, Jullian B, Steven AC, Kajava AV. Standard Conformations of β-Arches in β-Solenoid Proteins. J Mol Biol 2006; 358:1094-105. [PMID: 16580019 DOI: 10.1016/j.jmb.2006.02.039] [Citation(s) in RCA: 54] [Impact Index Per Article: 3.0] [Received: 10/28/2005] [Revised: 02/13/2006] [Accepted: 02/15/2006] [Indexed: 11/15/2022]
Abstract
Strand-turn-strand motifs found in beta-helical (more generally, beta-solenoid) proteins differ fundamentally from those found in globular proteins. The latter are primarily beta-hairpins in which the two strands form an antiparallel beta-sheet. In the former, the two strands are relatively rotated by approximately 90 degrees around the strand axes so that they interact via the side-chains, not via the polypeptide backbones. We call the latter structures, beta-arches, and their turns, beta-arcs. In beta-solenoid proteins, beta-arches stack in-register to form beta-arcades in which parallel beta-sheets are assembled from corresponding strands in successive layers. The number of beta-solenoids whose three-dimensional structures have been determined is now large enough to support a detailed analysis and classification of beta-arc conformations. Here, we present a systematic account of beta-arcs distinguished by the number of residues, their conformations, and their propensity to stack into arcades with other like or unlike arches. The trends to emerge from this analysis have implications for sequence-based detection and structural prediction of other beta-solenoid proteins as well as for identification of amyloidogenic sequences and elucidation of amyloid fibril structures.
Affiliation(s)
- Jérôme Hennetin
- Centre de Recherches de Biochimie Macromoléculaire, CNRS FRE-2593, 1919 Route de Mende, 34293 Montpellier Cedex 5, France
69
Fernandez-Fuentes N, Oliva B, Fiser A. A supersecondary structure library and search algorithm for modeling loops in protein structures. Nucleic Acids Res 2006; 34:2085-97. [PMID: 16617149 PMCID: PMC1440879 DOI: 10.1093/nar/gkl156] [Citation(s) in RCA: 61] [Impact Index Per Article: 3.4] [Indexed: 11/14/2022]
Abstract
We present a fragment-search based method for predicting loop conformations in protein models. A hierarchical and multidimensional database has been set up that currently classifies 105 950 loop fragments and loop flanking secondary structures. Besides the length of the loops and types of bracing secondary structures, the database is organized along four internal coordinates, a distance and three types of angles characterizing the geometry of stem regions. Candidate fragments are selected from this library by matching the length and the types of bracing secondary structures of the query and by satisfying the geometrical restraints of the stems. They are subsequently inserted in the query protein framework, where their fit is assessed by the root mean square deviation (r.m.s.d.) of stem regions and by the number of rigid body clashes with the environment. In the final step, remaining candidate loops are ranked by a Z-score that combines information on sequence similarity and fit of predicted and observed ϕ/ψ main chain dihedral angle propensities. Confidence Z-score cut-offs were determined for each loop length that identify those predicted fragments that outperform a competitive ab initio method. A web server implements the method, regularly updates the fragment library and performs prediction. Predicted segments are returned, or optionally, these can be completed with side chain reconstruction and subsequently annealed in the environment of the query protein by conjugate gradient minimization. The prediction method was tested on artificially prepared search datasets where all trivial sequence similarities on the SCOP superfamily level were removed. Under these conditions it is possible to predict loops of length 4, 8 and 12 with coverage of 98, 78 and 28% and an r.m.s.d. accuracy of at least 0.22, 1.38 and 2.47 Å, respectively.
In a head-to-head comparison on loops extracted from freshly deposited new protein folds the current method outperformed in a ∼5:1 ratio an earlier developed database search method.
Affiliation(s)
- Baldomero Oliva
- Structural Bioinformatics Group (GRIB), Universitat Pompeu Fabra, C/Doctor Aiguader, 80. 08003 Barcelona, Catalonia, Spain
- András Fiser
- To whom correspondence should be addressed. Tel: +1 718 430 3233; Fax: +1 718 430 856;
70
Benros C, de Brevern AG, Etchebest C, Hazout S. Assessing a novel approach for predicting local 3D protein structures from sequence. Proteins 2006; 62:865-80. [PMID: 16385557 DOI: 10.1002/prot.20815] [Citation(s) in RCA: 33] [Impact Index Per Article: 1.8] [Indexed: 11/11/2022]
Abstract
We developed a novel approach for predicting local protein structure from sequence. It relies on the Hybrid Protein Model (HPM), an unsupervised clustering method we previously developed. This model learns three-dimensional protein fragments encoded into a structural alphabet of 16 protein blocks (PBs). Here, we focused on 11-residue fragments encoded as a series of seven PBs and used HPM to cluster them according to their local similarities. We thus built a library of 120 overlapping prototypes (mean fragments from each cluster), with good three-dimensional local approximation, i.e., a mean accuracy of 1.61 Å Cα root-mean-square distance. Our prediction method is intended to optimize the exploitation of the sequence-structure relations deduced from this library of long protein fragments. This was achieved by setting up a system of 120 experts, each defined by logistic regression to optimize the discrimination from sequence of a given prototype relative to the others. For a target sequence window, the experts computed probabilities of sequence-structure compatibility for the prototypes and ranked them, proposing the top scorers as structural candidates. Predictions were defined as successful when a prototype <2.5 Å from the true local structure was found among those proposed. Our strategy yielded a prediction rate of 51.2% for an average of 4.2 candidates per sequence window. We also proposed a confidence index to estimate prediction quality. Our approach predicts from sequence alone and will thus provide valuable information for proteins without structural homologs. Candidates will also contribute to global structure prediction by fragment assembly.
Affiliation(s)
- Cristina Benros
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U726, Université Denis DIDEROT-Paris 7, Paris, France.
71
Carrega L, Mosbah A, Ferrat G, Beeton C, Andreotti N, Mansuelle P, Darbon H, De Waard M, Sabatier JM. The impact of the fourth disulfide bridge in scorpion toxins of the alpha-KTx6 subfamily. Proteins 2006; 61:1010-23. [PMID: 16247791 DOI: 10.1002/prot.20681] [Citation(s) in RCA: 17] [Impact Index Per Article: 0.9] [Indexed: 11/11/2022]
Abstract
Animal toxins are highly reticulated and structured polypeptides that adopt a limited number of folds. In scorpion species, the most represented fold is the alpha/beta scaffold in which a helical structure is connected to an antiparallel beta-sheet by two disulfide bridges. The intimate relationship existing between peptide reticulation and folding remains poorly understood. Here, we investigated the role of disulfide bridging on the 3D structure of HsTx1, a scorpion toxin potently active on Kv1.1 and Kv1.3 channels. This toxin folds along the classical alpha/beta scaffold but belongs to a unique family of short-chain, four disulfide-bridged toxins. Removal of the fourth disulfide bridge of HsTx1 does not affect its helical structure, whereas its two-stranded beta-sheet is altered from a twisted to a nontwisted configuration. This structural change in HsTx1 is accompanied by a marked decrease in Kv1.1 and Kv1.3 current blockage, and by alterations in the toxin to channel molecular contacts. In contrast, a similar removal of the fourth disulfide bridge of Pi1, another scorpion toxin from the same structural family, has no impact on its 3D structure, pharmacology, or channel interaction. These data highlight the importance of disulfide bridging in reaching the correct bioactive conformation of some toxins.
Affiliation(s)
- Louis Carrega
- Laboratoire d'Ingénierie des Protéines, CNRS FRE 2738, IFR Jean Roche, Faculté de Médecine Nord, Marseille Cedex, France
72
Fernandez-Fuentes N, Querol E, Aviles FX, Sternberg MJE, Oliva B. Prediction of the conformation and geometry of loops in globular proteins: testing ArchDB, a structural classification of loops. Proteins 2006; 60:746-57. [PMID: 16021623 DOI: 10.1002/prot.20516] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
Abstract
In protein structure prediction, a central problem is defining the structure of a loop connecting 2 secondary structures. This problem frequently occurs in homology modeling, fold recognition, and in several strategies in ab initio structure prediction. In our previous work, we developed a classification database of structural motifs, ArchDB. The database contains 12,665 clustered loops in 451 structural classes with information about phi-psi angles in the loops and 1492 structural subclasses with the relative locations of the bracing secondary structures. Here we evaluate the extent to which sequence information in the loop database can be used to predict loop structure. Two sequence profiles were used, a HMM profile and a PSSM derived from PSI-BLAST. A jack-knife test was made removing homologous loops using SCOP superfamily definition and predicting afterwards against recalculated profiles that only take into account the sequence information. Two scenarios were considered: (1) prediction of structural class with application in comparative modeling and (2) prediction of structural subclass with application in fold recognition and ab initio. For the first scenario, structural class prediction was made directly over loops with X-ray secondary structure assignment, and if we consider the top 20 classes out of 451 possible classes, the best accuracy of prediction is 78.5%. In the second scenario, structural subclass prediction was made over loops using PSI-PRED (Jones, J Mol Biol 1999;292:195-202) secondary structure prediction to define loop boundaries, and if we take into account the top 20 subclasses out of 1492, the best accuracy is 46.7%. Accuracy of loop prediction was also evaluated by means of RMSD calculations.
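The "top 20 classes out of 451" figures in this abstract are top-k accuracies: a prediction counts as correct when the true class is among the k best-scoring classes. A minimal sketch of that metric follows, assuming each loop has one score per class; the function name `top_k_accuracy` and the toy scores are illustrative and are not taken from the paper.

```python
import numpy as np

def top_k_accuracy(scores, true_labels, k=20):
    """Fraction of samples whose true class is among the k highest-scoring classes.

    scores: (n_samples, n_classes) array, higher means more likely (hypothetical input).
    true_labels: (n_samples,) array of integer class indices.
    """
    scores = np.asarray(scores, dtype=float)
    true_labels = np.asarray(true_labels)
    # Column indices of the k best-scoring classes for every sample
    top_k = np.argsort(-scores, axis=1)[:, :k]
    hits = (top_k == true_labels[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example with 3 loops and 3 classes (made-up numbers)
toy_scores = [[0.1, 0.7, 0.2],
              [0.5, 0.3, 0.2],
              [0.2, 0.3, 0.5]]
toy_labels = [1, 1, 0]
acc_top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
acc_top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

Widening k can only raise the score, which is why accuracies quoted for the top 20 candidates are not comparable to single-best-prediction accuracies.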
Affiliation(s)
- Narcis Fernandez-Fuentes
- Institute of Biomedicine and Biotechnology, Universitat Autonoma de Barcelona, Bellaterra, Barcelona, Spain
73
Szarecka A, Meirovitch H. Optimization of the GB/SA solvation model for predicting the structure of surface loops in proteins. J Phys Chem B 2006; 110:2869-80. [PMID: 16471897 PMCID: PMC1945207 DOI: 10.1021/jp055771+] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
Implicit solvation models are commonly optimized with respect to experimental data or Poisson-Boltzmann (PB) results obtained for small molecules, where the force field is sometimes not considered. In previous studies, we have developed an optimization procedure for cyclic peptides and surface loops in proteins based on the entire system studied and the specific force field used. Thus, the loop has been modeled by the simplified solvation function $E_{\mathrm{tot}} = E_{\mathrm{FF}}(\varepsilon = 2r) + \sum_i \sigma_i A_i$, where $E_{\mathrm{FF}}(\varepsilon = nr)$ is the AMBER force field energy with a distance-dependent dielectric function, $\varepsilon = nr$, $A_i$ is the solvent accessible surface area of atom $i$, and $\sigma_i$ is its atomic solvation parameter. During the optimization process, the loop is free to move while the protein template is held fixed in its X-ray structure. To improve on the results of this model, in the present work we apply our optimization procedure to the physically more rigorous solvation model, the generalized Born with surface area (GB/SA) (together with the all-atom AMBER force field) as suggested by Still and co-workers (J. Phys. Chem. A 1997, 101, 3005). The six parameters of the GB/SA model, namely, $P_1$-$P_5$ and the surface area parameter, $\sigma$ (programmed in the TINKER package), are reoptimized for a "training" group of nine loops, and a best-fit set is defined from the individual sets of optimized parameters. The best-fit set and Still's original set of parameters (where Lys, Arg, His, Glu, and Asp are charged or neutralized) were applied to the training group as well as to a "test" group of seven loops, and the energy gaps and the corresponding RMSD values were calculated. These GB/SA results based on the three sets of parameters have been found to be comparable; surprisingly, however, they are somewhat inferior (e.g., with larger energy gaps) to those obtained previously from the simplified model described above.
We discuss recent results for loops obtained by other solvation models and potential directions for future studies.
Affiliation(s)
- Agnieszka Szarecka
- Department of Computational Biology, University of Pittsburgh School of Medicine, Suite 3064, BST 3, 3501 Fifth Avenue, Pittsburgh, PA 15213
- Hagai Meirovitch
- Department of Computational Biology, University of Pittsburgh School of Medicine, Suite 3064, BST 3, 3501 Fifth Avenue, Pittsburgh, PA 15213
74
White RP, Meirovitch H. Minimalist explicit solvation models for surface loops in proteins. J Chem Theory Comput 2006; 2:1135-1151. [PMID: 17429495 PMCID: PMC1851699 DOI: 10.1021/ct0503217] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/29/2022]
Abstract
We have performed molecular dynamics simulations of protein surface loops solvated by explicit water, where a prime focus of the study is the small numbers (e.g., ~100) of explicit water molecules employed. The models include only part of the protein (typically 500-1000 atoms), and the water molecules are restricted to a region surrounding the loop. In this study, the number of water molecules ($N_w$) is systematically varied, and convergence with large $N_w$ is monitored to reveal $N_w^{\min}$, the minimum number required for the loop to exhibit realistic (fully hydrated) behavior. We have also studied protein surface coverage, as well as diffusion and residence times for water molecules as a function of $N_w$. A number of other modeling parameters are also tested. These include the number of environmental protein atoms explicitly considered in the model, as well as two ways to constrain the water molecules to the vicinity of the loop (where we find one of these methods to perform better when $N_w$ is small). The results (for RMSD and its fluctuations for four loops) are further compared to much larger, fully solvated systems (using ~10,000 water molecules under periodic boundary conditions and Ewald electrostatics), and to results for the GBSA implicit solvation model. We find that the loop backbone can stabilize with a surprisingly small number of water molecules (as low as 5 molecules per amino acid residue). The side chains of the loop require somewhat larger $N_w$, where the atomic fluctuations become too small if $N_w$ is further reduced. Thus, in general, we find adequate hydration to occur at roughly 12 water molecules per residue. This is an important result, because at this hydration level, computational times are comparable to those required for GBSA. Therefore these "minimalist explicit models" can provide a viable and potentially more accurate alternative.
The importance of protein loop modeling is discussed in the context of these, and other, loop models, along with other challenges including the relevance of appropriate free energy simulation methodology for assessment of conformational stability.
Affiliation(s)
- Ronald P. White
- Department of Computational Biology, University of Pittsburgh School of Medicine, Biomedical Science Tower3, 3064 Pittsburgh, PA 15260
- Hagai Meirovitch
- Department of Computational Biology, University of Pittsburgh School of Medicine, Biomedical Science Tower3, 3064 Pittsburgh, PA 15260
75
Abstract
The field of protein-structure prediction has been revolutionized by the application of "mix-and-match" methods both in template-based homology modeling and in template-free de novo folding. Consensus analysis and recombination of fragments copied from known protein structures is currently the only approach that allows the building of models that are closer to the native structure of the target protein than the structure of its closest homologue. It is also the most successful approach in cases in which the target protein exhibits a novel three-dimensional fold. This review summarizes the recent developments in both template-based and template-free protein structure modeling and compares the available methods for protein-structure prediction by recombination of fragments. A convergence between the "protein folding" and "protein evolution" schools of thought is postulated.
Affiliation(s)
- Janusz M Bujnicki
- Laboratory of Bioinformatics and Protein Engineering, International Institute of Molecular and Cell Biology, Trojdena 4, 02-109 Warsaw, Poland.
76
De S, Sur K, Dasgupta S. Characterization of the nonregular regions of proteins by a contortion index. Biopolymers 2005; 79:63-73. [PMID: 15962279 DOI: 10.1002/bip.20333] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/11/2022]
Abstract
Nonstructured regions in proteins that provide the link between two regular structured regions play a significant role in maintaining the scaffold of the protein. Not only do they act as connectors between two regular secondary structural elements of proteins but they also provide the necessary turn or reversal in the polypeptide chain. This incorporates flexibility in the structure. Thus an understanding of the structural aspects of the nonregular regions is necessary to have a better insight into these features. We can assume the nonregular region to be a contorted polypeptide segment tethered by regular secondary structured regions at both ends. To describe the undulating nature of the nonregular regions, we introduce a parameter called the "contortion index." This index describes how tortuously the region is organized. Our analysis shows that the contortion index is related to other physicochemical parameters and can be used to characterize the nonregular regions of proteins.
Affiliation(s)
- Subhajyoti De
- Department of Chemistry, Indian Institute of Technology, Kharagpur 721 302, India
77
Tendulkar AV, Sohoni MA, Ogunnaike B, Wangikar PP. A geometric invariant-based framework for the analysis of protein conformational space. Bioinformatics 2005; 21:3622-8. [PMID: 16096349 DOI: 10.1093/bioinformatics/bti621] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Characterization of the restricted nature of the protein local conformational space has remained a challenge, thereby necessitating a computationally expensive conformational search in protein modeling. Moreover, owing to the lack of unilateral structural descriptors, conventional data mining techniques, such as clustering and classification, have not been applied in protein structure analysis. RESULTS: We first map the local conformations in a fixed dimensional space by using a carefully selected suite of geometric invariants (GIs) and then reduce the number of dimensions via principal component analysis (PCA). Distribution of the conformations in the space spanned by the first four PCs is visualized as a set of conditional bivariate probability distribution plots, where the peaks correspond to the preferred conformations. The locations of the different canonical structures in the PC-space have been interpreted in the context of the weights of the GIs to the first four PCs. Clustering of the available conformations reveals that the number of preferred local conformations is several orders of magnitude smaller than that suggested previously. SUPPLEMENTARY INFORMATION: www.it.iitb.ac.in/~ashish/bioinfo2005/.
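The dimensionality-reduction step described here (a fixed-length feature vector per local conformation, followed by PCA and projection onto the first four PCs) can be sketched as below. This is only an illustration under stated assumptions: the feature matrix is random synthetic data standing in for the geometric invariants, the function name `pca_project` is hypothetical, and the features are centered but not rescaled, since the abstract does not say how the authors preprocessed them.

```python
import numpy as np

def pca_project(features, n_components=4):
    """Project rows of `features` onto their first principal components.

    Returns the projected scores and the fraction of variance explained
    by each retained component.
    """
    X = np.asarray(features, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each feature
    # SVD of the centered matrix: rows of Vt are the principal axes
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    scores = Xc @ Vt[:n_components].T
    return scores, explained[:n_components]

# Synthetic stand-in: 200 fragments, 10 geometric invariants each (assumption)
rng = np.random.default_rng(0)
gi = rng.normal(size=(200, 10))
scores, explained = pca_project(gi)
```

The `scores` array is what one would histogram pairwise to obtain bivariate density plots of the kind the abstract mentions.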
Affiliation(s)
- Ashish V Tendulkar
- Kanwal Rekhi School of Information Technology, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
78
Lee MC, Deng J, Briggs JM, Duan Y. Large-scale conformational dynamics of the HIV-1 integrase core domain and its catalytic loop mutants. Biophys J 2005; 88:3133-46. [PMID: 15731379 PMCID: PMC1305464 DOI: 10.1529/biophysj.104.058446] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022]
Abstract
HIV-1 integrase is one of the three essential enzymes required for viral replication and has great potential as a novel target for anti-HIV drugs. Although tremendous efforts have been devoted to understanding this protein, the conformation of the catalytic core domain around the active site, particularly the catalytic loop overhanging the active site, is still not well characterized by experimental methods due to its high degree of flexibility. Recent studies have suggested that these conformational dynamics are directly correlated with enzymatic activity, but their details are not known. In this study, we conducted a series of extended-time molecular dynamics simulations and locally enhanced sampling simulations of the wild-type and three loop hinge mutants to investigate the conformational dynamics of the core domain. A combined total of >480 ns of simulation data was collected, which allowed us to study conformational changes that were not possible to observe in the previously reported short-time molecular dynamics simulations. Among the main findings are a major conformational change (>20 Å) in the catalytic loop, which revealed gating-like dynamics, and a transient intraloop structure, which provided a rationale for the mutational effects of several residues on the loop including Q148, P145, and Y143. Further, clustering analyses have identified seven major conformational states of the wild-type catalytic loop. Their implications for catalytic function and ligand interaction are discussed. The findings reported here provide a detailed view of the active site conformational dynamics and should be useful for structure-based inhibitor design for integrase.
Affiliation(s)
- Matthew C Lee
- Department of Chemistry and Biochemistry, University of Delaware, Newark, Delaware 19716, USA
79
Panchenko AR, Madej T. Structural similarity of loops in protein families: toward the understanding of protein evolution. BMC Evol Biol 2005; 5:10. [PMID: 15691378 PMCID: PMC549550 DOI: 10.1186/1471-2148-5-10] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.1] [Received: 10/06/2004] [Accepted: 02/03/2005] [Indexed: 11/16/2022]
Abstract
Background: Protein evolution and protein classification are usually inferred by comparing protein cores in their conserved aligned parts. Structurally aligned protein regions are separated by less conserved loop regions, where sequence and structure locally deviate from each other and do not superimpose well. Results: Our results indicate that even longer protein loops cannot be viewed as "random coils", and for the majority of protein families in our test set there exists a linear correlation between the measures of sequence similarity and loop structural similarity. Results suggest that distance matrices derived from the loop (dis)similarity measure may in some cases produce more reliable cluster trees than distance matrices based on the conventional measures of sequence and structural (dis)similarity. Conclusions: We show that by considering "dissimilar" loop regions rather than only conserved core regions it is possible to improve our understanding of protein evolution.
Affiliation(s)
- Anna R Panchenko
- Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
- Thomas Madej
- Computational Biology Branch, National Center for Biotechnology Information, National Institutes of Health, Bethesda, MD 20894, USA
80
Rayan A, Senderowitz H, Goldblum A. Exploring the conformational space of cyclic peptides by a stochastic search method. J Mol Graph Model 2004; 22:319-33. [PMID: 15099829 DOI: 10.1016/j.jmgm.2003.12.012] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.4] [Indexed: 10/26/2022]
Abstract
A stochastic search algorithm is applied in order to probe the conformations of cyclic peptides. The search is conducted in two stages. In the first stage, random conformations are generated and evaluated by a penalty function for ring closure ability, following a stepwise construction of each amino acid into the peptide by a random choice of one of its allowed conformations. The allowed conformational ranges of backbone dihedral angles for each amino acid have been extracted from a Data Bank of diverse proteins. Values of dihedral angles that do not contribute favorably to the scoring of ring closure are retained or discarded by a statistical test. Values are discarded up to a point from which all remaining combinations of angles are constructed, scored, sorted, and clustered. In the second stage, side chains have been added and fast optimization was applied to the set of diverse conformations in a "united atoms" approach, with the "Kollman forcefield" of Sybyl 6.8. This iterative stochastic elimination algorithm finds the global minimum and most of the best results, when compared to a full exhaustive search in appropriately sized problems. In larger problems, we compare the results to experimental structures. The root mean square deviations (RMSDs) of our best results from crystal structures of cyclic peptides with sizes from 4 to 15 amino acids are mostly below 1.0 Å for peptides of up to 8 residues and under 2.0 Å for larger cyclic peptides.
Affiliation(s)
- Anwar Rayan
- Department of Medicinal Chemistry and Natural Products, David R. Bloom Center for Pharmacy, School of Pharmacy, The Hebrew University of Jerusalem, Jerusalem 91120, Israel.
81
Fernandez-Fuentes N, Hermoso A, Espadaler J, Querol E, Aviles FX, Oliva B. Classification of common functional loops of kinase super-families. Proteins 2004; 56:539-55. [PMID: 15229886 DOI: 10.1002/prot.20136] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Indexed: 11/08/2022]
Abstract
A structural classification of loops has been obtained from a set of 141 protein structures classified as kinases. A total of 1813 loops was classified into 133 subclasses (9 betabeta(links), 15 betabeta(hairpins), 31 alpha-alpha, 46 alpha-beta and 32 beta-alpha). Functional information and specific features relating subclasses and function were included in the classification. Functional loops such as the P-loop (shared by different folds) or the Gly-rich-loop, among others, were classified into structural motifs. As a result, a common mechanism of catalysis and substrate binding was proved for most kinases. Additionally, the multiple-alignment of loop sequences made within each subclass was shown to be useful for comparative modeling of kinase loops. The classification is summarized in a kinase loop database located at http://sbi.imim.es/archki.
Affiliation(s)
- Narcis Fernandez-Fuentes
- Institut de Biotecnologia i Biomedicina and Department de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Bellaterra 08193, Spain
82
Jacobson MP, Pincus DL, Rapp CS, Day TJF, Honig B, Shaw DE, Friesner RA. A hierarchical approach to all-atom protein loop prediction. Proteins 2004; 55:351-67. [PMID: 15048827 DOI: 10.1002/prot.10613] [Citation(s) in RCA: 1687] [Impact Index Per Article: 84.4] [Indexed: 12/25/2022]
Abstract
The application of all-atom force fields (and explicit or implicit solvent models) to protein homology-modeling tasks such as side-chain and loop prediction remains challenging both because of the expense of the individual energy calculations and because of the difficulty of sampling the rugged all-atom energy surface. Here we address this challenge for the problem of loop prediction through the development of numerous new algorithms, with an emphasis on multiscale and hierarchical techniques. As a first step in evaluating the performance of our loop prediction algorithm, we have applied it to the problem of reconstructing loops in native structures; we also explicitly include crystal packing to provide a fair comparison with crystal structures. In brief, large numbers of loops are generated by using a dihedral angle-based buildup procedure followed by iterative cycles of clustering, side-chain optimization, and complete energy minimization of selected loop structures. We evaluate this method by using the largest test set yet used for validation of a loop prediction method, with a total of 833 loops ranging from 4 to 12 residues in length. Average/median backbone root-mean-square deviations (RMSDs) to the native structures (superimposing the body of the protein, not the loop itself) are 0.42/0.24 Å for 5 residue loops, 1.00/0.44 Å for 8 residue loops, and 2.47/1.83 Å for 11 residue loops. Median RMSDs are substantially lower than the averages because of a small number of outliers; the causes of these failures are examined in some detail, and many can be attributed to errors in assignment of protonation states of titratable residues, omission of ligands from the simulation, and, in a few cases, probable errors in the experimentally determined structures. When these obvious problems in the data sets are filtered out, average RMSDs to the native structures improve to 0.43 Å for 5 residue loops, 0.84 Å for 8 residue loops, and 1.63 Å for 11 residue loops.
In the vast majority of cases, the method locates energy minima that are lower than or equal to that of the minimized native loop, thus indicating that sampling rarely limits prediction accuracy. The overall results are, to our knowledge, the best reported to date, and we attribute this success to the combination of an accurate all-atom energy function, efficient methods for loop buildup and side-chain optimization, and, especially for the longer loops, the hierarchical refinement protocol.
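The abstract's two reporting choices (RMSD measured with the protein body already superimposed, and average versus median over the test set) can be illustrated with a short sketch. It assumes the predicted and native loop coordinates are already expressed in the same frame, so no fitting on the loop atoms is done; the function name `loop_rmsd` and the RMSD values in the example are made up for illustration and are not data from the paper.

```python
import numpy as np

def loop_rmsd(pred, native):
    """RMSD between two (N, 3) coordinate sets that share one reference frame.

    No superposition is performed here: the frame is assumed to come from
    aligning the rest of the protein, not the loop itself.
    """
    pred = np.asarray(pred, dtype=float)
    native = np.asarray(native, dtype=float)
    diff = pred - native
    return float(np.sqrt((diff**2).sum(axis=1).mean()))

# Made-up per-loop RMSDs: four good predictions and one outlier
rmsds = np.array([0.20, 0.25, 0.30, 0.22, 4.00])
mean_rmsd = float(rmsds.mean())
median_rmsd = float(np.median(rmsds))
```

A single failed prediction pulls the mean well above the median, which is the pattern the abstract describes for its unfiltered test set.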
Affiliation(s)
- Matthew P Jacobson
- Department of Pharmaceutical Chemistry, University of California, San Francisco 94143-2240, USA.
83
Dasgupta B, Pal L, Basu G, Chakrabarti P. Expanded turn conformations: characterization and sequence-structure correspondence in alpha-turns with implications in helix folding. Proteins 2004; 55:305-15. [PMID: 15048823 DOI: 10.1002/prot.20064] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Indexed: 11/06/2022]
Abstract
Like the beta-turns, which are characterized by a limiting distance between residues two positions apart (i, i+3), a distance criterion (involving residues at positions i and i+4) is used here to identify alpha-turns from a database of known protein structures. At least 15 classes of alpha-turns have been enumerated based on the location in the phi,psi space of the three central residues (i+1 to i+3), one of the major ones being the class AAA, where the residues occupy the conventional helical backbone torsion angles. However, moving towards the C-terminal end of the turn, there is a shift in the phi,psi angles towards more negative phi, such that the electrostatic repulsion between two consecutive carbonyl oxygen atoms is reduced. Except for the last position (i+4), there is not much similarity in residue composition at different positions of hydrogen and non-hydrogen bonded AAA turns. The presence or absence of Pro at i+1 position of alpha- and beta-turns has a bearing on whether the turn is hydrogen-bonded or without a hydrogen bond. In the tertiary structure, alpha-turns are more likely to be found in beta-hairpin loops. The residue composition at the beginning of the hydrogen bonded AAA alpha-turn has similarity with type I beta-turn and N-terminal positions of helices, but the last position matches with the C-terminal capping position of helices, suggesting that the existence of a "helix cap signal" at i+4 position prevents alpha-turns from growing into helices. Our results also provide new insights into alpha-helix nucleation and folding.
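A distance criterion between positions i and i+4 can be sketched as a simple scan along the chain. The abstract does not state which atoms are measured or what the cutoff is, so both are assumptions here: the sketch uses one coordinate per residue (for example Cα) and requires the caller to supply `max_dist`. The function name `find_candidate_turns` is hypothetical, and a real turn definition would also need to exclude residues that sit inside regular helices.

```python
import numpy as np

def find_candidate_turns(coords, max_dist, span=4):
    """Return start indices i where residues i and i+span are closer than max_dist.

    coords: (N, 3) array with one representative atom per residue (assumption).
    max_dist: cutoff in the same units as coords; not specified in the abstract.
    """
    xyz = np.asarray(coords, dtype=float)
    # Distance between every residue i and residue i+span
    d = np.linalg.norm(xyz[span:] - xyz[:-span], axis=1)
    return [i for i, dist in enumerate(d) if dist < max_dist]

# Hand-made coordinates: residues 0 and 4 are 6.0 units apart, 1 and 5 are not close
toy = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [5.0, 3.0, 0.0],
       [3.8, 6.0, 0.0], [0.0, 6.0, 0.0], [-3.8, 6.0, 0.0]]
starts = find_candidate_turns(toy, max_dist=7.0)
```

Setting `span=3` gives the analogous scan for the (i, i+3) criterion that the abstract mentions for beta-turns.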
|
84
|
Camproux AC, Gautier R, Tufféry P. A hidden Markov model derived structural alphabet for proteins. J Mol Biol 2004; 339:591-605. [PMID: 15147844 DOI: 10.1016/j.jmb.2004.04.005] [Citation(s) in RCA: 103] [Impact Index Per Article: 5.2] [Received: 10/08/2003] [Revised: 03/30/2004] [Accepted: 04/05/2004] [Indexed: 10/26/2022]
Abstract
Understanding and predicting protein structures depends on the complexity and the accuracy of the models used to represent them. We have set up a hidden Markov model that discretizes protein backbone conformation as a series of overlapping fragments (states) of four residues in length. This approach learns simultaneously the geometry of the states and their connections. We obtain, using a statistical criterion, an optimal systematic decomposition of the conformational variability of the protein peptidic chain in 27 states with strong connection logic. This result is stable over different protein sets. Our model fits well the previous knowledge related to protein architecture organisation and seems able to grab some subtle details of protein organisation, such as helix sub-level organisation schemes. Taking into account the dependence between the states results in a description of local protein structure of low complexity. On average, the model makes use of only 8.3 states among 27 to describe each position of a protein structure. Although we use short fragments, the learning process on entire protein conformations captures the logic of the assembly on a larger scale. Using such a model, the structure of proteins can be reconstructed with an average accuracy close to 1.1 Å root-mean-square deviation and for a complexity of only 3. Finally, we also observe that sequence specificity increases with the number of states of the structural alphabet. Such models can constitute a very relevant approach to the analysis of protein architecture, in particular for protein structure prediction.
Affiliation(s)
- A C Camproux
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM E0436, Université Paris 7, case 7113, 2 place Jussieu, 75251 Paris, France.
|
85
|
Rohl CA, Strauss CEM, Chivian D, Baker D. Modeling structurally variable regions in homologous proteins with rosetta. Proteins 2004; 55:656-77. [PMID: 15103629 DOI: 10.1002/prot.10629] [Citation(s) in RCA: 242] [Impact Index Per Article: 12.1] [Indexed: 12/26/2022]
Abstract
A major limitation of current comparative modeling methods is the accuracy with which regions that are structurally divergent from homologues of known structure can be modeled. Because structural differences between homologous proteins are responsible for variations in protein function and specificity, the ability to model these differences has important functional consequences. Although existing methods can provide reasonably accurate models of short loop regions, modeling longer structurally divergent regions is an unsolved problem. Here we describe a method based on the de novo structure prediction algorithm, Rosetta, for predicting conformations of structurally divergent regions in comparative models. Initial conformations for short segments are selected from the protein structure database, whereas longer segments are built up by using three- and nine-residue fragments drawn from the database and combined by using the Rosetta algorithm. A gap closure term in the potential in combination with modified Newton's method for gradient descent minimization is used to ensure continuity of the peptide backbone. Conformations of variable regions are refined in the context of a fixed template structure using Monte Carlo minimization together with rapid repacking of side-chains to iteratively optimize backbone torsion angles and side-chain rotamers. For short loops, mean accuracies of 0.69, 1.45, and 3.62 Å are obtained for 4, 8, and 12 residue loops, respectively. In addition, the method can provide reasonable models of conformations of longer protein segments: predicted conformations of 3 Å root-mean-square deviation or better were obtained for 5 of 10 examples of segments ranging from 13 to 34 residues. In combination with a sequence alignment algorithm, this method generates complete, ungapped models of protein structures, including regions both similar to and divergent from a homologous structure. This combined method was used to make predictions for 28 protein domains in the Critical Assessment of Protein Structure 4 (CASP 4) and 59 domains in CASP 5, where the method ranked highly among comparative modeling and fold recognition methods. Model accuracy in these blind predictions is dominated by alignment quality, but in the context of accurate alignments, long protein segments can be accurately modeled. Notably, the method correctly predicted the local structure of a 39-residue insertion into a TIM barrel in CASP 5 target T0186.
Affiliation(s)
- Carol A Rohl
- Department of Biomolecular Engineering, University of California, Santa Cruz 95064, USA.
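Editorial sketch: the loop accuracies quoted in the Rohl et al. abstract above are root-mean-square deviations between predicted and reference coordinates. The snippet below shows only the basic formula, assuming the two coordinate sets are already superimposed and listed in the same atom order. The abstract does not state which atoms are compared or how the superposition is done, so the function name `rmsd` and those conventions are assumptions made for illustration.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of (x, y, z) points.

    No superposition is performed here; the inputs are assumed to be
    pre-aligned and in matching atom order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared point-to-point distances, averaged over atoms.
    total = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))
```

For loop modeling, the reported value depends strongly on whether the loop alone or the surrounding template is superimposed first, so figures from different papers in this list are not necessarily comparable.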
|
86
|
Fourrier L, Benros C, de Brevern AG. Use of a structural alphabet for analysis of short loops connecting repetitive structures. BMC Bioinformatics 2004; 5:58. [PMID: 15140270 PMCID: PMC450294 DOI: 10.1186/1471-2105-5-58] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.3] [Received: 02/04/2004] [Accepted: 05/12/2004] [Indexed: 12/02/2022] Open
Abstract
Background: Because loops connect regular secondary structures, analysis of the former depends directly on the definition of the latter. The numerous assignment methods, however, can offer different definitions. In a previous study, we defined a structural alphabet composed of 16 average protein fragments, which we called Protein Blocks (PBs). They allow an accurate description of every region of 3D protein backbones and have been used in local structure prediction. In the present study, we use this structural alphabet to analyze and predict the loops connecting two repetitive structures. Results: We first analyzed the secondary structure assignments. Use of five different assignment methods (DSSP, DEFINE, PCURVE, STRIDE and PSEA) showed the absence of consensus: 20% of the residues were assigned to different states. The discrepancies were particularly important at the extremities of the repetitive structures. We used PBs to describe and predict the short loops because they can help analyze and in part explain these discrepancies. An analysis of the PB distribution in these regions showed some specificities in the sequence-structure relationship. Of the amino acid over- or under-representations observed in the short loop databank, 20% did not appear in the entire databank. Finally, predicting 3D structure in terms of PBs with a Bayesian approach yielded an accuracy rate of 36.0% for all loops and 41.2% for the short loops. Specific learning in the short loops increased the latter by 1%. Conclusion: This work highlights the difficulties of assigning repetitive structures and the advantages of using more precise descriptions, that is, PBs. We observed some new amino acid distributions in the short loops and used this information to enhance local prediction. Instead of describing entire loops, our approach predicts each position in the loops locally. It can thus be used to propose many different structures for the loops and to probe and sample their flexibility. It can be a useful tool in ab initio loop prediction.
Affiliation(s)
- Laurent Fourrier
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM E0346, Université Denis DIDEROT-Paris 7, case 7113, 2, place Jussieu, 75251 Paris Cedex 05, France.
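Editorial sketch: the Fourrier et al. abstract above reports that five assignment methods disagreed on 20% of residues. One simple way to compute such a figure is to count the positions at which per-residue state strings are not all identical. The sketch assumes each method's output has already been reduced to a common alphabet (for example H, E, C) and aligned residue by residue; the function name `disagreement_fraction` and that reduction step are hypothetical and are not described in the abstract.

```python
def disagreement_fraction(assignments):
    """Fraction of residue positions where the assignment strings are not all identical.

    assignments: list of equal-length strings, one per assignment method,
    each character being the state given to that residue.
    """
    lengths = {len(a) for a in assignments}
    if len(lengths) != 1:
        raise ValueError("all assignment strings must have the same length")
    n = lengths.pop()
    if n == 0:
        return 0.0
    # zip(*assignments) walks the strings column by column, one residue at a time.
    differing = sum(1 for column in zip(*assignments) if len(set(column)) > 1)
    return differing / n
```

For instance, `disagreement_fraction(["HHHCC", "HHCCC", "HHHCE"])` counts two differing columns out of five.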
|
87
|
Law RJ, Sansom MSP. Homology modelling and molecular dynamics simulations: comparative studies of human aquaporin-1. EUROPEAN BIOPHYSICS JOURNAL: EBJ 2004; 33:477-89. [PMID: 15071758 DOI: 10.1007/s00249-004-0398-z] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Received: 11/18/2003] [Revised: 02/10/2004] [Accepted: 02/12/2004] [Indexed: 10/26/2022]
Abstract
The structures of the mammalian water transport protein Aqp1 and of its bacterial homologue GlpF enable us to test whether homology models can be used to explore relationships between structure, dynamics and function in mammalian transport proteins. Molecular dynamics simulations (totalling almost 40 ns) were performed starting from: the X-ray structure of Aqp1; a homology model of Aqp1 based on the GlpF structure; and intermediate resolution structures of Aqp1 derived from electron microscopy. Comparisons of protein RMSDs vs. time suggest that the homology models are of comparable conformational stability to the X-ray structure, whereas the intermediate resolution structures exhibit significant conformation drift. For simulations based on the X-ray structure and on homology models, the flexibility profile vs. residue number correlates well with the crystallographic B-values for each residue. In the simulations based on intermediate resolution structures, mobility of the highly conserved NPA loops is substantially higher than in the simulations based on the X-ray structure or the homology models. Pore radius profiles remained relatively constant in the X-ray and homology model simulations but showed substantial fluctuations (reflecting the higher NPA loop mobility) in the intermediate resolution simulations. The orientation of the dipoles of water molecules within the pore is of key importance in maintaining low proton permeability through Aqp1. This property seems to be quite robust to the starting model used in the simulation. These simulations suggest that homology models based on bacterial homologues may be used to derive functionally relevant information on the structural dynamics of mammalian transport proteins.
Affiliation(s)
- Richard J Law
- Laboratory of Molecular Biophysics, Department of Biochemistry, The University of Oxford, South Parks Road, Oxford, OX1 3QU, UK
|
88
|
Tendulkar AV, Joshi AA, Sohoni MA, Wangikar PP. Clustering of Protein Structural Fragments Reveals Modular Building Block Approach of Nature. J Mol Biol 2004; 338:611-29. [PMID: 15081817 DOI: 10.1016/j.jmb.2004.02.047] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.4] [Received: 12/05/2003] [Revised: 02/11/2004] [Accepted: 02/17/2004] [Indexed: 11/29/2022]
Abstract
Structures of peptide fragments drawn from a protein can potentially occupy a vast conformational continuum. We co-ordinatize this conformational space with the help of geometric invariants and demonstrate that the peptide conformations of the currently available protein structures are heavily biased in favor of a finite number of conformational types or structural building blocks. This is achieved by representing a peptide's backbone structure with geometric invariants and then clustering peptides based on closeness of the geometric invariants. This results in 12,903 clusters, of which 2207 are made up of peptides drawn from functionally and/or structurally related proteins. These are termed "functional" clusters and provide clues about potential functional sites. The rest of the clusters, including the largest few, are made up of peptides drawn from unrelated proteins and are termed "structural" clusters. The largest clusters are of regular secondary structures such as helices and beta strands as well as of beta hairpins. Several categories of helices and strands are discovered based on geometric differences. In addition to the known classes of loops, we discover several new classes, which will be useful in protein structure modeling. Our algorithm does not require assignment of secondary structure and, therefore, overcomes the limitations in loop classification due to ambiguity in secondary structure assignment at loop boundaries.
Affiliation(s)
- Ashish V Tendulkar
- Kanwal Rekhi School of Information Technology, Indian Institute of Technology, Bombay, Powai, Mumbai 400 076, India
|
89
|
Cortés J, Siméon T, Remaud-Siméon M, Tran V. Geometric algorithms for the conformational analysis of long protein loops. J Comput Chem 2004; 25:956-67. [PMID: 15027107 DOI: 10.1002/jcc.20021] [Citation(s) in RCA: 75] [Impact Index Per Article: 3.8] [Indexed: 11/07/2022]
Abstract
The efficient filtering of unfeasible conformations would considerably benefit the exploration of the conformational space when searching for minimum energy structures or during molecular simulation. The most important conditions for filtering are the maintenance of molecular chain integrity and the avoidance of steric clashes. These conditions can be seen as geometric constraints on a molecular model. In this article, we discuss how techniques issued from recent research in robotics can be applied to this filtering. Two complementary techniques are presented: one for conformational sampling and another for computing conformational changes satisfying such geometric constraints. The main interest of the proposed techniques is their application to the structural analysis of long protein loops. First experimental results demonstrate the efficacy of the approach for studying the mobility of loop 7 in amylosucrase from Neisseria polysaccharea. The supposed motions of this 17-residue loop would play an important role in the activity of this enzyme.
Affiliation(s)
- J Cortés
- LAAS-CNRS, 7 avenue du Colonel-Roche, 31077 Toulouse, France.
|
90
|
Zhang C, Liu S, Zhou Y. Accurate and efficient loop selections by the DFIRE-based all-atom statistical potential. Protein Sci 2004; 13:391-9. [PMID: 14739324 PMCID: PMC2286705 DOI: 10.1110/ps.03411904] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.2] [Received: 09/03/2003] [Revised: 10/17/2003] [Accepted: 10/17/2003] [Indexed: 10/26/2022]
Abstract
The conformations of loops are determined by the water-mediated interactions between amino acid residues. Energy functions that describe the interactions can be derived either from physical principles (physical-based energy function) or statistical analysis of known protein structures (knowledge-based statistical potentials). It is commonly believed that statistical potentials are appropriate for coarse-grained representation of proteins but are not as accurate as physical-based potentials when atomic resolution is required. Several recent applications of physical-based energy functions to loop selections appear to support this view. In this article, we apply a recently developed DFIRE-based statistical potential to three different loop decoy sets (RAPPER, Jacobson, and Forrest-Woolf sets). Together with a rotamer library for side-chain optimization, the performance of DFIRE-based potential in the RAPPER decoy set (385 loop targets) is comparable to that of AMBER/GBSA for short loops (two to eight residues). The DFIRE is more accurate for longer loops (9 to 12 residues). A similar trend is observed when comparing DFIRE with another physical-based OPLS/SGB-NP energy function in the large Jacobson decoy set (788 loop targets). In the Forrest-Woolf decoy set for the loops of membrane proteins, the DFIRE potential performs substantially better than the combination of the CHARMM force field with several solvation models. The results suggest that a single-term DFIRE-statistical energy function can provide an accurate loop prediction at a fraction of the computing cost required for more complicated physical-based energy functions. A Web server for academic users is established for loop selection at the softwares/services section of the Web site http://theory.med.buffalo.edu/.
Affiliation(s)
- Chi Zhang
- Howard Hughes Medical Institute Center for Single Molecule Biophysics and Department of Physiology and Biophysics, State University of New York at Buffalo, 124 Sherman Hall, Buffalo, NY 14214, USA
|
91
|
Espadaler J, Fernandez-Fuentes N, Hermoso A, Querol E, Aviles FX, Sternberg MJE, Oliva B. ArchDB: automated protein loop classification as a tool for structural genomics. Nucleic Acids Res 2004; 32:D185-8. [PMID: 14681390 PMCID: PMC308737 DOI: 10.1093/nar/gkh002] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.7] [Indexed: 11/13/2022] Open
Abstract
The annotation of protein function has become a crucial problem with the advent of sequence and structural genomics initiatives. A large body of evidence suggests that protein structural information is frequently encoded in local sequences, and that folds are mainly made up of a number of simple local units of super-secondary structural motifs, consisting of a few secondary structures and their connecting loops. Moreover, protein loops play an important role in protein function. Here we present ArchDB, a classification database of structural motifs, consisting of one loop plus its bracing secondary structures. ArchDB currently contains 12,665 super-secondary elements classified into 1496 motif subclasses. The database provides an easy way to retrieve functional information from protein structures sharing a common motif, to search motifs found in a given SCOP family, superfamily or fold, or to search by keywords on proteins with classified loops. The ArchDB database of loops is located at http://sbi.imim.es/archdb.
Affiliation(s)
- Jordi Espadaler
- Institut de Biotecnologia i de Biomedicina and Departament de Bioquímica, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
|
92
|
Comparative Protein Structure Modeling and its Applications to Drug Discovery. ANNUAL REPORTS IN MEDICINAL CHEMISTRY 2004. [DOI: 10.1016/s0065-7743(04)39020-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
|
93
|
Marti‐Renom MA, Madhusudhan M, Eswar N, Pieper U, Shen M, Sali A, Fiser A, Mirkovic N, John B, Stuart A. Modeling Protein Structure from its Sequence. ACTA ACUST UNITED AC 2003. [DOI: 10.1002/0471250953.bi0501s03] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022]
Affiliation(s)
- Marc A. Marti‐Renom
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and The California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, California
- M.S. Madhusudhan
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and The California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, California
- Narayanan Eswar
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and The California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, California
- Ursula Pieper
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and The California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, California
- Min‐yi Shen
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and The California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, California
- Andrej Sali
- Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry and The California Institute for Quantitative Biomedical Research, University of California at San Francisco, San Francisco, California
- Andras Fiser
- Department of Biochemistry and Seaver Foundation Center for Bioinformatics, Albert Einstein College of Medicine, Bronx, New York
- Nebojsa Mirkovic
- Laboratory of Molecular Biophysics, The Rockefeller University, New York, New York
- Bino John
- Laboratory of Molecular Biophysics, The Rockefeller University, New York, New York
- Ashley Stuart
- Laboratory of Molecular Biophysics, The Rockefeller University, New York, New York
|
94
|
Duarte CM, Wadley LM, Pyle AM. RNA structure comparison, motif search and discovery using a reduced representation of RNA conformational space. Nucleic Acids Res 2003; 31:4755-61. [PMID: 12907716 PMCID: PMC169959 DOI: 10.1093/nar/gkg682] [Citation(s) in RCA: 97] [Impact Index Per Article: 4.6] [Indexed: 11/13/2022] Open
Abstract
Given the wealth of new RNA structures and the growing list of RNA functions in biology, it is of great interest to understand the repertoire of RNA folding motifs. The ability to identify new and known motifs within novel RNA structures, to compare tertiary structures with one another and to quantify the characteristics of a given RNA motif are major goals in the field of RNA research; however, there are few systematic ways to address these issues. Using a novel approach for visualizing and mathematically describing macromolecular structures, we have developed a means to quantitatively describe RNA molecules in order to rapidly analyze, compare and explore their features. This approach builds on the alternative eta,theta convention for describing RNA torsion angles and is executed using a new program called PRIMOS. Applying this methodology, we have successfully identified major regions of conformational change in the 50S and 30S ribosomal subunits, we have developed a means to search the database of RNA structures for the prevalence of known motifs and we have classified and identified new motifs. These applications illustrate the powerful capabilities of our new RNA structural convention, and they suggest future adaptations with important implications for bioinformatics and structural genomics.
Affiliation(s)
- Carlos M Duarte
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
|
95
|
Haspel N, Tsai CJ, Wolfson H, Nussinov R. Reducing the computational complexity of protein folding via fragment folding and assembly. Protein Sci 2003; 12:1177-87. [PMID: 12761388 PMCID: PMC2323902 DOI: 10.1110/ps.0232903] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.0] [Received: 09/20/2002] [Revised: 12/23/2002] [Accepted: 02/23/2003] [Indexed: 10/27/2022]
Abstract
Understanding, and ultimately predicting, how a 1-D protein chain reaches its native 3-D fold has been one of the most challenging problems during the last few decades. Data increasingly indicate that protein folding is a hierarchical process. Hence, the question arises as to whether we can use the hierarchical concept to reduce the practically intractable computational times. For such a scheme to work, the first step is to cut the protein sequence into fragments that form local minima on the polypeptide chain. The conformations of such fragments in solution are likely to be similar to those when the fragments are embedded in the native fold, although alternate conformations may be favored during the mutual stabilization in the combinatorial assembly process. Two elements are needed for such cutting: (1) a library of (clustered) fragments derived from known protein structures and (2) an assignment algorithm that selects optimal combinations to "cover" the protein sequence. The next two steps in hierarchical folding schemes, not addressed here, are the combinatorial assembly of the fragments and finally, optimization of the obtained conformations. Here, we address the first step in a hierarchical protein-folding scheme. The input is a target protein sequence and a library of fragments created by clustering building blocks that were generated by cutting all protein structures. The output is a set of cutout fragments. We briefly outline a graph theoretic algorithm that automatically assigns building blocks to the target sequence, and we describe a sample of the results we have obtained.
Affiliation(s)
- Nurit Haspel
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel
|
96
|
Abstract
An analysis of Omega loops in a nonredundant set of protein structures from the Protein Data Bank has been carried out to determine the nature of the "turn elements" present. Because Omega loops essentially reverse their direction in three-dimensional space, this analysis was made with respect to four turn elements identified as (1) Gly; (2) Pro; (3) a residue with alpha-helical phi,psi angles, termed a helical residue; and (4) a cis peptide. A set of 1079 Omega loops from a set of 680 proteins were used for the analysis. Apart from other criteria that define Omega loops, the selection of an Omega loop from a cluster of loops is based on an exposure index. In this study, analyses have been made with two sets of data: (1) Omega loops arising from a minimum exposure index indicative of a less exposed loop (xmin set) and (2) Omega loops with a maximum exposure index indicative of a relatively exposed loop (xmax set). Overall residue preferences and positional preferences have been examined. Positions of the turn elements for Omega loops of varying length have also been studied. Specific positional preferences are observed for particular turn elements with regard to the length of Omega loops. Analysis in terms of the turn elements can provide guidelines for modeling of loops in proteins. Apart from Pro, which has the natural tendency to form cis peptide bonds, a higher occurrence of non-Pro cis peptide bonds is observed. Torsion angles in Omega loops also indicate the occurrence of a large number of residues with helical phi,psi angles, necessary for the turn in the loop structures.
Affiliation(s)
- Manoj Pal
- Department of Chemistry, Indian Institute of Technology, Kharagpur, India
|
97
|
Haspel N, Tsai CJ, Wolfson H, Nussinov R. Hierarchical protein folding pathways: a computational study of protein fragments. Proteins 2003; 51:203-15. [PMID: 12660989 DOI: 10.1002/prot.10294] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.1] [Indexed: 11/11/2022]
Abstract
We have previously presented a building block folding model. The model postulates that protein folding is a hierarchical top-down process. The basic unit from which a fold is constructed, referred to as a hydrophobic folding unit, is the outcome of combinatorial assembly of a set of "building blocks." Results obtained by the computational cutting procedure yield fragments that are in agreement with those obtained experimentally by limited proteolysis. Here we show that as expected, proteins from the same family give very similar building blocks. However, different proteins can also give building blocks that are similar in structure. In such cases the building blocks differ in sequence, stability, contacts with other building blocks, and in their 3D locations in the protein structure. This result, which we have repeatedly observed in many cases, leads us to conclude that while a building block is influenced by its environment, nevertheless, it can be viewed as a stand-alone unit. For small-sized building blocks existing in multiple conformations, interactions with sister building blocks in the protein will increase the population time of the native conformer. With this conclusion in hand, it is possible to develop an algorithm that predicts the building block assignment of a protein sequence whose structure is unknown. Toward this goal, we have created sequentially nonredundant databases of building block sequences. A protein sequence can be aligned against these, in order to be matched to a set of potential building blocks.
Affiliation(s)
- Nurit Haspel
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
|
98
|
Abstract
Most globular proteins are divisible by domains, distinct substructures of the globule. The notion of hierarchy of the domains was introduced earlier via van der Waals energy profiles that allow one to subdivide the proteins into domains (subdomains). The question remains open as to what is the possible structural connection of the energy profiles. The recent discovery of the loop-n-lock elements in the globular proteins suggests such a structural connection. A direct comparison of the segmentation by van der Waals energy criteria with the maps of the locked loops of nearly standard size reveals a striking correlation: domains in general appear to consist of one to several such loops. In addition, it was demonstrated that a variety of subdivisions of the same protein into domains is just a regrouping of the loop-n-lock elements.
Affiliation(s)
- Igor N Berezovsky
- Department of Structural Biology, The Weizmann Institute of Science, P.O.B. 26, Rehovot 76100, Israel.
|
99
|
Mosbah A, Campanacci V, Lartigue A, Tegoni M, Cambillau C, Darbon H. Solution structure of a chemosensory protein from the moth Mamestra brassicae. Biochem J 2003; 369:39-44. [PMID: 12217077 PMCID: PMC1223053 DOI: 10.1042/bj20021217] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.1] [Received: 08/02/2002] [Accepted: 09/09/2002] [Indexed: 11/17/2022]
Abstract
Chemosensory proteins (CSPs) are believed to be involved in chemical communication and perception. A number of such proteins, of molecular mass approximately 13 kDa, have been isolated from different sensory organs of a wide range of insect species. Several CSPs have been identified in the antennae and proboscis of the moth Mamestra brassicae. CSPMbraA6, a 112-amino-acid antennal protein, has been expressed in a soluble form in large quantities in the Escherichia coli periplasm. NMR structure determination of CSPMbraA6 has been performed with 1H- and 15N-labelled samples. The calculated structures present an average root mean square deviation about the mean structure of 0.63 Å for backbone atoms and 1.27 Å for all non-hydrogen atoms except the 12 N-terminal residues. The protein is well folded from residue 12 to residue 110, and consists of a non-bundle alpha-helical structure with six helices connected by alpha alpha loops. It has a globular shape, with overall dimensions of 32 Å x 28 Å x 24 Å. A channel is visible in the hydrophobic core, with dimensions of 3 Å x 9 Å x 21 Å. In some of the 20 solution structures calculated, this channel is closed either by Trp-94 at one end or by Tyr-26 at the other end; in some other solutions, this channel is closed at both ends. Binding experiments with 12-bromododecanol indicate that the CSPMbraA6 structure is modified upon ligand binding.
Affiliation(s)
- Amor Mosbah
- AFMB, UMR 6098-CNRS and Universités Aix-Marseille I & II, 31 Chemin J. Aiguier, 13402 Marseille Cedex 20, France
|
100
|
Kolodny R, Koehl P, Guibas L, Levitt M. Small libraries of protein fragments model native protein structures accurately. J Mol Biol 2002; 323:297-307. [PMID: 12381322 DOI: 10.1016/s0022-2836(02)00942-7] [Citation(s) in RCA: 144] [Impact Index Per Article: 6.5] [Indexed: 11/28/2022]
Abstract
Prediction of protein structure depends on the accuracy and complexity of the models used. Here, we represent the polypeptide chain by a sequence of rigid fragments that are concatenated without any degrees of freedom. Fragments chosen from a library of representative fragments are fit to the native structure using a greedy build-up method. This gives a one-dimensional representation of native protein three-dimensional structure whose quality depends on the nature of the library. We use a novel clustering method to construct libraries that differ in the fragment length (four to seven residues) and number of representative fragments they contain (25-300). Each library is characterized by the quality of fit (accuracy) and the number of allowed states per residue (complexity). We find that the accuracy depends on the complexity and varies from 2.9 Å for a 2.7-state model on the basis of fragments of length 7 to 0.76 Å for a 15-state model on the basis of fragments of length 5. Our goal is to find representations that are both accurate and economical (low complexity). The models defined here are substantially better in this regard: with ten states per residue we approximate native protein structure to 1 Å compared to over 20 states per residue needed previously. For the same complexity, we find that longer fragments provide better fits. Unfortunately, libraries of longer fragments must be much larger (for ten states per residue, a seven-residue library is 100 times larger than a five-residue library). As the number of known protein native structures increases, it will be possible to construct larger libraries to better exploit this correlation between neighboring residues. Our fragment libraries, which offer a wide range of optimal fragments suited to different accuracies of fit, may prove to be useful for generating better decoy sets for ab initio protein folding and for generating accurate loop conformations in homology modeling.
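The relation between library size and states per residue quoted in this abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: it assumes that consecutive fragments overlap by three residues, so that a fragment of length f contributes f - 3 new residues and a library of s fragments corresponds to s^(1/(f - 3)) states per residue. That overlap is an assumption, chosen because it reproduces the factor of 100 stated in the abstract; the function names are hypothetical.

```python
# Illustrative arithmetic only. ASSUMPTION (not stated in the abstract):
# consecutive fragments overlap by 3 residues, so each fragment of length f
# adds (f - 3) new residues to the chain.

def library_size(states_per_residue: float, fragment_length: int, overlap: int = 3) -> float:
    """Library size that yields the given number of states per residue."""
    new_residues = fragment_length - overlap
    if new_residues <= 0:
        raise ValueError("fragment_length must exceed overlap")
    return states_per_residue ** new_residues


def states_per_residue(size: int, fragment_length: int, overlap: int = 3) -> float:
    """Per-residue complexity of a library with `size` fragments."""
    new_residues = fragment_length - overlap
    if new_residues <= 0:
        raise ValueError("fragment_length must exceed overlap")
    return size ** (1.0 / new_residues)


# At ten states per residue, compare seven-residue and five-residue libraries.
ratio = library_size(10, 7) / library_size(10, 5)
print(ratio)
```

Under this assumption, a 15-state model built from five-residue fragments would need 225 fragments, which falls inside the 25-300 range of library sizes mentioned in the abstract.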
Affiliation(s)
- Rachel Kolodny
- Department of Structural Biology, Stanford University Medical School, Fairchild Building, Stanford, CA 94305, USA.
|