1
Bota PM, Oliva B, Fernandez-Fuentes N. Theoretical 3D Modeling of NLRP3 Inflammasome Complex. Methods Mol Biol 2023; 2696:269-280. [PMID: 37578729 DOI: 10.1007/978-1-0716-3350-2_18] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 08/15/2023]
Abstract
The NOD-like receptor pyrin domain containing 3 (NLRP3) is a multidomain protein that plays a key role in the innate immune response. Structures of NLRP3 in different conformational states and bound to cognate partners are available. In this chapter we present an approach to model the oligomeric structure of NLRP3 by homology modeling using multiple templates, symmetry, and refinement. The overall process presented here represents an advanced exercise in structural modeling that provides unique insights into the biological role and activation of the NLRP3 oligomer. Finally, the same approach can be easily adapted to the other members of the NLRP family.
Affiliation(s)
- Patricia Mirela Bota
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Baldo Oliva
  - Structural Bioinformatics Lab (GRIB-IMIM), Department of Experimental and Health Science, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain
- Narcis Fernandez-Fuentes
  - Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, UK
2
Abstract
Knowledge of protein-protein interactions (PPIs) and PPI networks (PPINs) is key to understanding the biological processes inside the cell. Many computational tools have been designed to help explore PPIs and PPINs, such as those for interaction detection, reliability assessment and interaction network construction. Here, the application of computational tools is reviewed from three perspectives: PPI database construction, PPI prediction, and interaction network construction and analysis. This overview will provide researchers with guidance on choosing appropriate methods for exploring PPIs.
Affiliation(s)
- Shaowei Dong
  - Department of Cell and System Biology, Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, Canada
- Nicholas J Provart
  - Department of Cell and System Biology, Centre for the Analysis of Genome Evolution and Function, University of Toronto, Toronto, ON, Canada
3
Li D, Hu X, Liu X, Feng Z, Ding C. Using feature optimization-based support vector machine method to recognize the β-hairpin motifs in enzymes. Saudi J Biol Sci 2016; 24:1361-1369. [PMID: 28855832 PMCID: PMC5562482 DOI: 10.1016/j.sjbs.2016.11.014] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/05/2016] [Revised: 11/16/2016] [Accepted: 11/17/2016] [Indexed: 11/28/2022]
Abstract
β-Hairpins in enzymes, a class of proteins with catalytic functions, contain many binding sites that are essential for enzyme function. With the increasing number of observed enzyme sequences, it is especially important to use bioinformatics techniques to quickly and accurately identify β-hairpins in enzymes for further annotation of enzyme structure and function. In this work, the proposed method was trained and tested on a non-redundant enzyme β-hairpin database containing 2818 β-hairpins and 1098 non-β-hairpins. With 5-fold cross-validation on the training dataset, an overall accuracy of 90.08% and a Matthews correlation coefficient (Mcc) of 0.74 were obtained, while on the independent test dataset, an overall accuracy of 88.93% and an Mcc of 0.76 were achieved. Furthermore, the method was validated on 845 β-hairpins with ligand binding sites. With 5-fold cross-validation on the training dataset and an independent test on the test dataset, the overall accuracies were 85.82% (Mcc of 0.71) and 84.78% (Mcc of 0.70), respectively. By integrating mRMR feature selection with the SVM algorithm, a reasonably high accuracy was achieved, indicating that the method is an effective tool for further studies of β-hairpins in enzyme structures. Additionally, as a novelty for function prediction of enzymes, β-hairpins with ligand binding sites were predicted. Based on this work, a web server was constructed to predict β-hairpin motifs in enzymes (http://202.207.29.251:8080/).
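The two figures of merit quoted in this abstract, overall accuracy and the Matthews correlation coefficient, can both be derived from a binary confusion matrix. The following is a minimal sketch of those standard formulas; the function names and the counts in the usage lines are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined 0/0 case.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    tp, tn, fp, fn = 90, 80, 20, 10
    print(f"accuracy = {accuracy(tp, tn, fp, fn):.4f}")
    print(f"Mcc      = {mcc(tp, tn, fp, fn):.4f}")
```

Unlike accuracy, the Mcc uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the classes are unbalanced, as with the 2818 β-hairpins versus 1098 non-β-hairpins here.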
Affiliation(s)
- Dongmei Li
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Xiuzhen Hu
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Xingxing Liu
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Zhenxing Feng
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
- Changjiang Ding
  - College of Sciences, Inner Mongolia University of Technology, Hohhot 010051, China
4
Han YC, Song JM, Wang L, Shu CC, Guo J, Chen LL. Prediction and characterization of protein-protein interaction network in Bacillus licheniformis WX-02. Sci Rep 2016; 6:19486. [PMID: 26782814 PMCID: PMC4726086 DOI: 10.1038/srep19486] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 09/22/2015] [Accepted: 12/09/2015] [Indexed: 01/22/2023]
Abstract
In this study, we constructed a protein-protein interaction (PPI) network of B. licheniformis strain WX-02 using the interolog and domain-based methods; the network contained 15,864 edges and 2,448 nodes. Although computationally predicted networks have relatively low coverage and a high false-positive rate, our prediction was confirmed from three perspectives: local structural features, functional similarities and transcriptional correlations. Further analysis of the COG heat map showed that protein interactions in B. licheniformis WX-02 mainly occurred within the same functional categories. By incorporating the transcriptome data, we found that the topological properties of the PPI network were robust under normal and high salt conditions. In addition, 267 different protein complexes were identified and 117 poorly characterized proteins were annotated with certain functions based on the PPI network. Furthermore, the sub-network showed that the hub protein CcpA was connected, directly or indirectly, to many proteins related to γ-PGA synthesis and regulation, such as PgsB, GltA, GltB, ProB, ProJ and YcgM, and to two signal transduction systems, ComP-ComA and DegS-DegU. Thus, CcpA might play an important role in the regulation of γ-PGA synthesis. This study will therefore facilitate the understanding of the complex cellular behaviors and mechanisms of γ-PGA synthesis in B. licheniformis WX-02.
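A hub in a PPI network is commonly identified as a node with an unusually high degree, that is, many interaction partners. The sketch below shows one simple way to rank nodes by degree from an edge list. It is an illustration under stated assumptions rather than the authors' pipeline: the function names are hypothetical and the edge list uses placeholder identifiers (`P1` to `P5`), not data from the study.

```python
from collections import defaultdict


def degree_table(edges):
    """Map each node to its number of distinct interaction partners.

    Edges are treated as undirected; duplicate edges and self-loops
    do not add to the degree.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions
        neighbours[a].add(b)
        neighbours[b].add(a)
    return {node: len(partners) for node, partners in neighbours.items()}


def top_hubs(edges, top_n=1):
    """Return the top_n nodes by degree, ties broken alphabetically."""
    degrees = degree_table(edges)
    return sorted(degrees, key=lambda node: (-degrees[node], node))[:top_n]


if __name__ == "__main__":
    # Placeholder network for illustration only.
    toy_edges = [
        ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
        ("P2", "P3"), ("P4", "P5"), ("P1", "P2"),  # last edge is a duplicate
    ]
    print(degree_table(toy_edges))
    print(top_hubs(toy_edges, top_n=1))
```

Degree is only the simplest centrality measure; whether the study used degree alone to call CcpA a hub is not stated in the abstract.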
Affiliation(s)
- Yi-Chao Han
  - College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Jia-Ming Song
  - College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Long Wang
  - College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Cheng-Cheng Shu
  - College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Jing Guo
  - College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, P.R. China
- Ling-Ling Chen
  - College of Informatics, Agricultural Bioinformatics Key Laboratory of Hubei Province, Huazhong Agricultural University, Wuhan 430070, P.R. China
5
Bonet J, Planas-Iglesias J, Garcia-Garcia J, Marín-López MA, Fernandez-Fuentes N, Oliva B. ArchDB 2014: structural classification of loops in proteins. Nucleic Acids Res 2013; 42:D315-9. [PMID: 24265221 PMCID: PMC3964960 DOI: 10.1093/nar/gkt1189] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 01/07/2023]
Abstract
The function of a protein is determined by its three-dimensional structure, which is formed by regular (i.e. β-strands and α-helices) and non-periodic structural units such as loops. Compared to regular structural elements, non-periodic, non-repetitive conformational units enclose a much higher degree of variability—raising difficulties in the identification of regularities, and yet represent an important part of the structure of a protein. Indeed, loops often play a pivotal role in the function of a protein and different aspects of protein folding and dynamics. Therefore, the structural classification of protein loops is an important subject with clear applications in homology modelling, protein structure prediction, protein design (e.g. enzyme design and catalytic loops) and function prediction. ArchDB, the database presented here (freely available at http://sbi.imim.es/archdb), represents such a resource and has been an important asset for the scientific community throughout the years. In this article, we present a completely reworked and updated version of ArchDB. The new version of ArchDB features a novel, fast and user-friendly web-based interface, and a novel graph-based, computationally efficient, clustering algorithm. The current version of ArchDB classifies 149,134 loops in 5739 classes and 9608 subclasses.
Affiliation(s)
- Jaume Bonet
  - Structural Bioinformatics Lab (GRIB-IMIM), Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Barcelona, Catalonia, 08950, Spain and Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, SY23 3DA Aberystwyth, Ceredigion, UK
6
Planas-Iglesias J, Marin-Lopez MA, Bonet J, Garcia-Garcia J, Oliva B. iLoops: a protein–protein interaction prediction server based on structural features. Bioinformatics 2013; 29:2360-2. [DOI: 10.1093/bioinformatics/btt401] [Citation(s) in RCA: 36] [Impact Index Per Article: 3.3] [Indexed: 11/14/2022]
7
Ajawatanawong P, Baldauf SL. Evolution of protein indels in plants, animals and fungi. BMC Evol Biol 2013; 13:140. [PMID: 23826714 PMCID: PMC3706215 DOI: 10.1186/1471-2148-13-140] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.8] [Received: 02/25/2013] [Accepted: 06/24/2013] [Indexed: 11/10/2022]
Abstract
BACKGROUND: Insertions/deletions (indels) in protein sequences are useful as drug targets, protein structure predictors, species diagnostics and evolutionary markers. However, there is limited understanding of indel evolutionary patterns. We sought to characterize indel patterns, focusing first on the major groups of multicellular eukaryotes.
RESULTS: Comparisons of complete proteomes from a taxonomically broad set of primarily Metazoa, Fungi and Viridiplantae yielded 299 substantial (>250 aa) universal, single-copy (in-paralog only) proteins, from which 901 simple (present/absent) and 3,806 complex (multistate) indels were extracted. Simple indels are mostly small (1-7 aa), with a most frequent size class of 1 aa. However, even these simple-looking indels show a surprisingly high level of hidden homoplasy (multiple independent origins). Among the apparently homoplasy-free simple indels, we identify 69 potential clade-defining indels (CDIs) that may warrant closer examination. CDIs show a very uneven taxonomic distribution among Viridiplantae (13 CDIs), Fungi (40 CDIs), and Metazoa (0 CDIs). An examination of singleton indels shows an excess of insertions over deletions in nearly all examined taxa. This excess averages 2.31 overall, with a maximum observed value of 7.5-fold.
CONCLUSIONS: We find considerable potential for identifying taxon-marker indels using an automated pipeline. However, it appears that simple indels in universal proteins are too rare and homoplasy-rich to be used for pure indel-based phylogeny. The excess of insertions over deletions seen in nearly every genome and major group examined may be useful in defining more realistic gap penalties for sequence alignment. This bias also suggests that insertions in highly conserved proteins experience less purifying selection than do deletions.
Affiliation(s)
- Pravech Ajawatanawong
  - Department of Systematic Biology, Evolutionary Biology Centre (EBC), Uppsala University, Uppsala 75236, Sweden
8
Understanding Protein–Protein Interactions Using Local Structural Features. J Mol Biol 2013; 425:1210-24. [DOI: 10.1016/j.jmb.2013.01.014] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.2] [Received: 08/01/2012] [Revised: 01/08/2013] [Accepted: 01/14/2013] [Indexed: 11/21/2022]
9
Abstract
Loops are irregular structures which connect two secondary structure elements in proteins. They often play important roles in function, including enzyme reactions and ligand binding. Despite their importance, their structure remains difficult to predict. Most protein loop structure prediction methods sample local loop segments and score them. In particular, protein loop classifications and database search methods depend heavily on local properties of loops. Here we examine the distance between a loop's end points (span). We find that the distribution of loop span appears to be independent of the number of residues in the loop; in other words, the separation between the anchors of a loop does not increase with an increase in the number of loop residues. Loop span is also unaffected by the secondary structures at the end points, unless the two anchors are part of an anti-parallel beta sheet. As loop span appears to be independent of global properties of the protein, we suggest that its distribution can be described by a random fluctuation model based on the Maxwell-Boltzmann distribution. It is generally believed that the primary difficulty in protein loop structure prediction comes from the number of residues in the loop. Following the idea that loop span is an independent local property, we investigate its effect on protein loop structure prediction and show how normalised span (loop stretch) is related to the structural complexity of loops. Highly contracted loops are more difficult to predict than stretched loops.
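The quantities named in this abstract can be illustrated with a short sketch. Several points are assumptions, since the abstract does not give the definitions: span is taken as the distance between the Cα atoms of the two anchor residues; loop stretch is taken as span divided by a fully extended length of 3.8 Å per Cα-Cα virtual bond; and the scale parameter `a` of the Maxwell-Boltzmann density is a free parameter. All function names are hypothetical.

```python
import math

# Typical distance between consecutive C-alpha atoms in a trans peptide (angstroms).
CA_CA_DISTANCE = 3.8


def loop_span(anchor_start, anchor_end):
    """Euclidean distance between the two anchor C-alpha coordinates."""
    return math.dist(anchor_start, anchor_end)


def loop_stretch(span, n_loop_residues):
    """Span normalised by an assumed fully extended length.

    Assumption: n loop residues between two anchors give n + 1
    virtual bonds of CA_CA_DISTANCE each.
    """
    max_span = CA_CA_DISTANCE * (n_loop_residues + 1)
    return span / max_span


def maxwell_boltzmann_pdf(x, a):
    """Maxwell-Boltzmann density with scale parameter a (x >= 0)."""
    if x < 0:
        return 0.0
    return math.sqrt(2.0 / math.pi) * x * x * math.exp(-x * x / (2.0 * a * a)) / a**3


if __name__ == "__main__":
    # Illustrative coordinates only.
    span = loop_span((0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
    print(f"span    = {span:.2f} A")
    print(f"stretch = {loop_stretch(span, n_loop_residues=6):.3f}")
    print(f"density = {maxwell_boltzmann_pdf(span, a=4.0):.4f}")
```

Under these assumptions a stretch near 1 corresponds to an almost fully extended loop, while values near 0 correspond to the highly contracted loops that the abstract reports as harder to predict.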
Affiliation(s)
- Yoonjoo Choi
  - Department of Computer Science, Dartmouth College, Hanover, NH, USA
10
Ho HK, Zhang L, Ramamohanarao K, Martin S. A survey of machine learning methods for secondary and supersecondary protein structure prediction. Methods Mol Biol 2013; 932:87-106. [PMID: 22987348 DOI: 10.1007/978-1-62703-065-6_6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 06/01/2023]
Abstract
In this chapter we provide a survey of protein secondary and supersecondary structure prediction using methods from machine learning. Our focus is on machine learning methods applicable to β-hairpin and β-sheet prediction, but we also discuss methods for more general supersecondary structure prediction. We provide background on the secondary and supersecondary structures that we discuss, the features used to describe them, and the basic theory behind the machine learning methods used. We survey the machine learning methods available for secondary and supersecondary structure prediction and compare them where possible.
Affiliation(s)
- Hui Kian Ho
  - Department of Computer Science and Software Engineering, University of Melbourne, National ICT Australia, Parkville, VIC, Australia
11
Structural modelling and dynamics of proteins for insights into drug interactions. Adv Drug Deliv Rev 2012; 64:323-43. [PMID: 22155026 DOI: 10.1016/j.addr.2011.11.011] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Received: 07/13/2011] [Revised: 11/17/2011] [Accepted: 11/24/2011] [Indexed: 12/27/2022]
Abstract
Proteins are the workhorses of biomolecules, and their function is affected by their structure and their structural rearrangements during ligand entry, ligand binding and protein-protein interactions. Hence, knowledge of protein structure and, importantly, of the dynamic behaviour of the structure is critical for understanding how the protein performs its function. Predictions of the structure and the dynamic behaviour can be made by combining structure modelling and molecular dynamics simulations. The simulations also need to be sensitive to the constraints of the environment in which the protein resides. Standard computational methods now exist in this field to support the experimental effort of solving protein structures. This review presents a comprehensive overview of the basis of the calculations and the well-established computational methods used to generate and understand protein structure and function and to study their dynamic behaviour, with reference to lung-related targets.
12
Skliros A, Zimmermann MT, Chakraborty D, Saraswathi S, Katebi AR, Leelananda SP, Kloczkowski A, Jernigan RL. The importance of slow motions for protein functional loops. Phys Biol 2012; 9:014001. [PMID: 22314977 DOI: 10.1088/1478-3975/9/1/014001] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Indexed: 11/12/2022]
Abstract
Loops in proteins that connect secondary structures such as alpha-helices and beta-sheets are often on the surface and may play a critical role in some functions of a protein. The mobility of loops is central to the motional freedom and flexibility requirements of active-site loops. The structures and behaviors of loops have not been studied much in the context of the whole structure and its overall motions, especially how these might be coupled. Here we investigate loop motions by using coarse-grained structures (Cα atoms only) to solve the motions of the system by applying Lagrange equations with elastic network models to learn which loops move in an independent fashion and which move in coordination with domain motions, faster and slower, respectively. The normal modes of the system are calculated using eigen-decomposition of the stiffness matrix. The contribution of individual modes and groups of modes is investigated for their effects on all residues in each loop by using Fourier analyses. Our results indicate that, overall, the motions of functional sets of loops behave similarly to those of the whole structure. However, only relatively few loops move in coordination with the dominant slow modes of motion, and these are often closely related to function.
Affiliation(s)
- Aris Skliros
  - L. H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA. Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
13
Joo H, Chavan AG, Day R, Lennox KP, Sukhanov P, Dahl DB, Vannucci M, Tsai J. Near-native protein loop sampling using nonparametric density estimation accommodating sparcity. PLoS Comput Biol 2011; 7:e1002234. [PMID: 22028638 PMCID: PMC3197639 DOI: 10.1371/journal.pcbi.1002234] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 03/03/2011] [Accepted: 09/01/2011] [Indexed: 11/29/2022]
Abstract
In contrast to its handling of the core structural elements of a protein, such as regular secondary structure, template-based modeling (TBM) has difficulty with loop regions due to their variability in sequence and structure as well as the sparse sampling from a limited number of homologous templates. We present a novel, knowledge-based method for loop sampling that leverages homologous torsion angle information to estimate a continuous joint backbone dihedral angle density at each loop position. The φ,ψ distributions are estimated via a Dirichlet process mixture of hidden Markov models (DPM-HMM). Models are quickly generated based on samples from these distributions and are enriched using an end-to-end distance filter. The performance of the DPM-HMM method was evaluated against a diverse test set in a leave-one-out approach. Candidates as low as 0.45 Å RMSD and with a worst case of 3.66 Å were produced. For canonical loops like the immunoglobulin complementarity-determining regions (mean RMSD <2.0 Å), the DPM-HMM method performs as well as or better than the best templates, demonstrating that our automated method recaptures these canonical loops without inclusion of any IgG-specific terms or manual intervention. In cases with poor or few good templates (mean RMSD >7.0 Å), this sampling method produces a population of loop structures to around 3.66 Å for loops up to 17 residues. In a direct sampling comparison with the Loopy algorithm, our method demonstrates the ability to sample nearer-native structures for both the canonical CDRH1 and non-canonical CDRH3 loops. Lastly, in the realistic test conditions of the CASP9 experiment, successful application of DPM-HMM to 90 loops from 45 TBM targets shows the general applicability of our sampling method to the loop modeling problem. These results demonstrate that our DPM-HMM produces an advantage by consistently sampling near-native loop structure.
The software used in this analysis is available for download at http://www.stat.tamu.edu/~dahl/software/cortorgles/. A protein's structure consists of elements of regular secondary structure connected by less regular stretches of loop segments. The irregularity of the loop structure makes loop modeling quite challenging. More accurate sampling of these loop conformations has a direct impact on protein modeling, design, function classification, as well as protein interactions. A method has been developed that extends a more comprehensive knowledge-based approach to producing models of the loop regions of protein structure. Most physical models cannot adequately sample the large conformational space, while the more discrete knowledge based libraries are conformationally limited. To address both of these problems, we introduce a novel statistical method that produces a continuous yet weighted estimation of loop conformational space from a discrete library of structures by using a Dirichlet process mixture of hidden Markov models (DPM-HMM). Applied to loop structure sampling, the results of a number of tests demonstrate that our approach quickly generates large numbers of candidates with near native loop conformations. Most significantly, in the cases where the template sampling is sparse and/or far from native conformations, the DPM-HMM method samples close to the native space and produces a population of accurate loop structures.
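The end-to-end distance filter mentioned in the abstract can be pictured as a simple geometric check: a sampled loop is kept only if the distance between its first and last positions is compatible with the gap it has to bridge. The sketch below is a generic illustration of that idea and not the authors' implementation; the function names, the coordinate representation (one point per residue) and the tolerance value are all assumptions.

```python
import math


def end_to_end_distance(coords):
    """Distance between the first and last points of a candidate loop."""
    return math.dist(coords[0], coords[-1])


def filter_by_end_to_end(candidates, target_distance, tolerance=1.0):
    """Keep candidates whose end-to-end distance is within tolerance.

    candidates      -- list of loops, each a list of (x, y, z) tuples
    target_distance -- distance between the anchor residues to be bridged
    tolerance       -- allowed absolute deviation (hypothetical default)
    """
    return [
        loop
        for loop in candidates
        if abs(end_to_end_distance(loop) - target_distance) <= tolerance
    ]


if __name__ == "__main__":
    # Two toy candidates; only the first roughly spans a 5.5 A gap.
    candidates = [
        [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (5.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (9.0, 0.0, 0.0)],
    ]
    kept = filter_by_end_to_end(candidates, target_distance=5.5)
    print(f"{len(kept)} of {len(candidates)} candidates kept")
```

A filter of this kind is cheap to evaluate, so it can discard geometrically impossible samples before any more expensive scoring is applied.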
Affiliation(s)
- Hyun Joo
  - Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- Archana G. Chavan
  - Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- Ryan Day
  - Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- Kristin P. Lennox
  - Department of Statistics, Texas A&M University, College Station, Texas, United States of America
- Paul Sukhanov
  - Department of Chemistry, University of the Pacific, Stockton, California, United States of America
- David B. Dahl
  - Department of Statistics, Texas A&M University, College Station, Texas, United States of America
- Marina Vannucci
  - Department of Statistics, Rice University, Houston, Texas, United States of America
- Jerry Tsai
  - Department of Chemistry, University of the Pacific, Stockton, California, United States of America
14
Zou D, He Z, He J, Xia Y. Supersecondary structure prediction using Chou's pseudo amino acid composition. J Comput Chem 2010; 32:271-8. [DOI: 10.1002/jcc.21616] [Citation(s) in RCA: 94] [Impact Index Per Article: 6.7] [Indexed: 11/10/2022]
15
Vanhee P, Verschueren E, Baeten L, Stricher F, Serrano L, Rousseau F, Schymkowitz J. BriX: a database of protein building blocks for structural analysis, modeling and design. Nucleic Acids Res 2010; 39:D435-42. [PMID: 20972210 PMCID: PMC3013806 DOI: 10.1093/nar/gkq972] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.0] [Indexed: 11/12/2022]
Abstract
High-resolution structures of proteins remain the most valuable source for understanding their function in the cell and provide leads for drug design. Since the availability of sufficient protein structures to tackle complex problems such as modeling backbone moves or docking remains a problem, alternative approaches using small, recurrent protein fragments have been employed. Here we present two databases that provide a vast resource for implementing such fragment-based strategies. The BriX database contains fragments from over 7000 non-homologous proteins from the Astral collection, segmented in lengths from 4 to 14 residues and clustered according to structural similarity, summing up to a content of 2 million fragments per length. To overcome the lack of loops classified in BriX, we constructed the Loop BriX database of non-regular structure elements, clustered according to end-to-end distance between the regular residues flanking the loop. Both databases are available online (http://brix.crg.es) and can be accessed through a user-friendly web-interface. For high-throughput queries a web-based API is provided, as well as full database downloads. In addition, two exciting applications are provided as online services: (i) user-submitted structures can be covered on the fly with BriX classes, representing putative structural variation throughout the protein and (ii) gaps or low-confidence regions in these structures can be bridged with matching fragments.
Affiliation(s)
- Peter Vanhee
  - VIB SWITCH Laboratory, Flanders Institute of Biotechnology, Free University of Brussels, Pleinlaan 2, 1050 Brussels, Belgium
16
Skliros A, Jernigan RL, Kloczkowski A. Models to Approximate the Motions of Protein Loops. J Chem Theory Comput 2010; 6:3249-3258. [PMID: 21031141 DOI: 10.1021/ct1001413] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/30/2022]
Abstract
We approximate the loop motions of various proteins by using a coarse-grained model and the theory of rubberlike elasticity of polymer chains. The loops are considered as chains in which only the first and last residues are tethered by their connections to the main structure, while within the loop, the residues are connected only to their sequence neighbors. We applied these approximate models to five proteins. Our approximation shows that the loop motions can usually be computed locally, which shows that these motions are robust and not random. Most interestingly, the new method presented here can be used to compute the likely motions of loops that are missing in the structures.
Affiliation(s)
- Aris Skliros
  - L. H. Baker Center for Bioinformatics and Biological Statistics, Department of Biochemistry, Biophysics and Molecular Biology, Iowa State University, Ames, IA 50011, USA
17
Choi Y, Deane CM. FREAD revisited: Accurate loop structure prediction using a database search algorithm. Proteins 2010; 78:1431-40. [PMID: 20034110 DOI: 10.1002/prot.22658] [Citation(s) in RCA: 121] [Impact Index Per Article: 8.6] [Indexed: 12/25/2022]
Abstract
Loops are the most variable regions of protein structure and are, in general, the least accurately predicted. Their prediction has been approached in two ways, ab initio and database search. In recent years, it has been thought that ab initio methods are more powerful. In light of the continued rapid expansion in the number of known protein structures, we have re-evaluated FREAD, a database search method, and demonstrate that the power of database search methods may have been underestimated. We found that sequence similarity, as quantified by environment-specific substitution scores, can be used to significantly improve prediction. In fact, FREAD performs appreciably better for an identifiable subset of loops (two thirds of shorter loops and half of the longer loops tested) than the ab initio methods of MODELLER, PLOP, and RAPPER. Within this subset, FREAD's predictive ability is length independent, in general producing results within 2 Å RMSD, compared to an average of over 10 Å for loop length 20 for any of the other tested methods. We also benchmarked the prediction protocols on a set of 212 loops from the model structures in CASP 7 and 8. An extended version of FREAD is able to make predictions for 127 of these, and it gives the best prediction of the methods tested in 61 of these cases. In examining FREAD's ability to predict in the model environment, we found that whole-structure quality did not affect the quality of loop predictions.
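The accuracy figures in this abstract are root-mean-square deviations (RMSD) between predicted and native loop coordinates. As a minimal sketch of the formula, the function below computes RMSD over paired atoms. It assumes the two coordinate sets are already in the same frame of reference (for example, superposed on the anchor residues) and performs no superposition itself; the function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length coordinate lists.

    Each list holds (x, y, z) tuples for corresponding atoms. No
    superposition is performed: the inputs are assumed to share a frame.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if not coords_a:
        raise ValueError("coordinate lists must not be empty")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Toy example: the prediction is the native loop shifted by 2 A along z.
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    predicted = [(x, y, z + 2.0) for x, y, z in native]
    print(f"RMSD = {rmsd(native, predicted):.2f} A")
```

Whether RMSD is computed after superposing the whole loop or only its anchors changes the value considerably, so the choice of frame should be stated alongside any reported number.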
Affiliation(s)
- Yoonjoo Choi
  - Department of Statistics, Oxford University, United Kingdom
18
Tendulkar AV, Krallinger M, de la Torre V, López G, Wangikar PP, Valencia A. FragKB: structural and literature annotation resource of conserved peptide fragments and residues. PLoS One 2010; 5:e9679. [PMID: 20305778 PMCID: PMC2841175 DOI: 10.1371/journal.pone.0009679] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 10/01/2009] [Accepted: 02/12/2010] [Indexed: 01/21/2023]
Abstract
Background: FragKB (Fragment Knowledgebase) is a repository of clusters of structurally similar fragments from proteins. Fragments are annotated with information at the level of sequence, structure and function, integrating biological descriptions derived from multiple existing resources and text mining.
Methodology: FragKB contains approximately 400,000 conserved fragments from 4,800 representative proteins from PDB. Literature annotations are extracted from more than 1,700 articles and are available for over 12,000 fragments. The underlying systematic annotation workflow of FragKB ensures efficient update and maintenance of this database. The information in FragKB can be accessed through a web interface that facilitates sequence and structural visualization of fragments together with known literature information on the consequences of specific residue mutations and functional annotations of proteins and fragment clusters. FragKB is accessible online at http://ubio.bioinfo.cnio.es/biotools/fragkb/.
Significance: The information presented in FragKB can be used for modeling protein structures, for designing novel proteins and for functional characterization of related fragments. The current release is focused on functional characterization of proteins through inspection of conservation of the fragments.
Affiliation(s)
- Ashish V Tendulkar
- Structural Biology and Biocomputing Programme, Spanish National Cancer Center, Madrid, Spain.
19
Jamroz M, Kolinski A. Modeling of loops in proteins: a multi-method approach. BMC Struct Biol 2010; 10:5. [PMID: 20149252 PMCID: PMC2837870 DOI: 10.1186/1472-6807-10-5] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 10/08/2009] [Accepted: 02/11/2010] [Indexed: 11/23/2022]
Abstract
Background: Template-target sequence alignment and loop modeling are key components of protein comparative modeling. Short loops can be predicted with high accuracy using structural fragments from other, not necessarily homologous, proteins, or by various minimization methods. For longer loops, multiscale approaches employing coarse-grained de novo modeling techniques should be more effective. Results: For a representative set of protein structures of various structural classes, test predictions of loop regions were performed using MODELLER, ROSETTA, and a CABS coarse-grained de novo modeling tool. Loops of various lengths, from 4 to 25 residues, were modeled assuming an ideal target-template alignment of the remaining portions of the protein. Classical modeling with MODELLER was usually better for short loops, while coarse-grained de novo modeling was more effective for longer loops. Even very long missing fragments in protein structures could be effectively modeled. The resolution of such models is usually at the level of 2-6 Å, which could be sufficient for guiding protein engineering. Further improvement of modeling accuracy could be achieved by combining different methods. In particular, we used the 10 top-ranked models from sets of 500 models generated by MODELLER as multiple templates for CABS modeling. On average, the resulting molecular models were better than the models from individual methods. Conclusions: The accuracy of protein modeling, as demonstrated for the problem of loop modeling, could be improved by combining different modeling techniques.
Affiliation(s)
- Michal Jamroz
- Laboratory of Theory of Biopolymers, Faculty of Chemistry, University of Warsaw, Warsaw, Poland
20
Mining protein loops using a structural alphabet and statistical exceptionality. BMC Bioinformatics 2010; 11:75. [PMID: 20132552 PMCID: PMC2833150 DOI: 10.1186/1471-2105-11-75] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 07/07/2009] [Accepted: 02/04/2010] [Indexed: 12/21/2022]
Abstract
Background: Protein loops encompass 50% of protein residues in available three-dimensional structures. These regions are often involved in protein functions, e.g. binding sites and catalytic pockets. However, describing protein loops with conventional tools is a difficult task. Regular secondary structures, helices and strands, have been widely studied, whereas loops, because they are highly variable in terms of sequence and structure, are difficult to analyze. Due to data sparsity, long loops have rarely been systematically studied. Results: We developed a simple and accurate method that allows the description and analysis of the structures of short and long loops using structural motifs without restriction on loop length. This method is based on the structural alphabet HMM-SA. HMM-SA allows the simplification of a three-dimensional protein structure into a one-dimensional string of states, where each state is a four-residue prototype fragment, called a structural letter. The difficult task of the structural grouping of huge data sets is thus easily accomplished by handling structural letter strings as in conventional protein sequence analysis. We systematically extracted all seven-residue fragments in a bank of 93000 protein loops and grouped them according to the structural-letter sequence, named a structural word. This approach permits a systematic analysis of loops of all sizes since we consider the structural motifs of seven residues rather than complete loops. We focused the analysis on highly recurrent words of loops (observed more than 30 times). Our study reveals that 73% of loop-lengths are covered by only 3310 highly recurrent structural words (out of 28274 observed words). These structural words have low structural variability (mean RMSd of 0.85 Å). As expected, half of these motifs display a flanking-region preference but, interestingly, two thirds are shared by short (less than 12 residues) and long loops. Moreover, half of the recurrent motifs exhibit a significant level of amino-acid conservation with at least four significant positions, and 87% of long loops contain at least one such word. We complement our analysis with the detection of statistically over-represented patterns of structural letters, as in conventional DNA sequence analysis. About 30% (930) of structural words are over-represented and cover about 40% of loop lengths. Interestingly, these words exhibit lower structural variability and higher sequential specificity, suggesting structural or functional constraints. Conclusions: We developed a method to systematically decompose and study protein loops using recurrent structural motifs. This method is based on the structural alphabet HMM-SA and not on structural alignment and geometrical parameters. We extracted meaningful structural motifs that are found in both short and long loops. To our knowledge, this is the first time that pattern mining helps to increase the signal-to-noise ratio in protein loops. This finding helps to better describe protein loops and might help reduce the complexity of long-loop analysis. Detailed results are available at http://www.mti.univ-paris-diderot.fr/publication/supplementary/2009/ACCLoop/.
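The word-counting step described in this abstract can be illustrated with a short Python sketch. It is not the authors' implementation: it assumes loops have already been encoded as strings of structural letters, and it assumes a seven-residue fragment corresponds to four consecutive letters (each letter spanning four residues, shifted by one residue). The function names, the `word_len` parameter and the letter-based coverage measure are hypothetical simplifications.

```python
from collections import Counter


def count_structural_words(encoded_loops, word_len=4):
    """Count every overlapping window of `word_len` structural letters.

    `encoded_loops` is a list of strings, one per loop (assumed input format).
    """
    counts = Counter()
    for letters in encoded_loops:
        for i in range(len(letters) - word_len + 1):
            counts[letters[i:i + word_len]] += 1
    return counts


def recurrent_words(counts, min_occurrences=31):
    """Keep words seen at least `min_occurrences` times.

    The default of 31 mirrors "observed more than 30 times" in the abstract.
    """
    return {word: n for word, n in counts.items() if n >= min_occurrences}


def coverage(encoded_loops, words, word_len=4):
    """Fraction of letter positions lying inside at least one selected word.

    This is a simplified, letter-based stand-in for the residue-based
    loop-length coverage reported in the abstract.
    """
    covered = 0
    total = 0
    for letters in encoded_loops:
        mask = [False] * len(letters)
        for i in range(len(letters) - word_len + 1):
            if letters[i:i + word_len] in words:
                for j in range(i, i + word_len):
                    mask[j] = True
        covered += sum(mask)
        total += len(letters)
    return covered / total if total else 0.0
```

Treating words as plain substrings is what lets ordinary sequence-analysis tooling (counting, over-representation statistics) be reused on structural data, which is the design point the abstract emphasizes.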
21
Abstract
Functional characterization of a protein is often facilitated by its 3D structure. However, the fraction of experimentally known 3D models is currently less than 1% due to the inherently time-consuming and complicated nature of structure determination techniques. Computational approaches are employed to bridge the gap between the number of known sequences and that of 3D models. Template-based protein structure modeling techniques rely on the study of principles that dictate the 3D structure of natural proteins from the theory of evolution viewpoint. Strategies for template-based structure modeling will be discussed with a focus on comparative modeling, by reviewing techniques available for all the major steps involved in the comparative modeling pipeline.
Affiliation(s)
- Andras Fiser
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, USA
22
Hermoso A, Espadaler J, Enrique Querol E, Aviles FX, Sternberg MJ, Oliva B, Fernandez-Fuentes N. Including Functional Annotations and Extending the Collection of Structural Classifications of Protein Loops (ArchDB). Bioinform Biol Insights 2009; 1:77-90. [PMID: 20066127 PMCID: PMC2789696] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/21/2022]
Abstract
Loops represent an important part of protein structures. The study of loops is critical for two main reasons. First, loops are often involved in protein function, stability and folding. Second, despite improvements in experimental and computational structure prediction methods, modeling the conformation of loops remains problematic. Here, we present a structural classification of loops, ArchDB, a mine of information with application in both mentioned fields: loop structure prediction and function prediction. ArchDB (http://sbi.imim.es/archdb) is a database of classified protein loop motifs. The current database provides four different classification sets tailored for different purposes. ArchDB-40 is a loop classification derived from SCOP40, well suited for modeling common loop motifs. Since features relevant to loop structure or function can be more easily determined on well-populated clusters, we have developed ArchDB-95, a loop classification derived from SCOP95. This new classification set shows a ~40% increase in the number of subclasses, and a large 7-fold increase in the number of putative structure/function-related subclasses. We also present ArchDB-EC, a classification of loop motifs from enzymes, and ArchDB-KI, a manually annotated classification of loop motifs from kinases. Information about ligand contacts and PDB sites has been included in all classification sets. Improvements in our classification scheme are described, as well as several new database features, such as the ability to query by conserved annotations, sequence similarity, or uploading 3D coordinates of a protein. The lengths of classified loops range between 0 and 36 residues. ArchDB offers an exhaustive sampling of loop structures. Functional information about loops and links with related biological databases are also provided. All this information, together with the possibility to browse and query the database through a web server, makes ArchDB a useful tool for the comparative study of loops, the analysis of loops involved in protein function, and the retrieval of templates for loop modeling.
Affiliation(s)
- Antoni Hermoso
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Jordi Espadaler
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Laboratori de Bioinformàtica Estructural (GRIB), Universitat Pompeu Fabra/IMIM, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Catalonia, Spain
- E Enrique Querol
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Francesc X. Aviles
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Michael J.E. Sternberg
- Structural Bioinformatics Group, Department of Biological Sciences, Imperial College, London SW7 2AZ, U.K.
- Baldomero Oliva
- Laboratori de Bioinformàtica Estructural (GRIB), Universitat Pompeu Fabra/IMIM, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Catalonia, Spain
- Narcis Fernandez-Fuentes
- Leeds Institute of Molecular Medicine, Section of Experimental Therapeutics, St. James University Hospital, Leeds LS7 9TF, U.K.
- Correspondence: Narcis Fernandez-Fuentes, Leeds Institute of Molecular Medicine, Section of Experimental Therapeutics, St. James University Hospital, Bleckett St., Leeds LS7 9TF, U.K. Tel: +44 (0)113 343 8614; Fax: +44 (0)113 343 8601
23
Zou D, He Z, He J. Beta-hairpin prediction with quadratic discriminant analysis using diversity measure. J Comput Chem 2009; 30:2277-84. [PMID: 19263434 DOI: 10.1002/jcc.21229] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022]
Abstract
On the basis of the features of protein sequential patterns, we used the method of increment of diversity combined with quadratic discriminant analysis (IDQD) to predict beta-hairpin motifs in protein sequences. Three rules are used to extract raw beta-beta motif sequential patterns of fixed length. Amino acid basic compositions, dipeptide components, and amino acid composition distribution are combined to represent the compositional features. Eighteen feature variables on a sequential pattern to be predicted are defined in terms of ID. They are integrated in a single formal framework given by IDQD. The method is trained and tested on the ArchDB40 dataset containing 3088 proteins. The overall accuracy of prediction and the Matthews correlation coefficient for the independent testing dataset are 81.7% and 0.60, respectively. In addition, a higher accuracy of 84.5% and a Matthews correlation coefficient of 0.68 for the independent testing dataset are obtained on a dataset previously used by Kumar et al. (Nucleic Acids Res 2005, 33, 154), which contains 2088 proteins. For a fair assessment of our method, the performance is also evaluated on all 63 proteins used in CASP6. The overall accuracy of prediction is 74.2% for the independent testing dataset.
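For readers unfamiliar with the two evaluation measures quoted above, the following Python sketch shows how overall accuracy and the Matthews correlation coefficient are computed from a binary confusion matrix. The function names are illustrative and the counts used in any call are hypothetical; none of them come from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary classifier.

    Returns 0.0 when the denominator vanishes (a common convention
    when one row or column of the confusion matrix is empty).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

Unlike accuracy, the coefficient stays informative when hairpin and non-hairpin examples are unbalanced, which is presumably why both figures are reported.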
Affiliation(s)
- Dongsheng Zou
- College of Computer Science, Chongqing University, Chongqing 400044, China.
24
Kumar MVS, Swaminathan R. A novel approach to segregate and identify functional loop regions in protein structures using their Ramachandran maps. Proteins 2009; 78:900-16. [DOI: 10.1002/prot.22615] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/06/2022]
25
Babor M, Kortemme T. Multi-constraint computational design suggests that native sequences of germline antibody H3 loops are nearly optimal for conformational flexibility. Proteins 2009; 75:846-58. [PMID: 19194863 DOI: 10.1002/prot.22293] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.3] [Indexed: 11/07/2022]
Abstract
The germline antibody repertoire, despite its limited size, has to recognize a far larger number of potential antigens. The ability of a single antibody to bind multiple ligands due to conformational flexibility in the antigen-binding site can significantly enlarge the repertoire. Among the six complementarity determining regions (CDRs) that generally comprise the binding site, the CDR H3 loop is particularly variable. Computational protein design studies showed that predicted low energy sequences compatible with a given backbone structure often have considerable similarity to the corresponding native sequences of naturally occurring proteins, indicating that native protein sequences are close to optimal for their structures. Here, we take a step forward to determine whether conformational flexibility, believed to play a key functional role in germline antibodies, is also central in shaping their native sequence. In particular, we use a multi-constraint computational design strategy, along with the Rosetta scoring function, to propose that the native sequences of CDR H3 loops from germline antibodies are nearly optimal for conformational flexibility. Moreover, we find that antibody maturation may lead to sequences with a higher degree of optimization for a single conformation, while disfavoring sequences that are intrinsically flexible. In addition, this computational strategy allows us to predict mutations in the CDR H3 loop to stabilize the antigen-bound conformation, a computational mimic of affinity maturation, that may increase antigen binding affinity by preorganizing the antigen binding loop. In vivo affinity maturation data are consistent with our predictions. The method described here can be useful to design antibodies with higher selectivity and affinity by reducing conformational diversity.
Affiliation(s)
- Mariana Babor
- California Institute for Quantitative Biosciences, University of California San Francisco, San Francisco, California 94158-2330, USA
26
Umezawa K, Ikebe J, Nomizu M, Nakamura H, Higo J. Conformational requirement on peptides to exert laminin's activities and search for protein segments with laminin's activities. Biopolymers 2009; 92:124-31. [PMID: 19180521 DOI: 10.1002/bip.21148] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/06/2023]
Abstract
The human laminin alpha3 chain LG4 module has biological activities of cell adhesion, heparin binding, migration, and neurite outgrowth. The authors had previously identified that the active site of this protein is in residues 1411-1429 (amino-acid sequence = KNSFMALYLSKGRLVFALG called A3G756) and that a three-amino-acid sequence KGR in A3G756 is crucial for exerting the activities. An experiment has shown that a cyclo-hEF3A peptide (a cyclic analog of A3G756) exhibits stronger activities than a linear-hEF3A peptide (a linearized peptide of the cyclo-hEF3A peptide). This experiment implies that adopting a loop conformation may be important for exerting the activities. In this study, the authors first computed the solution structures of the cyclo-hEF3A and linear-hEF3A peptides by molecular dynamics simulations. The obtained conformational ensembles consisted of a variety of conformations, which is a usual property of short peptides in solution. The ensembles involved a fraction where the peptide adopted beta-hairpins and KGR was located at the hairpin head. If there are protein segments that adopt beta-hairpins similar to those sampled from the simulation and have the KGR sequence at the hairpin head, these segments may have some activities. Then, the authors searched a database for segments satisfying these requirements and detected six functional segments. Three of them had laminin's activity, and the remaining three had activities similar to laminin's activities. Analyses on the conformational ensembles of cyclo- and linear-hEF3A peptides suggest that not only the KGR position in the hairpin but also the inter-strand packing is important for exerting laminin's activities.
Affiliation(s)
- Koji Umezawa
- Graduate School of Frontier Biosciences, Osaka University, Open Laboratories for Advanced Bioscience and Biotechnology, Suita, Osaka, Japan
27
Recognition of β-hairpin motifs in proteins by using the composite vector. Amino Acids 2009; 38:915-21. [DOI: 10.1007/s00726-009-0299-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Received: 12/10/2008] [Accepted: 04/20/2009] [Indexed: 10/20/2022]
28
Sawada Y, Honda S. ProSeg: a database of local structures of protein segments. J Comput Aided Mol Des 2008; 23:163-9. [DOI: 10.1007/s10822-008-9248-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Received: 07/08/2008] [Accepted: 09/26/2008] [Indexed: 11/29/2022]
29
Hsing M, Cherkasov A. Indel PDB: a database of structural insertions and deletions derived from sequence alignments of closely related proteins. BMC Bioinformatics 2008; 9:293. [PMID: 18578882 PMCID: PMC2459192 DOI: 10.1186/1471-2105-9-293] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Received: 12/19/2007] [Accepted: 06/25/2008] [Indexed: 11/26/2022]
Abstract
Background: Insertions and deletions (indels) represent a common type of sequence variation that is less studied and poses many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless, many structural and functional characteristics of indels remain less researched or unknown. Description: We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilizes large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three-dimensional structures. Conclusion: By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites.
Affiliation(s)
- Michael Hsing
- Bioinformatics Graduate Program, Faculty of Graduate Studies, University of British Columbia, 100-570 West 7th Avenue, Vancouver, BC V5T 4S6, Canada.
30
Lin MS, Head-Gordon T. Improved Energy Selection of Nativelike Protein Loops from Loop Decoys. J Chem Theory Comput 2008; 4:515-21. [DOI: 10.1021/ct700292u] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/28/2022]
Affiliation(s)
- Matthew S. Lin
- UCSF/UCB Joint Graduate Group in Bioengineering, Berkeley, California 94720, and Department of Bioengineering, University of California, Berkeley, California 94720
- Teresa Head-Gordon
- UCSF/UCB Joint Graduate Group in Bioengineering, Berkeley, California 94720, and Department of Bioengineering, University of California, Berkeley, California 94720
31
Hermoso A, Espadaler J, Enrique Querol E, Aviles FX, Sternberg MJ, Oliva B, Fernandez-Fuentes N. Including Functional Annotations and Extending the Collection of Structural Classifications of Protein Loops (ArchDB). Bioinform Biol Insights 2008. [DOI: 10.1177/117793220700100004] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/15/2022]
Abstract
Loops represent an important part of protein structures. The study of loops is critical for two main reasons. First, loops are often involved in protein function, stability and folding. Second, despite improvements in experimental and computational structure prediction methods, modeling the conformation of loops remains problematic. Here, we present a structural classification of loops, ArchDB, a mine of information with application in both mentioned fields: loop structure prediction and function prediction. ArchDB (http://sbi.imim.es/archdb) is a database of classified protein loop motifs. The current database provides four different classification sets tailored for different purposes. ArchDB-40 is a loop classification derived from SCOP40, well suited for modeling common loop motifs. Since features relevant to loop structure or function can be more easily determined on well-populated clusters, we have developed ArchDB-95, a loop classification derived from SCOP95. This new classification set shows a ~40% increase in the number of subclasses, and a large 7-fold increase in the number of putative structure/function-related subclasses. We also present ArchDB-EC, a classification of loop motifs from enzymes, and ArchDB-KI, a manually annotated classification of loop motifs from kinases. Information about ligand contacts and PDB sites has been included in all classification sets. Improvements in our classification scheme are described, as well as several new database features, such as the ability to query by conserved annotations, sequence similarity, or uploading 3D coordinates of a protein. The lengths of classified loops range between 0 and 36 residues. ArchDB offers an exhaustive sampling of loop structures. Functional information about loops and links with related biological databases are also provided. All this information, together with the possibility to browse and query the database through a web server, makes ArchDB a useful tool for the comparative study of loops, the analysis of loops involved in protein function, and the retrieval of templates for loop modeling.
Affiliation(s)
- Antoni Hermoso
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Jordi Espadaler
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Laboratori de Bioinformàtica Estructural (GRIB), Universitat Pompeu Fabra/IMIM, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Catalonia, Spain
- E Enrique Querol
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Francesc X. Aviles
- Laboratori de Bioinformàtica, Institut de Biomedicina I Biotecnologia, Universitat Autònoma de Barcelona, Bellaterra 08193, Catalonia, Spain
- Michael J.E. Sternberg
- Structural Bioinformatics Group, Department of Biological Sciences, Imperial College, London SW7 2AZ, U.K.
- Baldomero Oliva
- Laboratori de Bioinformàtica Estructural (GRIB), Universitat Pompeu Fabra/IMIM, Parc de Recerca Biomèdica de Barcelona, Barcelona 08003, Catalonia, Spain
- Narcis Fernandez-Fuentes
- Leeds Institute of Molecular Medicine, Section of Experimental Therapeutics, St. James University Hospital, Leeds LS7 9TF, U.K.
32
33
De Brevern AG, Etchebest C, Benros C, Hazout S. "Pinning strategy": a novel approach for predicting the backbone structure in terms of protein blocks from sequence. J Biosci 2007; 32:51-70. [PMID: 17426380 DOI: 10.1007/s12038-007-0006-3] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/30/2022]
Abstract
The description of protein 3D structures can be performed through a library of 3D fragments, named a structural alphabet. Our structural alphabet is composed of 16 small protein fragments of 5 Cα in length, called protein blocks (PBs). It allows an efficient approximation of the 3D protein structures and a correct prediction of the local structure. The 72 most frequent series of 5 consecutive PBs, called structural words (SWs), are able to cover more than 90% of the 3D structures. PBs are highly conditioned by the presence of a limited number of transitions between them. In this study, we propose a new method called "pinning strategy" that uses this specific feature to predict long protein fragments. Its goal is to define highly probable successions of PBs. It starts from the most probable SW and is then extended with overlapping SWs. Starting from an initial prediction rate of 34.4%, the use of the SWs instead of the PBs allows a gain of 4.5%. The pinning strategy simply applied to the SWs increases the prediction accuracy to 39.9%. In a second step, the sequence-structure relationship is optimized, and the prediction accuracy reaches 43.6%.
Affiliation(s)
- A G De Brevern
- INSERM, U726, Equipe de Bioinformatique Genomique et Moleculaire (EBGM), Universite Paris 7, case 7113, 2, place Jussieu, 75251 Paris Cedex 05, France.
34
Peng HP, Yang AS. Modeling protein loops with knowledge-based prediction of sequence-structure alignment. Bioinformatics 2007; 23:2836-42. [PMID: 17827204 DOI: 10.1093/bioinformatics/btm456] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Indexed: 11/14/2022]
Abstract
MOTIVATION: As the protein structure database expands, protein loop modeling remains an important and yet challenging problem. Knowledge-based protein loop prediction methods have met with two challenges in methodology development: (1) loop boundaries in protein structures are frequently problematic in constructing length-dependent loop databases for protein loop predictions; (2) knowledge-based modeling of loops of unknown structure requires both aligning a query loop sequence to loop templates and ranking the loop sequence-template matches. RESULTS: We developed a knowledge-based loop prediction method that circumvents the need to construct hierarchically clustered length-dependent loop libraries. The method first predicts local structural fragments of a query loop sequence and then structurally aligns the predicted structural fragments to a set of non-redundant loop structural templates regardless of the loop length. The sequence-template alignments are then quantitatively evaluated with an artificial neural network model trained on a set of predictions with known outcomes. Prediction accuracy benchmarks indicated that the novel procedure provided an alternative approach overcoming the challenges of knowledge-based loop prediction. AVAILABILITY: http://cmb.genomics.sinica.edu.tw
Affiliation(s)
- Hung-Pin Peng
- Genomics Research Center, Academia Sinica. 128 Academia Road, Section 2, Nankang District, Taipei 115, Taiwan, ROC
35
Dezi C, Brea J, Alvarado M, Raviña E, Masaguer CF, Loza MI, Sanz F, Pastor M. Multistructure 3D-QSAR studies on a series of conformationally constrained butyrophenones docked into a new homology model of the 5-HT2A receptor. J Med Chem 2007; 50:3242-55. [PMID: 17579386 DOI: 10.1021/jm070277a] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.1] [Indexed: 11/29/2022]
Abstract
The present study is part of a long-term research project aiming to gain insight into the mechanism of action of atypical antipsychotics. Here we describe a 3D-QSAR study carried out on a series of butyrophenones with affinity for the serotonin-2A receptor, aligned by docking into the binding site of a receptor model. The series studied has two peculiarities: (i) all the compounds have a chiral center and can be represented by two enantiomeric structures, and (ii) many of the structures can bind the receptor in two alternative orientations, posing the problem of how to select a single representative structure for every compound. We have used an original solution consisting of the simultaneous use of multiple structures, representing different configurations, binding conformations, and positions. The final model showed good statistical quality (n = 426, r² = 0.84, q²(LOO) = 0.81) and its interpretation provided useful information, not obtainable from the simple inspection of the ligand-receptor complexes.
Affiliation(s)
- Cristina Dezi
- Research Unit on Biomedical Informatics (GRIB), IMIM, Universitat Pompeu Fabra, Dr. Aiguader 88, E-08003 Barcelona, Spain
36
Yoon S, Ebert JC, Chung EY, De Micheli G, Altman RB. Clustering protein environments for function prediction: finding PROSITE motifs in 3D. BMC Bioinformatics 2007; 8 Suppl 4:S10. [PMID: 17570144 PMCID: PMC1892080 DOI: 10.1186/1471-2105-8-s4-s10] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.2] [Indexed: 11/24/2022]
Abstract
BACKGROUND: Structural genomics initiatives are producing increasing numbers of three-dimensional (3D) structures for which there is little functional information. Structure-based annotation of molecular function is therefore becoming critical. We previously presented FEATURE, a method for describing microenvironments around functional sites in proteins. However, FEATURE uses supervised machine learning and so is limited to building models for sites of known importance and location. We hypothesized that there are a large number of sites in proteins that are associated with function that have not yet been recognized. Toward that end, we have developed a method for clustering protein microenvironments in order to evaluate the potential for discovering novel sites that have not been previously identified. RESULTS: We have prototyped a computational method for rapid clustering of millions of microenvironments in order to discover residues whose surrounding environments are similar and which may therefore share a functional or structural role. We clustered nearly 2,000,000 environments from 9,600 protein chains and defined 4,550 clusters. As a preliminary validation, we asked whether known 3D environments associated with PROSITE motifs were "rediscovered". We found examples of clusters highly enriched for residues that share PROSITE sequence motifs. CONCLUSION: Our results demonstrate that we can cluster protein environments successfully using a simplified representation and a K-means clustering algorithm. The rediscovery of known 3D motifs allows us to calibrate the size and intercluster distances that characterize useful clusters. This information will then allow us to find new clusters with similar characteristics that represent novel structural or functional sites.
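As a rough illustration of the K-means step named in this abstract, the sketch below clusters small numeric feature vectors in plain Python. It is a generic textbook K-means (Lloyd's algorithm), not the authors' pipeline: the representation of a microenvironment as a short tuple of floats, the function names, and the `iterations` and `seed` parameters are all assumptions made for the example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=50, seed=0):
    """Cluster `points` (a list of numeric tuples) into `k` groups.

    Returns (labels, centroids). Initial centroids are `k` points drawn
    at random with a fixed seed so that runs are repeatable.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable, so centroids are already final
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Each pass costs on the order of the number of points times `k`, so a real run over millions of environments would need a vectorized or approximate variant; the loop structure above only conveys the idea.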
Affiliation(s)
- Sungroh Yoon
  - Computer Systems Laboratory, Stanford University, Stanford, CA 94305, USA
  - Intel Corporation, 2200 Mission College Blvd., Santa Clara, CA 95054, USA
- Jessica C Ebert
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
- Eui-Young Chung
  - School of Electrical and Electronic Engineering, Yonsei University, Seoul 120-749, Republic of Korea
- Giovanni De Micheli
  - Integrated Systems Center, Swiss Federal Institute of Technology (EPFL), Lausanne, CH-1015, Switzerland
- Russ B Altman
  - Department of Genetics, Stanford University, Stanford, CA 94305, USA
|
37
|
Espadaler J, Querol E, Aviles FX, Oliva B. Identification of function-associated loop motifs and application to protein function prediction. Bioinformatics 2006; 22:2237-43. [PMID: 16870939 DOI: 10.1093/bioinformatics/btl382]
Abstract
MOTIVATION The detection of function-related local 3D-motifs in protein structures can provide insights towards protein function in absence of sequence or fold similarity. Protein loops are known to play important roles in protein function and several loop classifications have been described, but the automated identification of putative functional 3D-motifs in such classifications has not yet been addressed. This identification can be used on sequence annotations. RESULTS We evaluated three different scoring methods for their ability to identify known motifs from the PROSITE database in ArchDB. More than 500 new putative function-related motifs not reported in PROSITE were identified. Sequence patterns derived from these motifs were especially useful at predicting precise annotations. The number of reliable sequence annotations could be increased up to 100% with respect to standard BLAST. CONTACT boliva@imim.es SUPPLEMENTARY INFORMATION Supplementary Data are available at Bioinformatics online.
Affiliation(s)
- Jordi Espadaler
- Group de Bioinformàtica Estructural (GRIB-IMIM), Departament de Ciències Experimentals i de la Salut, Universitat Pompeu Fabra 08003 Barcelona, Catalonia, Spain
|
38
|
Brea J, Castro M, Loza MI, Masaguer CF, Raviña E, Dezi C, Pastor M, Sanz F, Cabrero-Castel A, Galán-Rodríguez B, Fernández-Espejo E, Maldonado R, Robledo P. QF2004B, a potential antipsychotic butyrophenone derivative with similar pharmacological properties to clozapine. Neuropharmacology 2006; 51:251-62. [PMID: 16697427 DOI: 10.1016/j.neuropharm.2006.03.021] [Received: 11/10/2005] [Revised: 03/14/2006] [Accepted: 03/15/2006]
Abstract
The aim of the present work was to characterize a lead compound displaying relevant multi-target interactions, and with an in vivo behavioral profile predictive of atypical antipsychotic activity. Synthesis, molecular modeling and in vitro and in vivo pharmacological studies were carried out for 2-[4-(6-fluorobenzisoxazol-3-yl)piperidinyl]methyl-1,2,3,4-tetrahydro-carbazol-4-one (QF2004B), a conformationally constrained butyrophenone analogue. This compound showed a multi-receptor profile with affinities similar to those of clozapine for serotonin (5-HT2A, 5-HT1A, and 5-HT2C), dopamine (D1, D2, D3 and D4), alpha-adrenergic (alpha1, alpha2), muscarinic (M1, M2) and histamine H1 receptors. In addition, QF2004B mirrored the antipsychotic activity and atypical profile of clozapine in a broad battery of in vivo tests including locomotor activity (ED50 = 1.19 mg/kg), apomorphine-induced stereotypies (ED50 = 0.75 mg/kg), catalepsy (ED50 = 2.13 mg/kg), apomorphine- and DOI (2,5-dimethoxy-4-iodoamphetamine)-induced prepulse inhibition (PPI) tests. These results point to QF2004B as a new lead compound with a relevant multi-receptor interaction profile for the discovery and development of new antipsychotics.
Affiliation(s)
- José Brea
- Departamento de Farmacología, Universidad de Santiago de Compostela, Santiago de Compostela, Spain
|
39
|
Fernandez-Fuentes N, Oliva B, Fiser A. A supersecondary structure library and search algorithm for modeling loops in protein structures. Nucleic Acids Res 2006; 34:2085-97. [PMID: 16617149 PMCID: PMC1440879 DOI: 10.1093/nar/gkl156]
Abstract
We present a fragment-search based method for predicting loop conformations in protein models. A hierarchical and multidimensional database has been set up that currently classifies 105 950 loop fragments and loop flanking secondary structures. Besides the length of the loops and types of bracing secondary structures the database is organized along four internal coordinates, a distance and three types of angles characterizing the geometry of stem regions. Candidate fragments are selected from this library by matching the length, the types of bracing secondary structures of the query and satisfying the geometrical restraints of the stems and subsequently inserted in the query protein framework where their fit is assessed by the root mean square deviation (r.m.s.d.) of stem regions and by the number of rigid body clashes with the environment. In the final step remaining candidate loops are ranked by a Z-score that combines information on sequence similarity and fit of predicted and observed ϕ/ψ main chain dihedral angle propensities. Confidence Z-score cut-offs were determined for each loop length that identify those predicted fragments that outperform a competitive ab initio method. A web server implements the method, regularly updates the fragment library and performs prediction. Predicted segments are returned, or optionally, these can be completed with side chain reconstruction and subsequently annealed in the environment of the query protein by conjugate gradient minimization. The prediction method was tested on artificially prepared search datasets where all trivial sequence similarities on the SCOP superfamily level were removed. Under these conditions it is possible to predict loops of length 4, 8 and 12 with coverage of 98, 78 and 28% with at least 0.22, 1.38 and 2.47 Å r.m.s.d. accuracy, respectively. In a head-to-head comparison on loops extracted from freshly deposited new protein folds the current method outperformed in a ∼5:1 ratio an earlier developed database search method.
Affiliation(s)
- Baldomero Oliva
  - Structural Bioinformatics Group (GRIB), Universitat Pompeu Fabra, C/Doctor Aiguader, 80. 08003, Barcelona, Catalonia, Spain
- András Fiser
  - To whom correspondence should be addressed. Tel: +1 718 430 3233; Fax: +1 718 430 856;
|
40
|
Benros C, de Brevern AG, Etchebest C, Hazout S. Assessing a novel approach for predicting local 3D protein structures from sequence. Proteins 2006; 62:865-80. [PMID: 16385557 DOI: 10.1002/prot.20815]
Abstract
We developed a novel approach for predicting local protein structure from sequence. It relies on the Hybrid Protein Model (HPM), an unsupervised clustering method we previously developed. This model learns three-dimensional protein fragments encoded into a structural alphabet of 16 protein blocks (PBs). Here, we focused on 11-residue fragments encoded as a series of seven PBs and used HPM to cluster them according to their local similarities. We thus built a library of 120 overlapping prototypes (mean fragments from each cluster), with good three-dimensional local approximation, i.e., a mean accuracy of 1.61 Å Cα root-mean-square distance. Our prediction method is intended to optimize the exploitation of the sequence-structure relations deduced from this library of long protein fragments. This was achieved by setting up a system of 120 experts, each defined by logistic regression to optimize the discrimination from sequence of a given prototype relative to the others. For a target sequence window, the experts computed probabilities of sequence-structure compatibility for the prototypes and ranked them, proposing the top scorers as structural candidates. Predictions were defined as successful when a prototype <2.5 Å from the true local structure was found among those proposed. Our strategy yielded a prediction rate of 51.2% for an average of 4.2 candidates per sequence window. We also proposed a confidence index to estimate prediction quality. Our approach predicts from sequence alone and will thus provide valuable information for proteins without structural homologs. Candidates will also contribute to global structure prediction by fragment assembly.
Affiliation(s)
- Cristina Benros
- Equipe de Bioinformatique Génomique et Moléculaire, INSERM U726, Université Denis DIDEROT-Paris 7, Paris, France.
|
41
|
Fernandez-Fuentes N, Querol E, Aviles FX, Sternberg MJE, Oliva B. Prediction of the conformation and geometry of loops in globular proteins: testing ArchDB, a structural classification of loops. Proteins 2006; 60:746-57. [PMID: 16021623 DOI: 10.1002/prot.20516]
Abstract
In protein structure prediction, a central problem is defining the structure of a loop connecting 2 secondary structures. This problem frequently occurs in homology modeling, fold recognition, and in several strategies in ab initio structure prediction. In our previous work, we developed a classification database of structural motifs, ArchDB. The database contains 12,665 clustered loops in 451 structural classes with information about phi-psi angles in the loops and 1492 structural subclasses with the relative locations of the bracing secondary structures. Here we evaluate the extent to which sequence information in the loop database can be used to predict loop structure. Two sequence profiles were used, a HMM profile and a PSSM derived from PSI-BLAST. A jack-knife test was made removing homologous loops using SCOP superfamily definition and predicting afterwards against recalculated profiles that only take into account the sequence information. Two scenarios were considered: (1) prediction of structural class with application in comparative modeling and (2) prediction of structural subclass with application in fold recognition and ab initio. For the first scenario, structural class prediction was made directly over loops with X-ray secondary structure assignment, and if we consider the top 20 classes out of 451 possible classes, the best accuracy of prediction is 78.5%. In the second scenario, structural subclass prediction was made over loops using PSI-PRED (Jones, J Mol Biol 1999;292:195-202) secondary structure prediction to define loop boundaries, and if we take into account the top 20 subclasses out of 1492, the best accuracy is 46.7%. Accuracy of loop prediction was also evaluated by means of RMSD calculations.
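The "top 20 classes out of 451" figures in this abstract are top-k accuracies: a prediction counts as correct when the true class appears among the k highest-ranked candidates. A minimal sketch of that metric follows; the function name `top_k_accuracy` and the class labels "A", "B", "C" are hypothetical toy inputs, not data from the paper.

```python
def top_k_accuracy(ranked_predictions, true_classes, k):
    """Fraction of cases whose true class is among the k top-ranked guesses."""
    hits = sum(
        1
        for ranked, truth in zip(ranked_predictions, true_classes)
        if truth in ranked[:k]
    )
    return hits / len(true_classes)


# Each inner list ranks candidate loop classes from best to worst score.
ranked = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
    ["A", "C", "B"],
]
truth = ["A", "C", "B", "C"]

for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked, truth, k))
```

Widening k can only keep or raise the score, which is why accuracy is reported together with the number of candidates considered.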
Affiliation(s)
- Narcis Fernandez-Fuentes
- Institute of Biomedicine and Biotechnology, Universitat Autonoma de Barcelona, Bellaterra, Barcelona, Spain
|
42
|
Szarecka A, Meirovitch H. Optimization of the GB/SA solvation model for predicting the structure of surface loops in proteins. J Phys Chem B 2006; 110:2869-80. [PMID: 16471897 PMCID: PMC1945207 DOI: 10.1021/jp055771+]
Abstract
Implicit solvation models are commonly optimized with respect to experimental data or Poisson-Boltzmann (PB) results obtained for small molecules, where the force field is sometimes not considered. In previous studies, we have developed an optimization procedure for cyclic peptides and surface loops in proteins based on the entire system studied and the specific force field used. Thus, the loop has been modeled by the simplified solvation function $E_{\mathrm{tot}} = E_{\mathrm{FF}}(\varepsilon = 2r) + \sum_i \sigma_i A_i$, where $E_{\mathrm{FF}}(\varepsilon = nr)$ is the AMBER force field energy with a distance-dependent dielectric function, $\varepsilon = nr$, $A_i$ is the solvent accessible surface area of atom $i$, and $\sigma_i$ is its atomic solvation parameter. During the optimization process, the loop is free to move while the protein template is held fixed in its X-ray structure. To improve on the results of this model, in the present work we apply our optimization procedure to the physically more rigorous solvation model, the generalized Born with surface area (GB/SA) (together with the all-atom AMBER force field) as suggested by Still and co-workers (J. Phys. Chem. A 1997, 101, 3005). The six parameters of the GB/SA model, namely, $P_1$-$P_5$ and the surface area parameter, $\sigma$ (programmed in the TINKER package) are reoptimized for a "training" group of nine loops, and a best-fit set is defined from the individual sets of optimized parameters. The best-fit set and Still's original set of parameters (where Lys, Arg, His, Glu, and Asp are charged or neutralized) were applied to the training group as well as to a "test" group of seven loops, and the energy gaps and the corresponding RMSD values were calculated. These GB/SA results based on the three sets of parameters have been found to be comparable; surprisingly, however, they are somewhat inferior (e.g., larger energy gaps) to those obtained previously from the simplified model described above. We discuss recent results for loops obtained by other solvation models and potential directions for future studies.
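The simplified solvation function quoted in this abstract adds a surface-area term to the force-field energy: each atom contributes its atomic solvation parameter times its solvent-accessible surface area. A minimal sketch of that sum is given below; the function name `total_energy` and all numbers are hypothetical, the units are left unspecified, and computing the force-field energy or the surface areas themselves is outside its scope.

```python
def total_energy(e_ff, sigmas, areas):
    """Force-field energy plus the sum over atoms of sigma_i * A_i."""
    if len(sigmas) != len(areas):
        raise ValueError("need one solvation parameter per atomic surface area")
    solvation = sum(sigma * area for sigma, area in zip(sigmas, areas))
    return e_ff + solvation


# Hypothetical values for a three-atom fragment.
e_ff = -120.0                   # force-field energy of the conformation
sigmas = [0.01, -0.02, 0.005]   # atomic solvation parameters
areas = [30.0, 12.5, 40.0]      # solvent-accessible surface areas

print(total_energy(e_ff, sigmas, areas))
```

In an optimization like the one described, the solvation parameters would be the quantities being tuned while the same sum is evaluated over many loop conformations.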
Affiliation(s)
- Agnieszka Szarecka
  - Department of Computational Biology, University of Pittsburgh School of Medicine, Suite 3064, BST 3, 3501 Fifth Avenue, Pittsburgh, PA 15213
- Hagai Meirovitch
  - Department of Computational Biology, University of Pittsburgh School of Medicine, Suite 3064, BST 3, 3501 Fifth Avenue, Pittsburgh, PA 15213
|
43
|
Tendulkar AV, Sohoni MA, Ogunnaike B, Wangikar PP. A geometric invariant-based framework for the analysis of protein conformational space. Bioinformatics 2005; 21:3622-8. [PMID: 16096349 DOI: 10.1093/bioinformatics/bti621]
Abstract
MOTIVATION Characterization of the restricted nature of the protein local conformational space has remained a challenge, thereby necessitating a computationally expensive conformational search in protein modeling. Moreover, owing to the lack of unilateral structural descriptors, conventional data mining techniques, such as clustering and classification, have not been applied in protein structure analysis. RESULTS We first map the local conformations in a fixed dimensional space by using a carefully selected suite of geometric invariants (GIs) and then reduce the number of dimensions via principal component analysis (PCA). Distribution of the conformations in the space spanned by the first four PCs is visualized as a set of conditional bivariate probability distribution plots, where the peaks correspond to the preferred conformations. The locations of the different canonical structures in the PC-space have been interpreted in the context of the weights of the GIs to the first four PCs. Clustering of the available conformations reveals that the number of preferred local conformations is several orders of magnitude smaller than that suggested previously. SUPPLEMENTARY INFORMATION www.it.iitb.ac.in/~ashish/bioinfo2005/.
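The dimensionality-reduction step in this abstract (descriptor vectors projected onto principal components) can be sketched as follows. This is a generic illustration that recovers only the first principal component by power iteration on the covariance matrix, whereas the abstract uses the first four; the function name and the three-feature `descriptors` rows are hypothetical and are not the geometric invariants of the paper.

```python
import math


def first_principal_component(rows, iterations=200):
    """Leading principal axis of mean-centred data via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    vec = [1.0] * d
    for _ in range(iterations):
        # Multiply by the covariance matrix, then renormalise.
        nxt = [sum(cov[a][b] * vec[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Coordinates of each centred row along the leading axis.
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centred]
    return vec, scores


# Toy descriptors: the second feature is twice the first, the third is noise.
descriptors = [
    (1.0, 2.0, 0.0),
    (2.0, 4.0, 0.1),
    (3.0, 6.0, 0.0),
    (4.0, 8.0, 0.1),
    (5.0, 10.0, 0.0),
]
axis, scores = first_principal_component(descriptors)
print(axis)
```

For this toy input the leading axis points along the direction in which the first two features vary together, and the noisy third feature receives a weight close to zero.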
Affiliation(s)
- Ashish V Tendulkar
- Kanwal Rekhi School of Information Technology, Indian Institute of Technology Bombay, Powai, Mumbai 400076, India
|
44
|
Centeno NB, Planas-Iglesias J, Oliva B. Comparative modelling of protein structure and its impact on microbial cell factories. Microb Cell Fact 2005; 4:20. [PMID: 15989691 PMCID: PMC1183243 DOI: 10.1186/1475-2859-4-20] [Received: 05/03/2005] [Accepted: 06/30/2005]
Abstract
Comparative modeling is becoming an increasingly helpful technique in microbial cell factories as the knowledge of the three-dimensional structure of a protein would be an invaluable aid to solve problems on protein production. For this reason, an introduction to comparative modeling is presented, with special emphasis on the basic concepts, opportunities and challenges of protein structure prediction. This review is intended to serve as a guide for the biologist who has no special expertise and who is not involved in the determination of protein structure. Selected applications of comparative modeling in microbial cell factories are outlined, and the role of microbial cell factories in the structural genomics initiative is discussed.
Affiliation(s)
- Nuria B Centeno
  - Structural Bioinformatics Laboratory, Research Group on Biomedical Informatics (GRIB), IMIM/UPF. c/ Dr. Aiguader 80. 08003 Barcelona, Spain
- Joan Planas-Iglesias
  - Structural Bioinformatics Laboratory, Research Group on Biomedical Informatics (GRIB), IMIM/UPF. c/ Dr. Aiguader 80. 08003 Barcelona, Spain
- Baldomero Oliva
  - Structural Bioinformatics Laboratory, Research Group on Biomedical Informatics (GRIB), IMIM/UPF. c/ Dr. Aiguader 80. 08003 Barcelona, Spain
|
45
|
Lee MC, Deng J, Briggs JM, Duan Y. Large-scale conformational dynamics of the HIV-1 integrase core domain and its catalytic loop mutants. Biophys J 2005; 88:3133-46. [PMID: 15731379 PMCID: PMC1305464 DOI: 10.1529/biophysj.104.058446]
Abstract
HIV-1 integrase is one of the three essential enzymes required for viral replication and has great potential as a novel target for anti-HIV drugs. Although tremendous efforts have been devoted to understanding this protein, the conformation of the catalytic core domain around the active site, particularly the catalytic loop overhanging the active site, is still not well characterized by experimental methods due to its high degree of flexibility. Recent studies have suggested that these conformational dynamics are directly correlated with enzymatic activity, but their details are not known. In this study, we conducted a series of extended-time molecular dynamics simulations and locally enhanced sampling simulations of the wild-type and three loop hinge mutants to investigate the conformational dynamics of the core domain. A combined total of >480 ns of simulation data was collected, which allowed us to study conformational changes that were not possible to observe in the previously reported short-time molecular dynamics simulations. Among the main findings are a major conformational change (>20 Å) in the catalytic loop, which revealed gating-like dynamics, and a transient intraloop structure, which provided a rationale for the mutational effects of several residues on the loop including Q148, P145, and Y143. Further, clustering analyses have identified seven major conformational states of the wild-type catalytic loop. Their implications for catalytic function and ligand interaction are discussed. The findings reported here provide a detailed view of the active site conformational dynamics and should be useful for structure-based inhibitor design for integrase.
Affiliation(s)
- Matthew C Lee
- Department of Chemistry and Biochemistry, University of Delaware, Newark, Delaware 19716, USA
|
46
|
Fernandez-Fuentes N, Hermoso A, Espadaler J, Querol E, Aviles FX, Oliva B. Classification of common functional loops of kinase super-families. Proteins 2004; 56:539-55. [PMID: 15229886 DOI: 10.1002/prot.20136]
Abstract
A structural classification of loops has been obtained from a set of 141 protein structures classified as kinases. A total of 1813 loops was classified into 133 subclasses (9 ββ (links), 15 ββ (hairpins), 31 α-α, 46 α-β and 32 β-α). Functional information and specific features relating subclasses and function were included in the classification. Functional loops such as the P-loop (shared by different folds) or the Gly-rich-loop, among others, were classified into structural motifs. As a result, a common mechanism of catalysis and substrate binding was proved for most kinases. Additionally, the multiple-alignment of loop sequences made within each subclass was shown to be useful for comparative modeling of kinase loops. The classification is summarized in a kinase loop database located at http://sbi.imim.es/archki.
Affiliation(s)
- Narcis Fernandez-Fuentes
- Institut de Biotecnologia i Biomedicina and Department de Bioquímica i Biologia Molecular, Universitat Autònoma de Barcelona, Bellaterra 08193, Spain
|