1
Vetrivel I, de Brevern AG, Cadet F, Srinivasan N, Offmann B. Structural variations within proteins can be as large as variations observed across their homologues. Biochimie 2019; 167:162-170. [PMID: 31560932 DOI: 10.1016/j.biochi.2019.09.013] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/25/2019] [Accepted: 09/18/2019] [Indexed: 10/26/2022]
Abstract
Understanding the structural plasticity of proteins is key to understanding the intricacies of their functions and mechanistic basis. In the current study, we analyzed the structural differences among the available multiple crystal structures of the same protein. For this purpose, we used a previously established abstraction of protein structures referred to as Protein Blocks (PBs). We also characterized the nature of the structural variations for a few proteins using molecular dynamics simulations. In both cases, the structural variations were summarized in the form of substitution matrices of PBs. We show that certain conformational states are preferentially replaced by other specific conformational states. Interestingly, these structural variations are highly similar to those previously observed across structures of homologous proteins (r² = 0.923) or across the ensemble of conformations from NMR data (r² = 0.919). Thus, our study quantitatively shows that the overall trends of structural change within a given protein are nearly identical to the trends of structural difference at topologically equivalent positions in homologous proteins. Specific case studies are used to illustrate the nature of these structural variations.
Affiliation(s)
- Iyanar Vetrivel
- Université de Nantes, UFIP UMR 6286 CNRS, UFR Sciences et Techniques, 2 Chemin de La Houssinière, Nantes, France
- Alexandre G de Brevern
- INSERM UMR_S 1134, DSIMB Team, Laboratory of Excellence, GR-Ex, Univ Paris Diderot, Univ Sorbonne Paris Cité, INTS, 6 Rue Alexandre Cabanel, Paris, France
- Frédéric Cadet
- University of Paris, UMR_S1134, BIGR, Inserm, F-75015, Paris, France; DSIMB, UMR_S1134, BIGR, Inserm, Laboratory of Excellence GR-Ex, Faculty of Sciences and Technology, University of La Reunion, F-97715, Saint-Denis, France; PEACCEL, Protein Engineering Accelerator, 6 Square Albin Cachot, Box 42, 75013, Paris, France
- Bernard Offmann
- Université de Nantes, UFIP UMR 6286 CNRS, UFR Sciences et Techniques, 2 Chemin de La Houssinière, Nantes, France.
2
Prisilla A, Chellapandi P. Cloning and expression of immunogenic Clostridium botulinum C2I mutant proteins designed from their evolutionary imprints. Comp Immunol Microbiol Infect Dis 2019; 65:207-212. [DOI: 10.1016/j.cimid.2019.01.012] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 07/16/2018] [Revised: 12/15/2018] [Accepted: 01/14/2019] [Indexed: 01/11/2023]
3
Gao S, Lu Y, Li Y, Huang R, Zheng G. Enhancement in the catalytic activity of Sulfolobus solfataricus P2 (+)-γ-lactamase by semi-rational design with the aid of a newly established high-throughput screening method. Appl Microbiol Biotechnol 2018; 103:251-263. [DOI: 10.1007/s00253-018-9428-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 07/13/2018] [Revised: 08/26/2018] [Accepted: 09/19/2018] [Indexed: 10/28/2022]
4
Khan FI, Wei DQ, Gu KR, Hassan MI, Tabrez S. Current updates on computer aided protein modeling and designing. Int J Biol Macromol 2016; 85:48-62. [DOI: 10.1016/j.ijbiomac.2015.12.072] [Citation(s) in RCA: 72] [Impact Index Per Article: 9.0] [Received: 08/21/2015] [Revised: 12/17/2015] [Accepted: 12/21/2015] [Indexed: 12/15/2022]
5
De Laet M, Gilis D, Rooman M. Stability strengths and weaknesses in protein structures detected by statistical potentials: Application to bovine seminal ribonuclease. Proteins 2015; 84:143-58. [DOI: 10.1002/prot.24962] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 06/15/2015] [Revised: 10/27/2015] [Accepted: 11/09/2015] [Indexed: 11/10/2022]
Affiliation(s)
- Marie De Laet
- 3BIO-BioInfo Department; Université Libre De Bruxelles; Avenue F. Roosevelt 50 CP 165/61 Brussels 1050 Belgium
- Dimitri Gilis
- 3BIO-BioInfo Department; Université Libre De Bruxelles; Avenue F. Roosevelt 50 CP 165/61 Brussels 1050 Belgium
- Marianne Rooman
- 3BIO-BioInfo Department; Université Libre De Bruxelles; Avenue F. Roosevelt 50 CP 165/61 Brussels 1050 Belgium
6
Stability-activity tradeoffs constrain the adaptive evolution of RubisCO. Proc Natl Acad Sci U S A 2014; 111:2223-8. [PMID: 24469821 DOI: 10.1073/pnas.1310811111] [Citation(s) in RCA: 100] [Impact Index Per Article: 10.0] [Indexed: 01/05/2023]
Abstract
A well-known case of evolutionary adaptation is that of ribulose-1,5-bisphosphate carboxylase (RubisCO), the enzyme responsible for fixation of CO2 during photosynthesis. Although the majority of plants use the ancestral C3 photosynthetic pathway, many flowering plants have evolved a derived pathway named C4 photosynthesis. The latter concentrates CO2, and C4 RubisCOs consequently have lower specificity for, and faster turnover of, CO2. The C4 forms result from convergent evolution in multiple clades, with substitutions at a small number of sites under positive selection. To understand the physical constraints on these evolutionary changes, we reconstructed in silico ancestral sequences and 3D structures of RubisCO from a large group of related C3 and C4 species. We were able to precisely track their past evolutionary trajectories, identify mutations on each branch of the phylogeny, and evaluate their stability effect. We show that RubisCO evolution has been constrained by stability-activity tradeoffs similar in character to those previously identified in laboratory-based experiments. The C4 properties require a subset of several ancestral destabilizing mutations, which from their location in the structure are inferred to mainly be involved in enhancing conformational flexibility of the open-closed transition in the catalytic cycle. These mutations are near, but not in, the active site or at intersubunit interfaces. The C3 to C4 transition is preceded by a sustained period in which stability of the enzyme is increased, creating the capacity to accept the functionally necessary destabilizing mutations, and is immediately followed by compensatory mutations that restore global stability.
7
Wright JD, Sargsyan K, Wu X, Brooks BR, Lim C. Protein-Protein Docking Using EMAP in CHARMM and Support Vector Machine: Application to Ab/Ag Complexes. J Chem Theory Comput 2013; 9:4186-94. [PMID: 26592408 DOI: 10.1021/ct400508s] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 12/24/2022]
Abstract
In this work, we have (i) evaluated the ability of the EMAP method implemented in the CHARMM program to generate the correct conformation of Ab/Ag complex structures and (ii) developed a support vector machine (SVM) classifier that detects native conformations among thousands of refined Ab/Ag configurations, using as input features the individual components of the binding free energy computed from a thermodynamic cycle. Tests on 24 Ab/Ag complexes from the protein-protein docking benchmark version 3.0 showed that, based on CAPRI evaluation criteria, EMAP could generate medium-quality native conformations in each case. Furthermore, the SVM classifier could rank medium/high-quality native conformations mostly in the top six among the thousands of refined Ab/Ag configurations. Thus, Ab/Ag docking can be performed within the same program using different levels of protein representation, from grid-based (EMAP) to polar-hydrogen (united-atom) to all-atom. The scripts used and the trained SVM are available at the www.charmm.org forum script repository.
Affiliation(s)
- Jon D Wright
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan; Genomics Research Institute, Academia Sinica, Taipei 115, Taiwan
- Karen Sargsyan
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan
- Xiongwu Wu
- Laboratory of Computational Biology, NHLBI, National Institutes of Health, Bethesda, Maryland, United States
- Bernard R Brooks
- Laboratory of Computational Biology, NHLBI, National Institutes of Health, Bethesda, Maryland, United States
- Carmay Lim
- Institute of Biomedical Sciences, Academia Sinica, Taipei 115, Taiwan; Department of Chemistry, National Tsinghua University, Hsinchu 300, Taiwan
8
Stefl S, Nishi H, Petukh M, Panchenko AR, Alexov E. Molecular mechanisms of disease-causing missense mutations. J Mol Biol 2013; 425:3919-36. [PMID: 23871686 DOI: 10.1016/j.jmb.2013.07.014] [Citation(s) in RCA: 187] [Impact Index Per Article: 17.0] [Received: 05/06/2013] [Revised: 07/04/2013] [Accepted: 07/10/2013] [Indexed: 12/23/2022]
Abstract
Genetic variations resulting in a change of amino acid sequence can have a dramatic effect on stability, the hydrogen bond network, conformational dynamics, activity and many other physiologically important properties of proteins. Substitutions of a single residue in a protein sequence, so-called missense mutations, can be related to many pathological conditions and may influence susceptibility to disease and response to drug treatment. The plausible effects of missense mutations range from altering macromolecular stability to perturbing macromolecular interactions and cellular localization. Here we review individual cases and genome-wide studies that illustrate the association between missense mutations and diseases. In addition, we emphasize that the molecular mechanisms by which mutations exert their effects must be revealed in order to understand the origin of disease. Finally, we report the current state-of-the-art methodologies that predict the effects of mutations on protein stability, the hydrogen bond network, pH dependence, conformational dynamics and protein function.
Affiliation(s)
- Shannon Stefl
- Computational Biophysics and Bioinformatics, Department of Physics, Clemson University, Clemson, SC 29634, USA
9
Residue mutations and their impact on protein structure and function: detecting beneficial and pathogenic changes. Biochem J 2013; 449:581-94. [DOI: 10.1042/bj20121221] [Citation(s) in RCA: 131] [Impact Index Per Article: 11.9] [Indexed: 12/29/2022]
Abstract
The present review focuses on the evolution of proteins and the impact of amino acid mutations on function from a structural perspective. Proteins evolve under the law of natural selection and undergo alternating periods of conservative evolution and of relatively rapid change. The likelihood of mutations being fixed in the genome depends on various factors, such as the fitness of the phenotype or the position of the residues in the three-dimensional structure. For example, co-evolution of residues located close together in three-dimensional space can occur to preserve global stability. Whereas point mutations can fine-tune the protein function, residue insertions and deletions (‘decorations’ at the structural level) can sometimes modify functional sites and protein interactions more dramatically. We discuss recent developments and tools to identify such episodic mutations, and examine their applications in medical research. Such tools have been tested on simulated data and applied to real data such as viruses or animal sequences. Traditionally, there has been little if any cross-talk between the fields of protein biophysics, protein structure–function and molecular evolution. However, the last several years have seen some exciting developments in combining these approaches to obtain an in-depth understanding of how proteins evolve. For example, a better understanding of how structural constraints affect protein evolution will greatly help us to optimize our models of sequence evolution. The present review explores this new synthesis of perspectives.
10
Brohée S. Using the NeAT toolbox to compare networks to networks, clusters to clusters, and network to clusters. Methods Mol Biol 2012; 804:327-342. [PMID: 22144162 DOI: 10.1007/978-1-61779-361-5_18] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 05/31/2023]
Abstract
In this chapter, we present and interpret some operations on biological networks that can easily be performed with NeAT, a set of Web tools for studying biological networks (or graphs) and classifications. These approaches are of particular interest to biologists and other scientists who need to assess the reliability of new datasets (either experimental or predicted) by comparing them to established references. First, we describe the steps that allow a nonspecialist user to compare two networks, computing their union and the statistical significance of their intersection. Next, we show how to map functional classes (e.g., GO categories, sets of regulons or complexes) onto a biological network. A third protocol explains how to compare two sets of functional classes, e.g., to statistically assess the biological relevance of groups of genes returned by a computational method (clustering). The metrics, as well as the results obtained by following the different protocols, are described and explained in detail. NeAT is available at the following URL: http://rsat.bigre.ulb.ac.be/rsat/index_neat.html.
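The first protocol (union of two networks plus the significance of their intersection) can be illustrated with a hypergeometric test. This is a minimal sketch of one common way to score such an overlap, assumed here for illustration rather than taken from NeAT: edges are treated as undirected, the background is every possible edge among `n_nodes` nodes without self-loops, and the function names are hypothetical.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    hits = sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, top + 1))
    return hits / comb(population, draws)


def compare_networks(edges_a, edges_b, n_nodes):
    """Union, intersection and intersection p-value of two undirected networks.

    Edges are stored as frozensets so (u, v) and (v, u) are the same edge.
    The background is every possible edge among n_nodes nodes, without
    self-loops: C(n_nodes, 2).
    """
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    union, inter = a | b, a & b
    p_value = hypergeom_sf(len(inter), comb(n_nodes, 2), len(a), len(b))
    return union, inter, p_value
```

For example, `compare_networks([(1, 2), (2, 3)], [(3, 2), (3, 4)], 4)` yields a union of three edges, the shared edge {2, 3}, and a p-value of 0.6, i.e. one shared edge out of six possible is unremarkable.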
Affiliation(s)
- Sylvain Brohée
- ESAT-SCD-SISTA (Bioinformatics group), Katholieke Universiteit, Leuven, Belgium.
11
Dehouck Y, Kwasigroch JM, Gilis D, Rooman M. PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality. BMC Bioinformatics 2011; 12:151. [PMID: 21569468 PMCID: PMC3113940 DOI: 10.1186/1471-2105-12-151] [Citation(s) in RCA: 367] [Impact Index Per Article: 28.2] [Received: 12/27/2010] [Accepted: 05/13/2011] [Indexed: 12/31/2022]
Abstract
Background The rational design of modified proteins with controlled stability is of great importance in a whole range of applications, notably in the biotechnological and environmental areas, where proteins are used for their catalytic or other functional activities. Future breakthroughs in medical research may also be expected from an improved understanding, at the molecular level, of the effect of naturally occurring disease-causing mutations. Results PoPMuSiC-2.1 is a web server that predicts the thermodynamic stability changes caused by single-site mutations in proteins, using a linear combination of statistical potentials whose coefficients depend on the solvent accessibility of the mutated residue. PoPMuSiC shows good prediction performance (a correlation coefficient of 0.8 between predicted and measured stability changes in cross-validation, after exclusion of 10% outliers). It is also very fast, predicting the stability changes resulting from all possible mutations in a medium-size protein in less than a minute. This unique functionality is implemented in a user-friendly way and is particularly easy to exploit. Another new functionality of the server estimates the optimality of each amino acid in the sequence with respect to the stability of the structure. It may be used to detect structural weaknesses, i.e. clusters of non-optimal residues, which are particularly interesting sites for introducing targeted mutations. These sequence optimality data are also expected to have significant implications for the prediction and analysis of particular structural or functional protein regions. To illustrate the interest of this new functionality, we apply it to a dataset of known catalytic sites and show that a much larger than average concentration of structural weaknesses is detected there, quantifying how these sites have been optimized for function rather than stability.
Conclusion The freely available PoPMuSiC-2.1 web server is highly useful for identifying very rapidly a list of possibly relevant mutations with the desired stability properties, on which subsequent experimental studies can be focused. It can also be used to detect sequence regions corresponding to structural weaknesses, which could be functionally important or structurally delicate regions, with obvious applications in rational protein design.
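The phrase "a linear combination of statistical potentials whose coefficients depend on the solvent accessibility of the mutated residue" can be sketched as follows. This is an illustrative sketch only: the sigmoid interpolation between a "buried" and an "exposed" coefficient, its midpoint and steepness, the term names and all weight values are assumptions, not PoPMuSiC's published functional form or fitted parameters. `accessibility_weight` and `predicted_ddg` are hypothetical names.

```python
import math


def accessibility_weight(rsa, w_buried, w_exposed, midpoint=0.5, steepness=10.0):
    """Coefficient that shifts smoothly from w_buried to w_exposed as the
    relative solvent accessibility `rsa` (0 = buried, 1 = fully exposed) grows.
    The sigmoid form, midpoint and steepness are illustrative assumptions."""
    s = 1.0 / (1.0 + math.exp(-steepness * (rsa - midpoint)))
    return w_buried + (w_exposed - w_buried) * s


def predicted_ddg(potential_changes, weights, rsa):
    """Linear combination sum_i alpha_i(rsa) * dW_i.

    potential_changes: term name -> change in that statistical potential
                       (mutant minus wild type) at the mutated site
    weights:           term name -> (w_buried, w_exposed); placeholder values,
                       not PoPMuSiC's fitted coefficients
    """
    return sum(accessibility_weight(rsa, *weights[name]) * dw
               for name, dw in potential_changes.items())
```

Making the coefficients a smooth function of accessibility, rather than using separate buried and exposed models, lets one fitted model weight, say, a hydrophobic term heavily in the core and lightly at the surface without a hard cutoff.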
Affiliation(s)
- Yves Dehouck
- Bioinformatique génomique et structurale, Université Libre de Bruxelles, Av. Fr. Roosevelt 50, CP165/61, 1050 Brussels, Belgium.
12
Yahalom R, Reshef D, Wiener A, Frankel S, Kalisman N, Lerner B, Keasar C. Structure-based identification of catalytic residues. Proteins 2011; 79:1952-63. [PMID: 21491495 DOI: 10.1002/prot.23020] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Received: 10/05/2010] [Revised: 01/14/2011] [Accepted: 01/28/2011] [Indexed: 11/10/2022]
Abstract
The identification of catalytic residues is an essential step in the functional characterization of enzymes. We present a purely structural approach to this problem, motivated by the difficulty evolution-based methods have in annotating structural genomics targets that have few or no homologs in the databases. Our approach combines a state-of-the-art support vector machine (SVM) classifier with novel structural features that augment structural clues by spatial averaging and Z scoring. Special attention is paid to the class imbalance problem that stems from the overwhelming number of non-catalytic residues in enzymes compared to catalytic residues. This problem is tackled by: (1) optimizing the classifier to maximize a performance criterion that considers both Type I and Type II errors in the classification of catalytic and non-catalytic residues; (2) under-sampling non-catalytic residues before SVM training; and (3) during SVM training, penalizing errors on catalytic residues more heavily than errors on non-catalytic residues. Tested on four enzyme datasets, one specifically designed by us to mimic the structural genomics scenario and three previously evaluated datasets, our structure-based classifier is never inferior to similar structure-based classifiers and is comparable to classifiers that use both structural and evolutionary features. In addition to evaluating catalytic residue identification performance, we present detailed case studies of three proteins. This analysis suggests that many false positive predictions may correspond to binding sites and other functional residues. A web server that implements the method, our purpose-built database, and the source code of the programs are publicly available at http://www.cs.bgu.ac.il/~meshi/functionPrediction.
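Items (2) and (3) of the class-imbalance strategy above can be sketched in a few lines. This is a generic illustration, not the authors' code: it assumes labels 1 for catalytic and 0 for non-catalytic residues, a hypothetical `neg_per_pos` ratio for under-sampling, and expresses the unequal penalties as a class-weighted hinge loss, the objective a soft-margin SVM minimizes when each class gets its own penalty weight. Item (1), the tuned performance criterion, is not shown.

```python
import random


def undersample(examples, labels, neg_per_pos=1.0, seed=0):
    """Keep all catalytic examples (label 1) and a random subset of
    non-catalytic ones (label 0): at most neg_per_pos negatives per positive."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    kept_neg = rng.sample(neg, min(len(neg), int(neg_per_pos * len(pos))))
    keep = sorted(pos + kept_neg)
    return [examples[i] for i in keep], [labels[i] for i in keep]


def weighted_hinge_loss(scores, labels, w_pos, w_neg):
    """Class-weighted hinge loss: margin violations on catalytic residues
    (label 1) cost w_pos each, those on non-catalytic residues cost w_neg."""
    total = 0.0
    for score, y in zip(scores, labels):
        sign = 1.0 if y == 1 else -1.0
        weight = w_pos if y == 1 else w_neg
        total += weight * max(0.0, 1.0 - sign * score)
    return total
```

Setting `w_pos > w_neg` means a misclassified catalytic residue costs more than a misclassified non-catalytic one, which counteracts the tendency of an unweighted classifier to predict the majority class everywhere.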
Affiliation(s)
- Ran Yahalom
- Department of Computer Science, Ben-Gurion University of the Negev, Beer-Sheva 84105, Israel
13
Illergård K, Kauko A, Elofsson A. Why are polar residues within the membrane core evolutionary conserved? Proteins 2010; 79:79-91. [PMID: 20938980 DOI: 10.1002/prot.22859] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Received: 04/07/2010] [Revised: 07/23/2010] [Accepted: 08/13/2010] [Indexed: 11/08/2022]
Abstract
Here, we present a study of polar residues within the membrane core of alpha-helical membrane proteins. As expected, polar residues are under-represented in the membrane. Further, most of these residues are buried within the interior of the protein and are only rarely exposed to lipids. However, the polar groups often border internal water-filled cavities, even if the rest of the side chain is buried. A survey of their functional roles in known structures showed that the polar residues are often directly involved in binding small compounds, especially in channels and transporters, but other functions, including proton transfer, catalysis, and selectivity, have also been attributed to these residues. Among the polar residues, histidines often interact with prosthetic groups in photosynthesis- and oxidoreductase-related proteins, whereas prolines are often required for conformational changes of the proteins. Indeed, the polar residues in the membrane core are more conserved than other residues in the core, as well as more conserved than polar residues outside the membrane. The reason is twofold: they are often (i) buried in the interior of the protein and (ii) directly involved in the function of the proteins. Finally, a method to identify which polar residues are present within the membrane core directly from protein sequences was developed. When the method was applied to the set of all human membrane proteins, the predictions indicated that polar residues are most frequent among active transporter proteins and GPCRs, and infrequent in families with few transmembrane regions, such as non-GPCR receptors.
Affiliation(s)
- Kristoffer Illergård
- Department of Biochemistry and Biophysics, Center for Biomembrane Research, Stockholm bioinformatics Center, Stockholm University, SE-106 91 Stockholm, Sweden
14
Doppelt-Azeroual O, Delfaud F, Moriaud F, de Brevern AG. Fast and automated functional classification with MED-SuMo: an application on purine-binding proteins. Protein Sci 2010; 19:847-67. [PMID: 20162627 DOI: 10.1002/pro.364] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
Ligand-protein interactions are essential for biological processes, and precise characterization of protein binding sites is crucial to understanding protein functions. MED-SuMo is a powerful technology for locating similar local regions on protein surfaces. Its heuristic is based on a 3D representation of macromolecules using specific surface chemical features that associate chemical characteristics with geometrical properties. MED-SMA is an automated and fast method to classify binding sites. It is based on MED-SuMo technology, which builds a similarity graph, and it uses the Markov Clustering algorithm. Purine binding sites are well studied as drug targets. Here, purine binding sites of the Protein Data Bank (PDB) are classified. Proteins potentially inhibited or activated through the same mechanism are grouped together. Results are analyzed according to PROSITE annotations and to carefully refined functional annotations extracted from the PDB. As expected, binding sites associated with related mechanisms, for example those of small GTPases, are grouped together. Nevertheless, protein kinases from different Kinome families are also found together, for example, Aurora-A and CDK2 proteins, which are inhibited by the same drugs. Representative examples of different clusters are presented. The effectiveness of the MED-SMA approach is demonstrated by its grouping of binding sites of proteins with similar structure-activity relationships. Moreover, an efficient new protocol associates structures lacking co-crystallized ligands with the purine clusters, enabling those structures to be linked to a specific binding mechanism. Applications of this classification by binding mode similarity include target-based drug design and prediction of cross-reactivity and therefore of potential toxic side effects.
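MED-SMA is described above as building a similarity graph and clustering it with the Markov Clustering (MCL) algorithm. The sketch below is a plain-Python illustration of textbook MCL (alternate expansion, i.e. squaring the column-stochastic matrix, with inflation, i.e. element-wise powering and renormalizing) applied to an arbitrary symmetric similarity matrix. It assumes nothing about how MED-SuMo scores similarity, and the parameter defaults and the name `markov_cluster` are hypothetical choices.

```python
def _normalize_columns(m):
    """Scale each column of square matrix m to sum to 1 (in place)."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def _matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(sim, inflation=2.0, max_iter=100, tol=1e-9, prune=1e-12):
    """Basic Markov Clustering (MCL) of a non-empty symmetric similarity matrix."""
    n = len(sim)
    # Add self-loops, then turn similarities into column-stochastic transitions.
    m = [[float(sim[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    _normalize_columns(m)
    for _ in range(max_iter):
        expanded = _matmul(m, m)                          # expansion
        inflated = [[v ** inflation if v > prune else 0.0 for v in row]
                    for row in expanded]                  # inflation + pruning
        _normalize_columns(inflated)
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep non-zero mass are attractors; their non-zero columns
    # are the members of that attractor's cluster.
    clusters = []
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members and members not in clusters:
            clusters.append(members)
    return sorted(sorted(c) for c in clusters)
```

Higher `inflation` values sharpen the contrast between strong and weak flows and therefore tend to produce more, smaller clusters; the dense O(n³) matrix product keeps the sketch readable but would be replaced by sparse arithmetic for a PDB-scale graph.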
Affiliation(s)
- Olivia Doppelt-Azeroual
- INSERM UMR-S 665, Dynamique des Structures et Interactions des Macromolécules Biologiques (DSIMB), Université Paris Diderot-Paris 7, Institut National de la Transfusion Sanguine (INTS), 6, rue Alexandre Cabanel, 75739 Paris cedex 15, France.
15
Brylinski M, Skolnick J. Comparison of structure-based and threading-based approaches to protein functional annotation. Proteins 2010; 78:118-34. [PMID: 19731377 PMCID: PMC2804779 DOI: 10.1002/prot.22566] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 12/11/2022]
Abstract
To exploit the vast amount of sequence information provided by the genomic revolution, the biological function of these sequences must be identified. As a practical matter, this is often accomplished by functional inference. Purely sequence-based approaches, particularly in the "twilight zone" of low sequence similarity, are complicated by many factors. For proteins, structure-based techniques aim to overcome these problems; however, most require high-quality crystal structures and suffer from complex and equivocal relations between protein fold and function. In this study, through extensive benchmarking, we consider a number of aspects of structure-based functional annotation: binding pocket detection, molecular function assignment and ligand-based virtual screening. We demonstrate that protein threading driven by a strong sequence profile component greatly improves the quality of purely structure-based functional annotation in the "twilight zone." By detecting evolutionarily related proteins, it considerably reduces the high false positive rate of function inference based on global structure similarity alone. Combined evolution/structure-based function assignment emerges as a powerful technique that can make a significant contribution to comprehensive proteome annotation.
Affiliation(s)
- Michal Brylinski
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318
16
Foit L, Morgan GJ, Kern MJ, Steimer LR, von Hacht AA, Titchmarsh J, Warriner SL, Radford SE, Bardwell JC. Optimizing protein stability in vivo. Mol Cell 2009; 36:861-71. [PMID: 20005848 PMCID: PMC2818778 DOI: 10.1016/j.molcel.2009.11.022] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.2] [Received: 03/27/2009] [Revised: 08/04/2009] [Accepted: 10/24/2009] [Indexed: 11/23/2022]
Abstract
Identifying mutations that stabilize proteins is challenging because most substitutions are destabilizing. In addition to being of immense practical utility, the ability to evolve protein stability in vivo may indicate how evolution has formed today's protein sequences. Here we describe a genetic selection that directly links the in vivo stability of proteins to antibiotic resistance. It allows the identification of stabilizing mutations within proteins. The large majority of mutants selected for improved antibiotic resistance are stabilized both thermodynamically and kinetically, indicating that similar principles govern stability in vivo and in vitro. The approach requires no prior structural or functional knowledge and allows selection for stability without a need to maintain function. Mutations that enhance thermodynamic stability of the protein Im7 map overwhelmingly to surface residues involved in binding to colicin E7, showing how the evolutionary pressures that drive Im7-E7 complex formation have compromised the stability of the isolated Im7 protein.
Affiliation(s)
- Linda Foit
- Howard Hughes Medical Institute, University of Michigan, Ann Arbor, MI 48109, USA
- Institute for Chemistry and Pharmacy, University of Münster, 48149 Münster, Germany
- Gareth J. Morgan
- Astbury Centre for Structural and Molecular Biology, University of Leeds, LS2 9JT, UK
- Institute for Molecular and Cellular Biology, University of Leeds, LS2 9JT, UK
- Maximilian J. Kern
- Howard Hughes Medical Institute, University of Michigan, Ann Arbor, MI 48109, USA
- Lenz R. Steimer
- Howard Hughes Medical Institute, University of Michigan, Ann Arbor, MI 48109, USA
- James Titchmarsh
- Astbury Centre for Structural and Molecular Biology, University of Leeds, LS2 9JT, UK
- School of Chemistry, University of Leeds, LS2 9JT, UK
- Stuart L. Warriner
- Astbury Centre for Structural and Molecular Biology, University of Leeds, LS2 9JT, UK
- School of Chemistry, University of Leeds, LS2 9JT, UK
- Sheena E. Radford
- Astbury Centre for Structural and Molecular Biology, University of Leeds, LS2 9JT, UK
- Institute for Molecular and Cellular Biology, University of Leeds, LS2 9JT, UK
- James C.A. Bardwell
- Howard Hughes Medical Institute, University of Michigan, Ann Arbor, MI 48109, USA
- Department of Molecular, Cellular and Developmental Biology, University of Michigan, Ann Arbor, MI 48109, USA
17
Bray T, Chan P, Bougouffa S, Greaves R, Doig AJ, Warwicker J. SitesIdentify: a protein functional site prediction tool. BMC Bioinformatics 2009; 10:379. [PMID: 19922660 PMCID: PMC2783165 DOI: 10.1186/1471-2105-10-379] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.2] [Received: 06/05/2009] [Accepted: 11/18/2009] [Indexed: 01/31/2023]
Abstract
Background The rate at which protein structures are deposited in the Protein Data Bank surpasses the capacity to characterise them experimentally, and computational methods to analyse these structures have therefore become increasingly important. Identifying the region of the protein most likely to be involved in function is useful for gaining information about its potential role. There are many approaches to predicting functional sites, but many are not available via a publicly accessible application. Results Here we present a functional site prediction tool (SitesIdentify), based on combining sequence conservation information with geometry-based cleft identification, that is freely available via a web server. We have shown that SitesIdentify compares favourably to other functional site prediction tools in a comparison of seven methods on a non-redundant set of 237 enzymes with annotated active sites. Conclusion SitesIdentify predicts functional sites with accuracy comparable to that of its closest available counterpart, and in addition achieves improved accuracy for proteins with few characterised homologues. SitesIdentify is available via a webserver at http://www.manchester.ac.uk/bioinformatics/sitesidentify/
Collapse
Affiliation(s)
- Tracey Bray
- Faculty of Life Sciences, The University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK.
18
Lise S, Archambeau C, Pontil M, Jones DT. Prediction of hot spot residues at protein-protein interfaces by combining machine learning and energy-based methods. BMC Bioinformatics 2009; 10:365. [PMID: 19878545 PMCID: PMC2777894 DOI: 10.1186/1471-2105-10-365] [Received: 02/16/2009] [Accepted: 10/30/2009]
Abstract
BACKGROUND Alanine scanning mutagenesis is a powerful experimental methodology for investigating the structural and energetic characteristics of protein complexes. Individual amino acids are systematically mutated to alanine, and the resulting changes in binding free energy (ΔΔG) are measured. Several experiments have shown that protein-protein interactions depend critically on just a few residues ("hot spots") at the interface. Hot spots make a dominant contribution to the binding free energy, and mutating them can disrupt the interaction. Because mutagenesis studies require significant experimental effort, there is a need for accurate and reliable computational methods. Such methods would also add to our understanding of the determinants of affinity and specificity in protein-protein recognition. RESULTS We present a novel computational strategy to identify hot spot residues, given the structure of a complex. We consider the basic energetic terms that contribute to hot spot interactions, i.e. van der Waals potentials, solvation energy, hydrogen bonds and Coulomb electrostatics. We treat them as input features and use machine learning algorithms such as Support Vector Machines and Gaussian Processes to combine and integrate them optimally, based on a set of training examples of alanine mutations. We show that our approach is effective in predicting hot spots and compares favourably with other available methods. In particular, we obtain the best performance using Transductive Support Vector Machines, a semi-supervised learning scheme. When hot spots are defined as residues with ΔΔG ≥ 2 kcal/mol, our method achieves a precision of 56% and a recall of 65%. CONCLUSION We have developed a hybrid scheme in which energy terms are used as input features of machine learning models. This strategy combines the strengths of machine learning and energy-based methods. Although these two types of approach have so far mainly been applied separately to biomolecular problems, our results indicate that there are substantial benefits to be gained by integrating them.
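To make the reported figures concrete: a residue counts as a hot spot when its measured ΔΔG is at least 2 kcal/mol, precision is the fraction of predicted hot spots that are true hot spots, and recall is the fraction of true hot spots that are recovered. The sketch below only scores predictions against that threshold; it does not reproduce the authors' SVM or Gaussian Process models, and the function names and toy values are hypothetical.

```python
from typing import Sequence

# ΔΔG cutoff (kcal/mol) used in the abstract to define a hot spot.
HOT_SPOT_THRESHOLD = 2.0


def label_hot_spots(ddg: Sequence[float], threshold: float = HOT_SPOT_THRESHOLD) -> list[bool]:
    """Label each residue as a hot spot if its alanine-scanning ΔΔG >= threshold."""
    return [value >= threshold for value in ddg]


def precision_recall(predicted: Sequence[bool], actual: Sequence[bool]) -> tuple[float, float]:
    """Return (precision, recall) of boolean hot spot predictions against true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    tp = sum(p and a for p, a in zip(predicted, actual))        # correctly predicted hot spots
    fp = sum(p and not a for p, a in zip(predicted, actual))    # predicted, but not hot spots
    fn = sum(a and not p for p, a in zip(predicted, actual))    # hot spots that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical toy values, not data from the paper.
    measured_ddg = [0.3, 2.5, 1.1, 4.0, 2.0, 0.0]
    predicted = [False, True, True, True, False, False]
    p, r = precision_recall(predicted, label_hot_spots(measured_ddg))
    print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.67 recall=0.67"
```

Note that a residue with ΔΔG exactly 2.0 kcal/mol is labelled a hot spot, matching the "≥" in the definition.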
Affiliation(s)
- Stefano Lise
- Department of Computer Science, University College London, UK.
19
Abstract
Protein–DNA/RNA/protein interactions play critical roles in many biological functions. Previous studies have focused on the features characterizing the different macromolecule-binding sites and on approaches to detect these sites. However, no single signature common to all of these sites has been reported. This work therefore aims to provide a common principle, founded on the fundamental principles of binding thermodynamics, that dictates the location of the different macromolecule-binding sites. To this end, a comprehensive set of structurally nonhomologous DNA-, RNA-, obligate protein- and nonobligate protein-binding proteins, both free and bound to their respective macromolecules, was created, and a novel strategy was developed for detecting clusters of residues with electrostatic or steric strain, given the protein structure. The results show that, regardless of the macromolecule type, the binding strength and the conformational changes upon binding, macromolecule-binding sites are energetically less stable than nonmacromolecule-binding sites. They also reveal new energetic features that distinguish DNA- from RNA-binding sites and obligate from nonobligate protein-binding sites in both free and bound protein structures.
Affiliation(s)
- Yao Chi Chen
- Institute of Biomedical Sciences, Academia Sinica, Taipei, Taiwan
20
Protein meta-functional signatures from combining sequence, structure, evolution, and amino acid property information. PLoS Comput Biol 2008; 4:e1000181. [PMID: 18818722 PMCID: PMC2526173 DOI: 10.1371/journal.pcbi.1000181] [Received: 04/15/2008] [Accepted: 08/07/2008]
Abstract
Protein function is mediated by the amino acid residues of a protein sequence, through both their positions and their types. Some amino acids are responsible for the stability or overall shape of the protein, playing an indirect role in protein function. Others play a functionally important role as part of active or binding sites of the protein. For a given protein sequence, the residues and their degree of functional importance can be thought of as a signature representing the function of the protein. We have developed a combination of knowledge- and biophysics-based function prediction approaches to elucidate the relationships between the structural and the functional roles of individual residues and positions. Such a meta-functional signature (MFS), which is a collection of continuous values representing the functional significance of each residue in a protein, may be used to study proteins of known function in greater detail and to aid in the experimental characterization of proteins of unknown function. We demonstrate the superior performance of MFS in predicting protein functional sites and also present four real-world examples that apply MFS in a wide range of settings to elucidate protein sequence-structure-function relationships. Our results indicate that the MFS approach, which can combine multiple sources of information and also give a biological interpretation to each component, greatly facilitates the understanding and characterization of protein function.
21
Dukka BKC, Livesay DR. Improving position-specific predictions of protein functional sites using phylogenetic motifs. Bioinformatics 2008; 24:2308-16. [PMID: 18723520 DOI: 10.1093/bioinformatics/btn454]
Abstract
MOTIVATION Accurate computational prediction of protein functional sites is critical to maximizing the utility of recent high-throughput sequencing efforts. Among the available approaches, position-specific conservation scores remain among the most popular due to their accuracy and ease of computation. Unfortunately, high false positive rates remain a limiting factor. Using phylogenetic motifs (PMs), we have developed two combined (conservation + PMs) prediction schemes that significantly improve prediction accuracy. RESULTS Our first approach, called position-specific MINER (psMINER), rank orders alignment columns by conservation. Subsequently, positions that are also not identified as PMs are excluded from the prediction set. This approach improves prediction accuracy, in a statistically significant way, compared to the underlying conservation scores. Increased accuracy is a general result, meaning improvement is observed over several different conservation scores that span a continuum of complexity. In addition, a hybrid MINER (hMINER) that quantitatively considers both scoring regimes provides further improvement. More importantly, it provides critical insight into the relative importance of phylogeny versus alignment conservation. Both methods outperform other common prediction algorithms that also utilize phylogenetic concepts. Finally, we demonstrate that the presented results are critically sensitive to functional site definition, thus highlighting the need for more complete benchmarks within the prediction community.
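The psMINER filtering step described above (rank alignment columns by conservation, then drop predicted positions that are not phylogenetic motifs) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that per-column conservation scores and the set of columns belonging to PMs have already been computed, and the function name and the cut-off `top_n` (the assumed size of the conservation-based prediction set) are hypothetical.

```python
from collections.abc import Mapping, Set


def ps_miner_filter(conservation: Mapping[int, float],
                    phylogenetic_motif_columns: Set[int],
                    top_n: int) -> list[int]:
    """Hypothetical sketch of conservation ranking followed by PM filtering.

    conservation: alignment column index -> conservation score (higher = more conserved).
    phylogenetic_motif_columns: columns already identified as part of a PM.
    top_n: assumed size of the conservation-based prediction set.
    """
    # Step 1: rank columns from most to least conserved (ties broken by column index).
    ranked = sorted(conservation, key=lambda col: (-conservation[col], col))
    candidates = ranked[:top_n]
    # Step 2: exclude predicted positions that are not identified as PMs.
    return [col for col in candidates if col in phylogenetic_motif_columns]


if __name__ == "__main__":
    # Hypothetical scores: the three most conserved columns are 2, 0 and 4;
    # column 2 is not in a motif, so it is removed from the prediction set.
    scores = {0: 0.9, 1: 0.4, 2: 0.95, 3: 0.7, 4: 0.85}
    print(ps_miner_filter(scores, {0, 3, 4}, top_n=3))  # prints [0, 4]
```

Because the filter only removes positions, the result is always a subset of the conservation-based prediction set, which is consistent with the abstract's aim of reducing false positives.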
Affiliation(s)
- Bahadur K C Dukka
- Department of Computer Science and Bioinformatics Research Center, University of North Carolina at Charlotte, Charlotte, NC 28223, USA
22
Fukushima K, Wada M, Sakurai M. An insight into the general relationship between the three dimensional structures of enzymes and their electronic wave functions: Implication for the prediction of functional sites of enzymes. Proteins 2008; 71:1940-54. [PMID: 18186466 DOI: 10.1002/prot.21865]
Abstract
In this study, we explored the general relationship between the three-dimensional (3D) structures of enzymes and their electronic wave functions. Furthermore, we developed a method for the prediction of their functionally important sites. For this purpose, we first performed linear-scaling molecular orbital calculations for 112 nonredundant, non-homologous enzymes with known structure and function. We showed that the canonical molecular orbitals (MOs) of the enzymes could be classified into three groups according to the degree of electron delocalization: highly localized orbitals (Group A), highly delocalized orbitals whose electrons are distributed over almost the whole molecule (Group B), and moderately delocalized orbitals (Group C). The MOs belonging to Group A are located near the HOMO-LUMO band gap and therefore include the frontier orbitals of a given enzyme. We inferred that the MOs of Group B play a role in stabilizing the 3D structure of the enzyme, while those of Group C contribute to constructing the covalent bond framework of the enzyme. Next, we investigated whether the frontier orbitals of enzymes could be used to identify their potential functional sites. We found that the frontier orbitals of the 112 enzymes have a high propensity to be colocalized with the known functional sites, especially when the enzymes are hydrated. This propensity is particularly pronounced when Glu or Asp is a functional site residue. On the basis of these results, we finally propose a protocol for the prediction of functional sites of enzymes.
Affiliation(s)
- K Fukushima
- Center for Biological Resources and Informatics, Tokyo Institute of Technology, Midori-ku, Yokohama 226-8501, Japan
23
Dessailly BH, Lensink MF, Orengo CA, Wodak SJ. LigASite--a database of biologically relevant binding sites in proteins with known apo-structures. Nucleic Acids Res 2007; 36:D667-73. [PMID: 17933762 PMCID: PMC2238865 DOI: 10.1093/nar/gkm839]
Abstract
Better characterization of binding sites in proteins and the ability to accurately predict their location and energetic properties are major challenges which, if addressed, would have many valuable practical applications. Unfortunately, reliable benchmark datasets of binding sites in proteins are still sorely lacking. Here, we present LigASite ('LIGand Attachment SITE'), a gold-standard dataset of binding sites in 550 proteins of known structure. LigASite consists exclusively of biologically relevant binding sites in proteins for which at least one apo- and one holo-structure are available. In defining the binding sites for each protein, information from all holo-structures is combined, considering in each case the quaternary structure defined by the PQS server. LigASite is built using simple criteria and is automatically updated as new structures become available in the PDB, thereby guaranteeing optimal data coverage over time. Both a redundant and a culled non-redundant version of the dataset are available at http://www.scmbb.ulb.ac.be/Users/benoit/LigASite. The website interface allows users to search the dataset by PDB identifiers, ligand identifiers, protein names or sequence, and to look for structural matches as defined by the CATH homologous superfamilies. The datasets can be downloaded from the website as Schema-validated XML files or comma-separated flat files.
Affiliation(s)
- Benoit H Dessailly
- Center for Structural Biology and Bioinformatics, Université Libre de Bruxelles (U. L. B.), Bld du Triomphe - CP 263, 1050 Bruxelles, Belgium