1
Allosteric Modulation of Binding Specificity by Alternative Packing of Protein Cores. J Mol Biol 2018; 431:336-350. [PMID: 30471255 DOI: 10.1016/j.jmb.2018.11.018] [Received: 09/22/2018] [Revised: 11/04/2018] [Accepted: 11/14/2018]
Abstract
Hydrophobic cores are often viewed as tightly packed and rigid, but they do show some plasticity and could thus be attractive targets for protein design. Here we explored the role of different functional pressures on the core packing and ligand recognition of the SH3 domain from human Fyn tyrosine kinase. We randomized the hydrophobic core and used phage display to select variants that bound to each of three distinct ligands. The three evolved groups showed remarkable differences in core composition, illustrating the effect of different selective pressures on the core. Changes in the core did not significantly alter protein stability, but were linked closely to changes in binding affinity and specificity. Structural analysis and molecular dynamics simulations revealed the structural basis for altered specificity. The evolved domains had significantly reduced core volumes, which in turn induced increased backbone flexibility. These motions were propagated from the core to the binding surface and induced significant conformational changes. These results show that alternative core packing and consequent allosteric modulation of binding interfaces could be used to engineer proteins with novel functions.
2
Xiao X, Agris PF, Hall CK. Introducing folding stability into the score function for computational design of RNA-binding peptides boosts the probability of success. Proteins 2016; 84:700-11. [PMID: 26914059 DOI: 10.1002/prot.25021] [Received: 11/05/2015] [Revised: 01/26/2016] [Accepted: 02/10/2016]
Abstract
A computational strategy that integrates our peptide search algorithm with atomistic molecular dynamics simulation was used to design rational peptide drugs that recognize and bind to the anticodon stem and loop domain (ASL(Lys3)) of human tRNALys3 (anticodon UUU), with the aim of interrupting HIV replication. The score function of the search algorithm was improved by adding a peptide stability term, weighted by an adjustable factor λ, to the peptide binding free energy. The five best peptide sequences associated with five different values of λ were determined using the search algorithm and then used as input to atomistic simulations that examined the stability of the peptides' folded conformations and their ability to bind to ASL(Lys3). The simulation results demonstrated that an intermediate value of λ achieves a good balance between optimizing the peptide's binding ability and stabilizing its folded conformation during the sequence evolution process, and hence leads to optimal binding to the target ASL(Lys3). Thus, adding a peptide stability term significantly improves the success rate of our peptide design search.
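As a sketch, the modified score function can be written in the form below. The symbols are introduced here for illustration (the abstract names only the weighting factor λ), and the sign convention assumes that lower scores are better and that the stability term is itself an energy to be minimized:

```latex
S_\lambda(\mathbf{s}) = \Delta G_{\mathrm{bind}}(\mathbf{s}) + \lambda \, E_{\mathrm{stab}}(\mathbf{s}), \qquad \lambda \ge 0
```

Setting λ = 0 recovers a search driven by binding free energy alone, while increasing λ shifts the search toward sequences with stable folded conformations; the abstract reports that an intermediate λ gave the best balance.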
Affiliation(s)
- Xingqing Xiao
- Chemical and Biomolecular Engineering Department, North Carolina State University, Raleigh, North Carolina, 27695-7905
- Paul F Agris
- The RNA Institute, University at Albany, State University of New York, Albany, New York, 12222
- Carol K Hall
- Chemical and Biomolecular Engineering Department, North Carolina State University, Raleigh, North Carolina, 27695-7905
3
Tian Y, Huang X, Zhu Y. Computational design of enzyme-ligand binding using a combined energy function and deterministic sequence optimization algorithm. J Mol Model 2015; 21:191. [PMID: 26162695 DOI: 10.1007/s00894-015-2742-x] [Received: 04/27/2015] [Accepted: 06/24/2015]
Abstract
Enzyme amino-acid sequences at ligand-binding interfaces are evolutionarily optimized for reactions, and the natural conformation of an enzyme-ligand complex must have a low free energy relative to alternative conformations in native-like or non-native sequences. Based on this assumption, a combined energy function was developed for enzyme design and then evaluated by recapitulating native enzyme sequences at ligand-binding interfaces for 10 enzyme-ligand complexes. In this energy function, the electrostatic interaction between polar or charged atoms at buried interfaces is described by an explicitly orientation-dependent hydrogen-bonding potential and a pairwise-decomposable generalized Born model based on the general side chain in the protein design framework. The energy function is augmented with a pairwise surface-area based hydrophobic contribution for nonpolar atom burial. Using this function, on average, 78% of the amino acids at ligand-binding sites were predicted correctly in the minimum-energy sequences, whereas 84% were predicted correctly in the most-similar sequences, which were selected from the top 20 sequences for each enzyme-ligand complex. Hydrogen bonds at the enzyme-ligand binding interfaces in the 10 complexes were usually recovered with the correct geometries. The binding energies calculated using the combined energy function helped to discriminate the active sequences from a pool of alternative sequences that were generated by repeatedly solving a series of mixed-integer linear programming problems for sequence selection with increasing integer cuts.
Affiliation(s)
- Ye Tian
- Department of Chemical Engineering, Tsinghua University, Beijing, 100084, People's Republic of China
4
Gaillard T, Simonson T. Pairwise decomposition of an MMGBSA energy function for computational protein design. J Comput Chem 2014; 35:1371-87. [PMID: 24854675 DOI: 10.1002/jcc.23637] [Received: 01/16/2014] [Revised: 04/14/2014] [Accepted: 05/01/2014]
Abstract
Computational protein design (CPD) aims at predicting new proteins or modifying existing ones. The computational challenge is substantial, as it requires exploring an enormous sequence and conformation space. The difficulty can be reduced by considering a fixed backbone and a discrete set of sidechain conformations. Another common strategy consists of precalculating a pairwise energy matrix, from which the energy of any sequence/conformation can be obtained quickly. In this work, we examine the pairwise decomposition of protein MMGBSA energy functions from a general theoretical perspective, together with an implementation proposed earlier for CPD. It includes a Generalized Born term, whose many-body character is overcome using an effective dielectric environment, and a Surface Area term, for which we present an improved pairwise decomposition. We perform a detailed evaluation of the error that the decomposition introduces into the different energy components, and show that this error remains reasonable compared with other uncertainties.
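As a minimal sketch of why a precomputed pairwise matrix makes any sequence/conformation cheap to evaluate: the total energy is assumed here to be a sum of per-position (self) terms plus per-pair terms read from tables. The table layout and function names are hypothetical, and the Generalized Born and surface-area terms themselves, and how the paper decomposes them, are not modeled.

```python
from itertools import combinations, product


def pairwise_energy(choice, e_single, e_pair):
    """Total energy of one sequence/conformation from precomputed tables.

    choice:   tuple of rotamer labels, one per designed position (hypothetical encoding)
    e_single: list of dicts; e_single[i][r] is the self energy of label r at position i
    e_pair:   dict keyed by (i, j) with i < j; maps (r_i, r_j) to the pair interaction energy
    """
    total = sum(e_single[i][r] for i, r in enumerate(choice))
    for i, j in combinations(range(len(choice)), 2):
        total += e_pair[(i, j)][(choice[i], choice[j])]
    return total


def brute_force_minimum(labels_per_position, e_single, e_pair):
    """Exhaustively scan every label combination (only feasible for toy problems)."""
    return min(product(*labels_per_position),
               key=lambda c: pairwise_energy(c, e_single, e_pair))
```

Because every term is a table lookup, evaluating a candidate costs O(N²) lookups for N positions, which is what makes large searches over the matrix practical once it has been filled.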
Affiliation(s)
- Thomas Gaillard
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
5
Smadbeck J, Peterson MB, Khoury GA, Taylor MS, Floudas CA. Protein WISDOM: a workbench for in silico de novo design of biomolecules. J Vis Exp 2013. [PMID: 23912941 DOI: 10.3791/50476]
Abstract
The aim of de novo protein design is to find amino acid sequences that will fold into a desired three-dimensional structure with improvements in specific properties, such as binding affinity, agonist or antagonist behavior, or stability, relative to the native sequence. Protein design lies at the center of current advances in drug design and discovery. Not only does protein design provide predictions for potentially useful drug targets, but it also enhances our understanding of the protein folding process and of protein-protein interactions. Experimental methods such as directed evolution have shown success in protein design. However, such methods are restricted by the limited sequence space that can be searched tractably. In contrast, computational design strategies allow the screening of a much larger set of sequences covering a wide variety of properties and functionality. We have developed a range of computational de novo protein design methods capable of tackling several important areas of protein design. These include the design of monomeric proteins for increased stability and of complexes for increased binding affinity. To disseminate these methods for broader use, we present Protein WISDOM (http://www.proteinwisdom.org), a tool that provides automated methods for a variety of protein design problems. Structural templates are submitted to initialize the design process. The first stage of design is an optimization-based sequence selection stage that aims to improve stability by minimizing potential energy in sequence space. Selected sequences are then run through a fold specificity stage and a binding affinity stage. A rank-ordered list of the sequences for each step of the process, along with relevant designed structures, provides the user with a comprehensive quantitative assessment of the design. Here we provide the details of each design method, as well as several notable experimental successes attained through the use of the methods.
Affiliation(s)
- James Smadbeck
- Department of Chemical and Biological Engineering, Princeton University, USA
6
Sandhya S, Mudgal R, Jayadev C, Abhinandan KR, Sowdhamini R, Srinivasan N. Cascaded walks in protein sequence space: use of artificial sequences in remote homology detection between natural proteins. Mol Biosyst 2012; 8:2076-84. [PMID: 22692068 DOI: 10.1039/c2mb25113b]
Abstract
Over the past two decades, many ingenious efforts have been made in protein remote homology detection. Because homologous proteins often diversify extensively in sequence, it is challenging to demonstrate such relatedness through entirely sequence-driven searches. Here, we describe a computational method for generating 'protein-like' sequences that serve to bridge gaps in protein sequence space. Sequence profile information, as embodied in a position-specific scoring matrix of multiply aligned sequences of bona fide family members, serves as the starting point for this algorithm. At each position, the observed amino acid propensity and a random number together dictate which residue is selected. By systematically applying this 'roulette-wheel' selection at each position, we generate sequences that resemble the parent family and thus enlarge the sequence space around the family. Generating such sequences for a large number of families, we demonstrate that they expand the utility of natural, intermediately related sequences in linking distant proteins. In 91% of the assessed examples, inclusion of designed sequences improved fold coverage by 5-10% over searches made in their absence. Furthermore, with several examples from proteins adopting folds such as TIM, globin, lipocalin and others, we demonstrate that including designed sequences in a database sensitizes methods such as PSI-BLAST and Cascade PSI-BLAST, which offers a promising opportunity for greatly improved remote homology recognition using sequence information alone.
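The roulette-wheel step can be sketched as follows, assuming each profile position is given as non-negative amino acid propensities (for example, observed frequencies) rather than raw log-odds scores, which can be negative. The function names and profile format are hypothetical, not the authors' code.

```python
import random


def roulette_pick(propensities, rng):
    """Pick one residue with probability proportional to its propensity."""
    total = sum(propensities.values())
    r = rng.random() * total  # uniform draw in [0, total)
    cumulative = 0.0
    for residue, weight in propensities.items():
        cumulative += weight
        if r < cumulative:
            return residue
    return residue  # guard against floating-point round-off at the upper edge


def generate_family_like(profile, n, seed=0):
    """Draw n sequences, choosing each position independently from its profile column."""
    rng = random.Random(seed)
    return ["".join(roulette_pick(column, rng) for column in profile) for _ in range(n)]
```

Residues that are frequent at a position in the family appear proportionally often in the generated sequences, and residues with zero propensity are never chosen.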
Affiliation(s)
- S Sandhya
- National Centre for Biological Sciences, UAS-GKVK Campus, Bangalore 560065, India
7
Bellows ML, Taylor MS, Cole PA, Shen L, Siliciano RF, Fung HK, Floudas CA. Discovery of entry inhibitors for HIV-1 via a new de novo protein design framework. Biophys J 2011; 99:3445-53. [PMID: 21081094 DOI: 10.1016/j.bpj.2010.09.050] [Received: 02/17/2010] [Revised: 09/23/2010] [Accepted: 09/27/2010]
Abstract
A new (to our knowledge) de novo design framework with a ranking metric based on approximate binding affinity calculations is introduced and applied to the discovery of what we believe are novel HIV-1 entry inhibitors. The framework consists of two stages: a sequence selection stage and a validation stage. The sequence selection stage produces a rank-ordered list of amino acid sequences by solving an integer programming sequence selection model. The validation stage consists of fold specificity and approximate binding affinity calculations. The designed peptidic inhibitors are 12 amino acids long and target the hydrophobic core of gp41. A number of the best-predicted sequences were synthesized, and their inhibition of HIV-1 was tested in cell culture. All peptides examined showed inhibitory activity compared with no drug present, and the novel peptide sequences outperformed the native template sequence used for the design. The best sequence showed micromolar inhibition, a 3- to 15-fold improvement over the native sequence depending on the donor. In addition, the best sequence inhibited wild-type and Enfuvirtide-resistant virus strains equally.
Affiliation(s)
- M L Bellows
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, USA
8
Bazzoli A, Tettamanzi AGB, Zhang Y. Computational protein design and large-scale assessment by I-TASSER structure assembly simulations. J Mol Biol 2011; 407:764-76. [PMID: 21329699 DOI: 10.1016/j.jmb.2011.02.017] [Received: 09/23/2010] [Revised: 01/30/2011] [Accepted: 02/05/2011]
Abstract
Protein design aims at designing new protein molecules of desired structure and functionality. One of the major obstacles to large-scale protein design is the extensive time and manpower required for experimental validation of designed sequences. Recent advances in protein structure prediction have made it possible to assess designed sequences automatically via folding simulations. We present a new protocol for protein design and validation. The sequence space is first searched by Monte Carlo sampling guided by a publicly available atomic potential, and candidate sequences are selected by clustering the sequence decoys. The designed sequences are then assessed by I-TASSER folding simulations, which generate full-length atomic structural models by iteratively assembling threading fragments. The protocol is tested on 52 nonhomologous single-domain proteins, with an average sequence identity of 24% between the designed and native sequences. Despite this low sequence identity, the three-dimensional models predicted for the first designed sequence have an RMSD of <2 Å to the target structure in 62% of cases. This percentage increases to 77% if we consider the three-dimensional models from the top 10 designed sequences. Such striking consistency between the target structure and the structures predicted from nonhomologous sequences, even though the design and folding algorithms use completely different force fields, indicates that the design algorithm captures the features essential to the global fold of the target. On average, the designed sequences have a free energy 0.39 kcal/(mol residue) lower than that of the native sequences, potentially affording greater stability to the synthesized target folds.
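A toy version of a Monte Carlo sequence search is sketched below. The Metropolis acceptance rule, the single-point-mutation move, and the `energy` callable are assumptions standing in for the paper's atomic potential; decoy clustering and the I-TASSER assessment step are not shown.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def metropolis_design(energy, length, steps=5000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo over sequences of fixed length.

    energy: callable mapping a sequence string to a score (lower is better);
            a placeholder for whatever potential guides the search.
    Returns the lowest-energy sequence visited and its energy.
    """
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    current = energy("".join(seq))
    best_seq, best_e = "".join(seq), current
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose a random point mutation
        trial = energy("".join(seq))
        delta = trial - current
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = trial  # accept the mutation
            if current < best_e:
                best_seq, best_e = "".join(seq), current
        else:
            seq[pos] = old  # reject: undo the mutation
    return best_seq, best_e
```

With a toy energy that counts mismatches to a target sequence and a low temperature (e.g. `temperature=0.2`), the search should recover the target; a real design run would replace the toy energy with a physics- or knowledge-based potential.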
Affiliation(s)
- Andrea Bazzoli
- Center for Computational Medicine and Bioinformatics, University of Michigan, 100 Washtenaw Avenue, Ann Arbor, MI 48109-2218, USA
9
Lakner C, Holder MT, Goldman N, Naylor GJP. What's in a Likelihood? Simple Models of Protein Evolution and the Contribution of Structurally Viable Reconstructions to the Likelihood. Syst Biol 2011; 60:161-74. [DOI: 10.1093/sysbio/syq088]
Affiliation(s)
- Clemens Lakner
- Department of Biological Science, Section of Ecology and Evolution
- Department of Scientific Computing, Florida State University, Tallahassee, FL 32306-4120, USA
- Mark T. Holder
- Department of Ecology and Evolution, University of Kansas, 6031 Haworth, 1200 Sunnyside Avenue, Lawrence, KS 66045
- Nick Goldman
- European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Gavin J. P. Naylor
- Department of Scientific Computing, Florida State University, Tallahassee, FL 32306-4120, USA
10
Kumar A, Cowen L. Recognition of beta-structural motifs using hidden Markov models trained with simulated evolution. Bioinformatics 2010; 26:i287-93. [PMID: 20529918 PMCID: PMC2881384 DOI: 10.1093/bioinformatics/btq199]
Abstract
Motivation: Profile hidden Markov models (HMMs) have been one of the most successful methods to date for recognizing evolutionarily related protein sequences. However, these models do not capture the pairwise statistical preferences of residues that are hydrogen bonded in β-sheets. We therefore explore methods for incorporating pairwise dependencies into these models. Results: We consider the remote homology detection problem for β-structural motifs. In particular, we ask whether a statistical model trained on members of only one family in a SCOP β-structural superfamily can recognize members of other families in that superfamily. We show that HMMs trained with our pairwise model of simulated evolution achieve a median improvement of nearly 5% in AUC for β-structural motif recognition compared with ordinary HMMs. Availability: All datasets and HMMs are available at: http://bcb.cs.tufts.edu/pairwise/ Contact: anoop.kumar@tufts.edu; lenore.cowen@tufts.edu
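The AUC quoted above is the area under the ROC curve, which equals the probability that a randomly chosen true family member scores higher than a randomly chosen non-member, with ties counted as one half. The sketch below computes it directly from two score lists; the scores stand in for hypothetical HMM scores, and this is not the authors' evaluation code.

```python
def roc_auc(pos_scores, neg_scores):
    """ROC AUC via pairwise comparison (the Mann-Whitney formulation)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos_scores) * len(neg_scores))
```

For example, `roc_auc([3, 2], [1, 2])` gives 0.875: three of the four positive/negative pairs are ranked correctly and one is tied. The quadratic loop is fine for illustration; a rank-based computation scales better to large test sets.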
Affiliation(s)
- Anoop Kumar
- Department of Computer Science, Tufts University, Medford, MA, USA.
11
Schmidt am Busch M, Sedano A, Simonson T. Computational protein design: validation and possible relevance as a tool for homology searching and fold recognition. PLoS One 2010; 5:e10410. [PMID: 20463972 PMCID: PMC2864755 DOI: 10.1371/journal.pone.0010410] [Received: 10/15/2009] [Accepted: 03/31/2010]
Abstract
BACKGROUND: Protein fold recognition usually relies on a statistical model of each fold; each model is constructed from an ensemble of natural sequences belonging to that fold. A complementary strategy may be to employ sequence ensembles produced by computational protein design. Designed sequences can be more diverse than natural sequences, possibly avoiding some limitations of experimental databases. METHODOLOGY/PRINCIPAL FINDINGS: We explore this strategy for four SCOP families: Small Kunitz-type inhibitors (SKIs), Interleukin-8 chemokines, PDZ domains, and large Caspase catalytic subunits, represented by 43 structures. An automated procedure is used to redesign the 43 proteins. We use the experimental backbones as fixed templates in the folded state and a molecular mechanics model to compute the interaction energies between sidechain and backbone groups. Calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is used to scan the sequence and conformational space, yielding 200,000-300,000 sequences per backbone template. The results confirm and generalize our earlier study of SH2 and SH3 domains. The designed sequences resemble moderately distant natural homologues of the initial templates; e.g., the SUPERFAMILY profile hidden Markov model library recognizes 85% of the low-energy sequences as native-like. Conversely, position-specific scoring matrices derived from the sequences can be used to detect natural homologues within the SwissProt database: 60% of known PDZ domains are detected, and around 90% of known SKIs and chemokines. Energy components and inter-residue correlations are analyzed, and ways to improve the method are discussed. CONCLUSIONS/SIGNIFICANCE: For some families, designed sequences can be a useful complement to experimental ones for homologue searching. However, improved tools are needed to extract more information from the designed profiles before the method can be of general use.
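Deriving a position-specific scoring matrix from an ensemble of designed sequences can be sketched as a per-column log-odds calculation. The uniform background, the flat pseudocount, and the decision to ignore gap characters are simplifying assumptions; PSI-BLAST's own matrix construction uses sequence weighting and different pseudocount schemes.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(aligned, pseudocount=1.0, background=None):
    """Natural-log log-odds PSSM from equal-length aligned sequences.

    Returns a list with one dict per column, mapping each amino acid to
    log(observed frequency with pseudocounts / background frequency).
    Characters outside the 20 standard amino acids (e.g. gaps) are ignored.
    """
    if background is None:
        background = {a: 1.0 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    pssm = []
    for i in range(len(aligned[0])):
        counts = Counter(seq[i] for seq in aligned)
        total = sum(counts[a] for a in AMINO_ACIDS) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for a in AMINO_ACIDS:
            freq = (counts[a] + pseudocount) / total
            column[a] = math.log(freq / background[a])
        pssm.append(column)
    return pssm
```

Residues enriched in the designed ensemble at a position receive positive scores, and unseen residues receive negative but finite scores thanks to the pseudocount.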
Affiliation(s)
- Marcel Schmidt am Busch
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
- Audrey Sedano
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, France
12
De novo self-assembling collagen heterotrimers using explicit positive and negative design. Biochemistry 2010; 49:2307-16. [PMID: 20170197 DOI: 10.1021/bi902077d]
Abstract
We sought to computationally design model collagen peptides that specifically associate as heterotrimers. Computational design has been successfully applied to the creation of new protein folds and functions. Despite the high abundance of collagen and its key role in numerous biological processes, fibrous proteins have received little attention as computational design targets. Collagens are composed of three polypeptide chains that wind into triple helices. We developed a discrete computational model to design heterotrimer-forming collagen-like peptides. Stability and specificity of oligomerization were targeted concurrently using a combined positive and negative design approach. The sequences of three 30-residue peptides, A, B, and C, were optimized to favor charge-pair interactions in an ABC heterotrimer while disfavoring the 26 competing oligomers (e.g., AAA, ABB, BCA). The peptides were synthesized and characterized for thermal stability and triple-helical structure by circular dichroism and NMR. A unique A:B:C-type species was not achieved. Negative design was partially successful, with only A + B and B + C competing mixtures formed. Analysis of computed versus experimental stabilities helps to clarify the role of electrostatics and secondary-structure propensities in determining collagen stability, and provides important insight into how subsequent designs can be improved.
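The figure of 26 competing oligomers is consistent with treating the three staggered chain positions of the triple helix (leading, middle, trailing) as distinct, so that 3^3 = 27 ordered compositions exist and exactly one of them is the target ABC. This reading is an assumption, supported by BCA being listed as a competitor. The enumeration can be checked in a few lines:

```python
from itertools import product


def competing_trimers(peptides="ABC", target="ABC"):
    """All ordered three-chain compositions except the target heterotrimer."""
    states = ["".join(p) for p in product(peptides, repeat=3)]
    return [s for s in states if s != target]


competitors = competing_trimers()
print(len(competitors))  # 26
```

The same enumeration makes explicit which states a negative-design score would have to penalize.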
13
Thomas A, Joris B, Brasseur R. Standardized evaluation of protein stability. Biochim Biophys Acta 2010; 1804:1265-71. [PMID: 20176144 DOI: 10.1016/j.bbapap.2010.02.008] [Received: 12/07/2009] [Revised: 01/24/2010] [Accepted: 02/10/2010]
Abstract
We compare mean force potential (MFP) values for a large series of PDB models of proteins and peptides and find that, whether monomers or polymers, proteins longer than 200-250 residues have equivalent MFP values that average -65 ± 3 kcal/aa. We call this value the standard, or stability, value. The standard value is reached irrespective of sequence and 3D fold. Peptides are too short to follow the rule and frequently exist as populations of conformers; one exception is peptides in amyloid fibrils. Fibrils exceed the standard value, in accordance with their very high stability. In parallel, we calculate median MFP values of amino acids in stably folded PDB models of proteins: median values range from -25 kcal/aa for Gly to -115 kcal/aa for Trp. These median values are used to score primary sequences of proteins: all sequences converge to a mean value of -63.5 ± 2.5 kcal/aa, i.e., only 1.5 kcal/aa short of the folded-model standard. Sequences from unfolded proteins have lower values. This supports the conclusion that sequences carry important information and, more specifically, that amino acid diversity in sequences is mandatory for stability. We also use the median amino acid MFP values to score residue stability in 3D folds. This shows that 3D folds are compromises between fragments of high and fragments of low scores, and that functional residues are often, but not always, found at the extreme score values. The approach opens the possibility of evaluating any 3D model and of detecting functional residues, and should help in conducting mutation assays.
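Scoring a primary sequence from per-residue median values amounts, under the assumption that the score is a per-residue average (the abstract reports results in kcal/aa), to a table lookup followed by a mean. Only the Gly and Trp medians are quoted in the abstract, so the table below is deliberately partial and the function name is hypothetical.

```python
def mean_residue_score(sequence, median_mfp):
    """Average per-residue score (kcal/aa) from a table of median MFP values."""
    values = [median_mfp[aa] for aa in sequence]
    return sum(values) / len(values)


# Only the two extremes quoted in the abstract; a usable table needs all 20 amino acids.
PARTIAL_TABLE = {"G": -25.0, "W": -115.0}
print(mean_residue_score("GGW", PARTIAL_TABLE))  # -55.0
```

With a complete table, comparing such averages against the folded-model standard is what the abstract describes for natural and unfolded sequences.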
Affiliation(s)
- Annick Thomas
- CBMN, Gembloux AgroBiotech, ULg, 5030 Gembloux, Belgium.
14
am Busch MS, Mignon D, Simonson T. Computational protein design as a tool for fold recognition. Proteins 2009; 77:139-58. [PMID: 19408297 DOI: 10.1002/prot.22426]
Abstract
Computationally designed protein sequences have been proposed as a basis for fold recognition and homology searching. To investigate this possibility, an automated procedure is used to completely redesign 24 SH3 proteins and 22 SH2 proteins. We use the experimental backbone coordinates as fixed templates in the folded state and a molecular mechanics model to compute the pairwise interaction energies between all sidechain types and conformations. Energy calculations are done with the Proteins@Home volunteer computing platform. A heuristic algorithm is then used to scan the sequence and conformational space for optimal solutions. We produced 200,000-450,000 sequences for each backbone template. The designed sequences resemble moderately distant natural homologues of the initial templates, according to their identity scores and their similarity to the Pfam sets of SH2 and SH3 domains. Standard homology detection tools document their native-like character: the Conserved Domain Database recognizes 61% (52%) of our low-energy sequences as SH3 (SH2) domains; the SUPERFAMILY hidden Markov model library recognizes 81% (84%). Conversely, position-specific scoring matrices (PSSMs) derived from our designed sequences can be used to detect natural homologues in sequence databases. Within SwissProt, for example, a set of natural SH3 PSSMs detects 772 SH3 domains; our designed PSSMs detect 67% of these, plus one additional sequence and two false positives. If six amino acids involved in substrate binding (a selective pressure not accounted for in our design) are reset to their experimental types, 77% of the experimental SH3 domains are detected. Results for the SH2 domains are similar. Several directions for further improving the method are discussed.
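Using a PSSM to detect homologues can be illustrated with an ungapped sliding-window scan that reports the best-scoring segment of a query sequence. This is a simplified stand-in: PSI-BLAST and related tools perform gapped alignment and report E-values, and the PSSM format here (one dict per column) is an assumption.

```python
def best_window_score(sequence, pssm):
    """Highest ungapped score of any window of length len(pssm) in sequence.

    pssm: list of dicts, one per profile column, mapping residue -> log-odds score;
          residues missing from a column score 0.
    Returns (best_score, start_index), or (-inf, -1) if the sequence is too short.
    """
    width = len(pssm)
    best = (float("-inf"), -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
        if score > best[0]:
            best = (score, start)
    return best
```

A database search would apply such a scan to every entry and keep those scoring above a calibrated threshold; choosing that threshold is where the false positives mentioned in the abstract come in.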
Affiliation(s)
- Marcel Schmidt am Busch
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France
15
Babor M, Kortemme T. Multi-constraint computational design suggests that native sequences of germline antibody H3 loops are nearly optimal for conformational flexibility. Proteins 2009; 75:846-58. [PMID: 19194863 DOI: 10.1002/prot.22293]
Abstract
The germline antibody repertoire, limited in size, has to recognize a far larger number of potential antigens. The ability of a single antibody to bind multiple ligands, owing to conformational flexibility in the antigen-binding site, can significantly enlarge the repertoire. Among the six complementarity determining regions (CDRs) that generally make up the binding site, the CDR H3 loop is particularly variable. Computational protein design studies have shown that predicted low-energy sequences compatible with a given backbone structure often have considerable similarity to the corresponding native sequences of naturally occurring proteins, indicating that native protein sequences are close to optimal for their structures. Here, we go a step further and ask whether conformational flexibility, believed to play a key functional role in germline antibodies, is also central in shaping their native sequences. In particular, we use a multi-constraint computational design strategy, together with the Rosetta scoring function, to propose that the native sequences of CDR H3 loops from germline antibodies are nearly optimal for conformational flexibility. Moreover, we find that antibody maturation may lead to sequences with a higher degree of optimization for a single conformation, while disfavoring sequences that are intrinsically flexible. In addition, this computational strategy allows us to predict mutations in the CDR H3 loop that stabilize the antigen-bound conformation, a computational mimic of affinity maturation, and that may increase antigen binding affinity by preorganizing the antigen-binding loop. In vivo affinity maturation data are consistent with our predictions. The method described here could be useful for designing antibodies with higher selectivity and affinity by reducing conformational diversity.
Affiliation(s)
- Mariana Babor
- California Institute for Quantitative Biosciences, University of California San Francisco, San Francisco, California 94158-2330, USA
16
Jha AN, Ananthasuresh GK, Vishveshwara S. A search for energy minimized sequences of proteins. PLoS One 2009; 4:e6684. [PMID: 19690619 PMCID: PMC2724685 DOI: 10.1371/journal.pone.0006684] [Received: 02/03/2009] [Accepted: 07/23/2009]
Abstract
In this paper, we present numerical evidence that supports the notion of minimization in the sequence space of proteins for a target conformation. We use the conformations of real proteins in the Protein Data Bank (PDB) and present computationally efficient methods to identify the sequences with minimum energy. We use an edge-weighted connectivity graph to rank the residue sites with a reduced amino acid alphabet, and then use continuous optimization to obtain the energy-minimizing sequences. Our methods enable the computation of a lower bound as well as a tight upper bound for the energy of a given conformation. We validate our results by using three different inter-residue energy matrices for five proteins from the PDB, and by comparing our energy-minimizing sequences with 80 million diverse sequences that are generated based on different considerations in each case. When we submitted some of our chosen energy-minimizing sequences to the Basic Local Alignment Search Tool (BLAST), we obtained some sequences from the non-redundant protein sequence database that are similar to ours with an E-value of the order of 10^-7. In summary, we conclude that proteins show a trend towards minimizing energy in the sequence space but do not seem to adopt the global energy-minimizing sequence. The reason for this could be either that the existing energy matrices are not able to accurately represent the inter-residue interactions in the context of the protein environment, or that Nature does not push the optimization in sequence space once the protein is able to perform its function.
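For very small cases, the energy of a sequence placed on a fixed conformation, and the minimizing sequence, can be found by brute force. The sketch below assumes a contact-map energy (a sum of matrix entries over residue pairs in contact) and a two-letter hydrophobic/polar alphabet with illustrative matrix values. It is an exhaustive baseline swapped in for illustration, not the paper's graph-ranking and continuous-optimization method, which is what makes realistic chain lengths tractable.

```python
from itertools import product


def contact_energy(sequence, contacts, matrix):
    """Sum of pairwise contact energies over residue pairs (i, j) in contact."""
    return sum(matrix[(sequence[i], sequence[j])] for i, j in contacts)


def exhaustive_minimum(length, contacts, matrix, alphabet="HP"):
    """Brute-force minimum-energy sequence(s) for a fixed conformation.

    Enumerates len(alphabet) ** length sequences, so it is only usable for tiny examples.
    Returns (minimum energy, list of all sequences achieving it).
    """
    best_e, best = None, []
    for letters in product(alphabet, repeat=length):
        seq = "".join(letters)
        e = contact_energy(seq, contacts, matrix)
        if best_e is None or e < best_e:
            best_e, best = e, [seq]
        elif e == best_e:
            best.append(seq)
    return best_e, best
```

With a single rewarded H-H contact between positions 0 and 3 of a four-residue chain, every sequence with H at both contacting positions is optimal, which illustrates how degenerate a pure contact energy can be.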
Affiliation(s)
- Anupam Nath Jha
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- G. K. Ananthasuresh
- Department of Mechanical Engineering, Indian Institute of Science, Bangalore, India
- Saraswathi Vishveshwara
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
17
Prediction of protein-protein interface sequence diversity using flexible backbone computational protein design. Structure 2008; 16:1777-88. [PMID: 19081054 DOI: 10.1016/j.str.2008.09.012] [Citation(s) in RCA: 71] [Impact Index Per Article: 4.7] [Received: 07/01/2008] [Revised: 09/26/2008] [Accepted: 09/30/2008] [Indexed: 11/21/2022]
Abstract
A major challenge in computational protein design is to identify functional sequences as top predictions. One reason for design failures is conformational plasticity, as proteins frequently change their conformation in response to mutations. To advance protein design, here we describe a method employing flexible backbone ensembles to predict sequences tolerated for a protein-protein interface. We show that the predictions are enriched in functional proteins when compared to a phage display screen quantitatively mapping the energy landscape for the interaction between human growth hormone and its receptor. Our model for structural plasticity is inspired by coupled side chain-backbone "backrub" motions observed in high-resolution protein crystal structures. Although the modeled structural changes are subtle, our results on predicting sequence plasticity suggest that backrub sampling may capture a sizable fraction of localized conformational changes that occur in proteins. The described method has implications for predicting sequence libraries to enable challenging protein engineering problems.
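One simple way to turn design energies computed over a backbone ensemble into a predicted tolerance profile at a single interface position is to Boltzmann-weight the amino acids on each backbone and then average across the ensemble. The sketch below assumes the per-amino-acid energies for each backbone (for example, after backrub sampling) are already available; the function name, the kT value, and the equal weighting of backbones are assumptions for illustration, not the published protocol.

```python
import math
from collections import defaultdict

def tolerated_frequencies(ensemble_energies, kT=1.0):
    """Predicted amino-acid frequencies at one position over a backbone ensemble.

    ensemble_energies: one dict per backbone, mapping amino acid -> design energy
    (hypothetical units) at the position of interest. Each backbone contributes a
    Boltzmann distribution; backbones are weighted equally (an assumption).
    """
    totals = defaultdict(float)
    for energies in ensemble_energies:
        e_min = min(energies.values())  # shift energies for numerical stability
        weights = {aa: math.exp(-(e - e_min) / kT) for aa, e in energies.items()}
        z = sum(weights.values())
        for aa, w in weights.items():
            totals[aa] += w / z
    n = len(ensemble_energies)
    return {aa: p / n for aa, p in totals.items()}

# Two hypothetical backbones that favour different residues at the same site.
ensemble = [{"L": 0.0, "A": 2.0, "E": 4.0}, {"L": 2.0, "A": 0.0, "E": 4.0}]
print(tolerated_frequencies(ensemble))
```

On a single backbone only one of L or A would dominate; averaging over the two conformations makes both tolerated, which is the qualitative effect that flexible-backbone sampling is meant to capture.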
18
Sciretti D, Bruscolini P, Pelizzola A, Pretti M, Jaramillo A. Computational protein design with side-chain conformational entropy. Proteins 2009; 74:176-91. [PMID: 18618711 DOI: 10.1002/prot.22145] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 11/06/2022]
Abstract
Recent advances in modeling protein structures at the atomic level have made it possible to tackle "de novo" computational protein design. Most procedures are based on combinatorial optimization using a scoring function that estimates the folding free energy of a protein sequence on a given main-chain structure. However, the computation of the conformational entropy in the folded state is generally an intractable problem, and its contribution to the free energy is not properly evaluated. In this article, we propose a new automated protein design methodology that incorporates such conformational entropy based on statistical mechanics principles. We define the free energy of a protein sequence by the corresponding partition function over rotamer states. The free energy is written in variational form in a pairwise approximation and minimized using the Belief Propagation algorithm. In this way, a free energy is associated with each amino acid sequence: we use this insight to rescore the results obtained with a standard minimization method, with the energy as the cost function. Then, we set up a design method that directly uses the free energy as a cost function in combination with a stochastic search in the sequence space. We validate the methods on the design of three superficial sites of a small SH3 domain, and then apply them to the complete redesign of 27 proteins. Our results indicate that accounting for the entropic contribution in the score function affects the outcome in a highly nontrivial way, and might improve current computational design techniques based on protein stability.
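The central quantity here, the free energy of a sequence over its rotamer states, is F = -kT ln Z with Z summed over all rotamer assignments. For a handful of positions that sum can be enumerated exactly, which is what the sketch below does; the paper instead minimizes a pairwise variational free energy with Belief Propagation so that realistic sizes stay tractable. The energy tables and the function name are hypothetical.

```python
import math
from itertools import product

def rotamer_free_energy(single, pair, kT=1.0):
    """Exact F = -kT ln Z over every rotamer assignment of a small system.

    single[i][r]: self energy of rotamer r at position i (hypothetical units).
    pair[(i, j)][r][s]: interaction of rotamer r at i with rotamer s at j.
    Returns (F, U, S): free energy, mean energy, and entropy in units of k.
    """
    n = len(single)
    energies = []
    for state in product(*(range(len(rots)) for rots in single)):
        e = sum(single[i][state[i]] for i in range(n))
        e += sum(table[state[i]][state[j]] for (i, j), table in pair.items())
        energies.append(e)
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]  # shifted Boltzmann factors
    z = sum(weights)
    free_energy = e_min - kT * math.log(z)  # undo the shift: ln Z = -e_min/kT + ln z
    mean_energy = sum(w * e for w, e in zip(weights, energies)) / z
    entropy = (mean_energy - free_energy) / kT  # from F = U - TS
    return free_energy, mean_energy, entropy

# Same ground-state energy, different numbers of accessible rotamers.
f_rigid, _, _ = rotamer_free_energy([[0.0, 10.0]], {})
f_flexible, _, _ = rotamer_free_energy([[0.0, 0.0]], {})
print(f_rigid, f_flexible)
```

Both single-position systems have the same lowest energy, but the one with two low-energy rotamers gets the lower free energy; that entropic difference is invisible to a score based only on the minimum energy.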
Affiliation(s)
- Daniele Sciretti
- Departamento de Física Teórica, Universidad de Zaragoza, c. Pedro Cerbuna 12, Zaragoza 50009, Spain
19
Evaluating and optimizing computational protein design force fields using fixed composition-based negative design. Proc Natl Acad Sci U S A 2008; 105:12242-7. [PMID: 18708527 DOI: 10.1073/pnas.0805858105] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/18/2022]
Abstract
An accurate force field is essential to computational protein design and protein fold prediction studies. Proper force field tuning is problematic, however, due in part to the incomplete modeling of the unfolded state. Here, we evaluate and optimize a protein design force field by constraining the amino acid composition of the designed sequences to that of a well behaved model protein. According to the random energy model, unfolded state energies are dependent only on amino acid composition and not the specific arrangement of amino acids. Therefore, energy discrepancies between computational predictions and experimental results, for sequences of identical composition, can be directly attributed to flaws in the force field's ability to properly account for folded state sequence energies. This aspect of fixed composition design allows for force field optimization by focusing solely on the interactions in the folded state. Several rounds of fixed composition optimization of the 56-residue beta1 domain of protein G yielded force field parameters with significantly greater predictive power: Optimized sequences exhibited higher wild-type sequence identity in critical regions of the structure, and the wild-type sequence showed an improved Z-score. Experimental studies revealed a designed 24-fold mutant to be stably folded with a melting temperature similar to that of the wild-type protein. Sequence designs using engrailed homeodomain as a scaffold produced similar results, suggesting the tuned force field parameters were not specific to protein G.
20
Fung HK, Welsh WJ, Floudas CA. Computational De Novo Peptide and Protein Design: Rigid Templates versus Flexible Templates. Ind Eng Chem Res 2008. [DOI: 10.1021/ie071286k] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 12/26/2022]
Affiliation(s)
- Ho Ki Fung
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, and Department of Pharmacology, University of Medicine & Dentistry of New Jersey (UMDNJ), Robert Wood Johnson Medical School, and the Informatics Institute of UMDNJ, Piscataway, New Jersey 08854
- William J. Welsh
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, and Department of Pharmacology, University of Medicine & Dentistry of New Jersey (UMDNJ), Robert Wood Johnson Medical School, and the Informatics Institute of UMDNJ, Piscataway, New Jersey 08854
- Christodoulos A. Floudas
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, and Department of Pharmacology, University of Medicine & Dentistry of New Jersey (UMDNJ), Robert Wood Johnson Medical School, and the Informatics Institute of UMDNJ, Piscataway, New Jersey 08854
21
Mutations affecting the oligomerization interface of G-protein-coupled receptors revealed by a novel de novo protein design framework. Biophys J 2008; 94:2470-81. [PMID: 18178645 DOI: 10.1529/biophysj.107.117622] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.1] [Indexed: 11/18/2022]
Abstract
Specific functional and pharmacological properties have recently been ascribed to G-protein-coupled receptor (GPCR) dimers/oligomers. Because the association of two identical or two distinct GPCR monomers seems to be required to elicit receptor function, it is necessary to understand the exact nature of this interaction. We present here a novel method for de novo protein design and its application to the prediction of mutations that can stabilize or destabilize a GPCR dimer while maintaining the monomer's native fold. To test the efficacy of this new method, the dimer of the single-spanned transmembrane domain of glycophorin A was used as a model system. Experimental data from mutagenesis of the helix-helix interface are compared with computational predictions at that interface, and the model's results are found to be consistent with the experimental findings. A flexible template was developed for the rhodopsin homodimer at atomic resolution and used to predict sets of three and five mutations. The results are found to be consistent across eight case studies, with favored mutations at each position. Mutation sets predicted to be the most disruptive at the dimerization interface are found to be less specific to the flexible template than sets predicted to be less disruptive.
22
Schmidt Am Busch M, Lopes A, Mignon D, Simonson T. Computational protein design: Software implementation, parameter optimization, and performance of a simple model. J Comput Chem 2008; 29:1092-102. [DOI: 10.1002/jcc.20870] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Indexed: 12/20/2022]
23
Abstract
Over the past 10 years there has been tremendous success in the area of computational protein design. Protein design software has been used to stabilize proteins, solubilize membrane proteins, design intermolecular interactions, and design new protein structures. A key motivation for these studies is that they test our understanding of protein energetics and structure. De novo design of novel structures is a particularly rigorous test because the protein backbone must be designed in addition to the amino acid side chains. A priori it is not guaranteed that the target backbone is even designable. To address this issue, researchers have developed a variety of methods for generating protein-like scaffolds and for optimizing the protein backbone in conjunction with the amino acid sequence. These protocols have been used to design proteins from scratch and to explore sequence space for naturally occurring protein folds.
Affiliation(s)
- Glenn L Butterfoss
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7260, USA.
24
Hoang TX, Marsella L, Trovato A, Seno F, Banavar JR, Maritan A. Common attributes of native-state structures of proteins, disordered proteins, and amyloid. Proc Natl Acad Sci U S A 2006; 103:6883-8. [PMID: 16624879 PMCID: PMC1458988 DOI: 10.1073/pnas.0601824103] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Received: 10/04/2005] [Indexed: 11/18/2022]
Abstract
We show that a framework derived from the common character of globular proteins can be used to understand the design of protein sequences, the behavior of intrinsically unstructured proteins, and the formation of amyloid fibrils in a unified manner. Our studies provide compelling support for the idea that protein native-state structures, the structures adopted by intrinsically unstructured proteins on binding as well as those of amyloid aggregates, all reside in a physical state of matter in which the free energy landscape is sculpted not by the specific sequence of amino acids, but rather by considerations of geometry and symmetry. We elucidate the key role played by sequence design in selecting the structure of choice from the predetermined menu of putative native-state structures.
Affiliation(s)
- Trinh X. Hoang
- Institute of Physics and Electronics, Vietnamese Academy of Science and Technology, 10 Dao Tan, Hanoi, Vietnam
- Luca Marsella
- International School for Advanced Studies (SISSA), Via Beirut 2-4, I-34014 Trieste, Italy
- Dipartimento di Fisica “G. Galilei,” Università di Padova, Via Marzolo 8, I-35131 Padova, Italy
- Antonio Trovato
- Dipartimento di Fisica “G. Galilei,” Università di Padova, Via Marzolo 8, I-35131 Padova, Italy
- Sezione Istituto Nazionale di Fisica Nucleare, Università di Padova, I-35131 Padova, Italy
- Flavio Seno
- Dipartimento di Fisica “G. Galilei,” Università di Padova, Via Marzolo 8, I-35131 Padova, Italy
- Sezione Istituto Nazionale di Fisica Nucleare, Università di Padova, I-35131 Padova, Italy
- Jayanth R. Banavar
- Department of Physics, 104 Davey Lab, Pennsylvania State University, University Park, PA 16802
- Amos Maritan
- Dipartimento di Fisica “G. Galilei,” Università di Padova, Via Marzolo 8, I-35131 Padova, Italy
- Sezione Istituto Nazionale di Fisica Nucleare, Università di Padova, I-35131 Padova, Italy
25
Floudas C, Fung H, McAllister S, Mönnigmann M, Rajgaria R. Advances in protein structure prediction and de novo protein design: A review. Chem Eng Sci 2006. [DOI: 10.1016/j.ces.2005.04.009] [Citation(s) in RCA: 175] [Impact Index Per Article: 9.7] [Indexed: 01/01/2023]
26
Sandhya S, Chakrabarti S, Abhinandan KR, Sowdhamini R, Srinivasan N. Assessment of a rigorous transitive profile based search method to detect remotely similar proteins. J Biomol Struct Dyn 2005; 23:283-98. [PMID: 16218755 DOI: 10.1080/07391102.2005.10507066] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Indexed: 10/28/2022]
Abstract
Profile-based sequence search procedures are commonly employed to detect remote relationships between proteins. We provide an assessment of a Cascade PSI-BLAST protocol that rigorously employs intermediate sequences in detecting remote relationships between proteins. In this approach, we first use PSI-BLAST, which involves multiple rounds of iteration, to query a database and detect an initial set of homologues for a protein in a 'first generation' search. We then propagate a 'second generation' search in the database, involving multiple runs of PSI-BLAST that use each of the homologues identified in the previous generation as queries, to recognize homologues not detected earlier. This non-directed search process can be viewed as an iteration of iterations that is continued to detect further homologues until no new hits are detectable. We present an assessment of the coverage of this 'cascaded' intermediate sequence search on diverse folds and find that searches for up to three generations detect most known homologues of a query. Our assessments show that this approach appears to perform better than the traditional use of PSI-BLAST, detecting 15% more relationships within a family and 35% more relationships within a superfamily. We show that such searches can be performed on generalized sequence databases and that non-trivial relationships between proteins can be detected effectively. Such a propagation of searches maximizes the chances of detecting distant homologies by effectively scanning protein "fold space".
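The cascade amounts to a breadth-first traversal in which each newly found homologue becomes a query in the next generation, stopping when a generation adds nothing new. In the sketch, `search` is a hypothetical stand-in for one full multi-iteration PSI-BLAST run (a real pipeline would run the BLAST+ tools and parse their output), and the toy hit table is invented for illustration.

```python
def cascade_search(query, search, max_generations=None):
    """Generation-by-generation intermediate-sequence search.

    search: callable returning the set of hits detected for a query (a stand-in
    for one PSI-BLAST run). Returns {sequence_id: generation first detected},
    with the original query at generation 0.
    """
    found = {query: 0}
    frontier = [query]
    generation = 0
    while frontier and (max_generations is None or generation < max_generations):
        generation += 1
        next_frontier = []
        for q in frontier:
            for hit in sorted(search(q)):  # sorted for a reproducible order
                if hit not in found:
                    found[hit] = generation
                    next_frontier.append(hit)
        frontier = next_frontier  # empty once a generation finds nothing new
    return found

# Hypothetical hit table: each protein only "sees" its close neighbours.
HITS = {"P1": {"P2"}, "P2": {"P1", "P3"}, "P3": {"P4"}, "P4": set()}
print(cascade_search("P1", lambda q: HITS.get(q, set())))
print(cascade_search("P1", lambda q: HITS.get(q, set()), max_generations=2))
```

Here P4 is never a direct hit of P1 but is reached in the third generation through P2 and P3, which is the transitive effect the protocol relies on; `max_generations` mirrors the observation that about three generations recover most homologues.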
Affiliation(s)
- S Sandhya
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560 012, India
27
Abstract
Naturally occurring proteins comprise a special subset of all plausible sequences and structures selected through evolution. Simulating protein evolution with simplified and all-atom models has shed light on the evolutionary dynamics of protein populations, the nature of evolved sequences and structures, and the extent to which today's proteins are shaped by selection pressures on folding, structure and function. Extensive mapping of the native structure, stability and folding rate in sequence space using lattice proteins has revealed organizational principles of the sequence/structure map important for evolutionary dynamics. Evolutionary simulations with lattice proteins have highlighted the importance of fitness landscapes, evolutionary mechanisms, population dynamics and sequence space entropy in shaping the generic properties of proteins. Finally, evolutionary-like simulations with all-atom models, in particular computational protein design, have helped identify the dominant selection pressures on naturally occurring protein sequences and structures.
Affiliation(s)
- Yu Xia
- Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520, USA
28
Li X, Liang J. Geometric cooperativity and anticooperativity of three-body interactions in native proteins. Proteins 2005; 60:46-65. [PMID: 15849756 DOI: 10.1002/prot.20438] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/10/2022]
Abstract
Characterizing multibody interactions of hydrophobic, polar, and ionizable residues in proteins is important for understanding the stability of protein structures. We introduce a geometric model for quantifying 3-body interactions in native proteins. With this model, empirical propensity values for many types of 3-body interactions can be reliably estimated from a database of native protein structures, despite the overwhelming presence of pairwise contacts. In addition, we define a nonadditive coefficient that characterizes cooperativity and anticooperativity of residue interactions in native proteins by measuring the deviation of 3-body interactions from 3 independent pairwise interactions. It compares the 3-body propensity value with what would be expected if only pairwise interactions were considered, and highlights the distinction between propensity and cooperativity of 3-body interactions. Based on the geometric model, and what can be inferred from statistical analysis of such a model, we find that hydrophobic interactions and hydrogen-bonding interactions make nonadditive contributions to protein stability, but the nonadditive nature depends on whether such interactions are located in the protein interior or on the protein surface. When located in the interior, many hydrophobic interactions such as those involving alkyl residues are anticooperative. Salt-bridge and regular hydrogen-bonding interactions, such as those involving ionizable residues and polar residues, are cooperative. When located on the protein surface, these salt-bridge and regular hydrogen-bonding interactions are anticooperative, and hydrophobic interactions involving alkyl residues become cooperative. We show with examples that incorporating 3-body interactions improves discrimination of protein native structures against decoy conformations. In addition, analysis of cooperative 3-body interactions may reveal spatial motifs that can suggest specific protein functions.
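A nonadditive coefficient of this kind compares a triplet's observed propensity with what three independent pairwise propensities would predict. The abstract does not give the exact formula, so the sketch assumes, for illustration only, that propensities are observed-over-expected count ratios and that the coefficient is the log ratio of the 3-body propensity to the product of the three pairwise propensities. Under that assumption positive values read as cooperative and negative values as anticooperative; all counts are hypothetical.

```python
import math

def propensity(observed, expected):
    """Observed-over-expected count ratio (an assumed propensity definition)."""
    if observed <= 0 or expected <= 0:
        raise ValueError("zero counts give an undefined log ratio later on")
    return observed / expected

def nonadditive_coefficient(p_ijk, p_ij, p_ik, p_jk):
    """log(3-body propensity / product of the three pairwise propensities).

    > 0: cooperative, < 0: anticooperative, 0: pairwise-additive
    (all under this assumed definition).
    """
    return math.log(p_ijk / (p_ij * p_ik * p_jk))

# Hypothetical counts for one residue triplet and its three constituent pairs.
p_ij, p_ik, p_jk = propensity(30, 20), propensity(24, 20), propensity(18, 20)
p_ijk = propensity(10, 12)
print(nonadditive_coefficient(p_ijk, p_ij, p_ik, p_jk))
```

With these numbers the triplet occurs less often than its pairs would predict, so the coefficient is negative, the pattern the abstract reports for many interior alkyl contacts.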
Affiliation(s)
- Xiang Li
- Department of Bioengineering, SEO, MC-063, University of Illinois at Chicago, Chicago, Illinois 60607-7052, USA
29
Park S, Kono H, Wang W, Boder ET, Saven JG. Progress in the development and application of computational methods for probabilistic protein design. Comput Chem Eng 2005. [DOI: 10.1016/j.compchemeng.2004.07.037] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
30
Saunders CT, Baker D. Recapitulation of protein family divergence using flexible backbone protein design. J Mol Biol 2005; 346:631-44. [PMID: 15670610 DOI: 10.1016/j.jmb.2004.11.062] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.3] [Received: 06/17/2004] [Revised: 11/18/2004] [Accepted: 11/22/2004] [Indexed: 11/30/2022]
Abstract
We use flexible backbone protein design to explore the sequence and structure neighborhoods of naturally occurring proteins. The method samples sequence and structure space in the vicinity of a known sequence and structure by alternately optimizing the sequence for a fixed protein backbone using rotamer-based sequence search, and optimizing the backbone for a fixed amino acid sequence using atomic-resolution structure prediction. We find that such a flexible backbone design method better recapitulates protein family sequence variation than sequence optimization on fixed backbones or randomly perturbed backbone ensembles for ten diverse protein structures. For the SH3 domain, the backbone structure variation in the family is also better recapitulated than in randomly perturbed backbones. The potential application of this method as a model of protein family evolution is highlighted by a concerted transition to the amino acid sequence in the structural core of one SH3 domain starting from the backbone coordinates of a homologous structure.
Affiliation(s)
- Christopher T Saunders
- Department of Genome Sciences, University of Washington, Box 357730, Seattle, WA 98195, USA
31
Floudas CA. Research challenges, opportunities and synergism in systems engineering and computational biology. AIChE J 2005. [DOI: 10.1002/aic.10620] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]
32
Jaramillo A, Wodak SJ. Computational protein design is a challenge for implicit solvation models. Biophys J 2005; 88:156-71. [PMID: 15377512 PMCID: PMC1304995 DOI: 10.1529/biophysj.104.042044] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.4] [Received: 03/02/2004] [Accepted: 09/07/2004] [Indexed: 11/18/2022]
Abstract
Increasingly complex schemes for representing solvent effects in an implicit fashion are being used in computational analyses of biological macromolecules. These schemes speed up the calculations by orders of magnitude and are assumed to compromise little on essential features of the solvation phenomenon. In this work we examine this assumption. Four implicit solvation models, a surface area-based empirical model, two models that approximate the generalized Born treatment and a finite difference Poisson-Boltzmann method, are challenged in situations differing from those where these models were calibrated. These situations are encountered in automatic protein design procedures, whose job is to select sequences that stabilize a given protein 3D structure from a large number of alternatives. To this end we evaluate the energetic cost of burying amino acids in thousands of environments with different solvent exposures belonging, respectively, to decoys built with random sequences and to native protein crystal structures. In addition we perform actual sequence design calculations. Except for the crudest surface area-based procedure, all the tested models tend to favor the burial of polar amino acids in the protein interior over nonpolar ones, a behavior that leads to poor performance in protein design calculations. We show, on the other hand, that three of the examined models are nonetheless capable of discriminating between the native fold and many nonnative alternatives, a test commonly used to validate force fields. It is concluded that protein design is a particularly challenging test for implicit solvation models because it requires accurate estimates of the solvation contribution of individual residues. This contrasts with native recognition, which depends less on solvation and more on other nonbonded contributions.
Affiliation(s)
- Alfonso Jaramillo
- Service de Conformation de Macromolécules Biologiques et Bioinformatique, CP263 Université Libre de Bruxelles, Brussels, Belgium
33
Loose C, Klepeis JL, Floudas CA. A new pairwise folding potential based on improved decoy generation and side-chain packing. Proteins 2004; 54:303-14. [PMID: 14696192 DOI: 10.1002/prot.10521] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.1] [Indexed: 11/11/2022]
Abstract
A new force field for pairwise residue interactions as a function of C(alpha) to C(alpha) distances is presented. The force field was developed through the solution of a linear programming formulation with large sets of constraints. The constraints are based on the construction of >80,000 low-energy decoys for a set of proteins and requiring the decoy energies for each protein system to be higher than the native conformation of that particular protein. The generation of a robust force field was facilitated by the use of a novel decoy generation process, which involved the rational selection of proteins to add to the training set and included a significant energy minimization of the decoys. The force field was tested on a large set of decoys for various proteins not included in the training set and shown to perform well compared with a leading force field in identifying the native conformation for these proteins.
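When the energy is a weighted sum of contact counts, the requirement that every decoy score above its native is a set of linear inequalities in the weights. The paper solves a linear program; the sketch below swaps in a simpler perceptron-style iteration that finds weights satisfying the same kind of inequalities when they are feasible. The two distance-bin features, the margin, and all counts are hypothetical.

```python
def energy(weights, counts):
    """Linear energy model: E = sum_b w_b * n_b over distance-bin contact counts."""
    return sum(w * n for w, n in zip(weights, counts))

def fit_energy_weights(training_set, margin=1.0, lr=0.1, max_epochs=1000):
    """Find weights so every decoy scores at least `margin` above its native.

    training_set: list of (native_counts, [decoy_counts, ...]) per protein.
    Perceptron-style stand-in for the linear program: each violated constraint
    moves the weights along (decoy - native), raising that decoy's relative energy.
    """
    weights = [0.0] * len(training_set[0][0])
    for _ in range(max_epochs):
        all_satisfied = True
        for native, decoys in training_set:
            for decoy in decoys:
                diff = [d - n for d, n in zip(decoy, native)]
                if energy(weights, diff) < margin:
                    all_satisfied = False
                    weights = [w + lr * d for w, d in zip(weights, diff)]
        if all_satisfied:
            return weights
    raise RuntimeError("constraints still violated; they may be infeasible")

# Hypothetical counts: [close-range contacts, long-range contacts].
training = [([5, 1], [[2, 3], [3, 2]]), ([4, 0], [[1, 2]])]
w = fit_energy_weights(training)
print(w)
```

An LP formulation additionally lets one optimize an objective over the feasible region and report infeasibility exactly; the iteration here only returns some feasible point, or gives up after `max_epochs`.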
Affiliation(s)
- C Loose
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08540, USA
34
Larson SM, Pande VS. Sequence optimization for native state stability determines the evolution and folding kinetics of a small protein. J Mol Biol 2003; 332:275-86. [PMID: 12946364 DOI: 10.1016/s0022-2836(03)00832-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
Abstract
Investigating the relative importance of protein stability, function, and folding kinetics in driving protein evolution has long been hindered by the fact that we can only compare modern natural proteins, the products of the very process we seek to understand, to each other, with no external references or baselines. Through a large-scale all-atom simulation of protein evolution, we have created a large diverse alignment of SH3 domain sequences which have been selected only for native state stability, with no other influencing factors. Although the average pairwise identity between computationally evolved and natural sequences is only 17%, the residue frequency distributions of the computationally evolved sequences are similar to natural SH3 sequences at 86% of the positions in the domain, suggesting that optimization for the native state structure has dominated the evolution of natural SH3 domains. Additionally, the positions which play a consistent role in the transition state of three well-characterized SH3 domains (by phi-value analysis) are structurally optimized for the native state, and vice versa. Indeed, we see a specific and significant correlation between sequence optimization for native state stability and conservation of transition state structure.
Affiliation(s)
- Stefan M Larson
- Department of Chemistry and Biophysics Program, Stanford University, Stanford, CA 94305-5080, USA
35
Kuchanov SI, Khokhlov AR. Copolymers with designed proteinlike sequences obtained by polymeranalogous transformations of homopolymer globules. J Chem Phys 2003. [DOI: 10.1063/1.1543168] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/14/2022]
36
Larson SM, England JL, Desjarlais JR, Pande VS. Thoroughly sampling sequence space: large-scale protein design of structural ensembles. Protein Sci 2002; 11:2804-13. [PMID: 12441379 PMCID: PMC2373757 DOI: 10.1110/ps.0203902] [Citation(s) in RCA: 82] [Impact Index Per Article: 3.7] [Received: 02/05/2002] [Revised: 08/16/2002] [Accepted: 09/04/2002] [Indexed: 10/27/2022]
Abstract
Modeling the inherent flexibility of the protein backbone as part of computational protein design is necessary to capture the behavior of real proteins and is a prerequisite for the accurate exploration of protein sequence space. We present the results of a broad exploration of sequence space, with backbone flexibility, through a novel approach: large-scale protein design to structural ensembles. A distributed computing architecture has allowed us to generate hundreds of thousands of diverse sequences for a set of 253 naturally occurring proteins, allowing exciting insights into the nature of protein sequence space. Designing to a structural ensemble produces a much greater diversity of sequences than previous studies have reported, and homology searches using profiles derived from the designed sequences against the Protein Data Bank show that the relevance and quality of the sequences are not diminished. The designed sequences have greater overall diversity than corresponding natural sequence alignments, and no direct correlations are seen between the diversity of natural sequence alignments and the diversity of the corresponding designed sequences. For structures in the same fold, the sequence entropies of the designed sequences cluster together tightly. This tight clustering of sequence entropies within a fold and the separation of sequence entropy distributions for different folds suggest that the diversity of designed sequences is primarily determined by a structure's overall fold, and that the designability principle postulated from studies of simple models holds in real proteins. This has important implications for experimental protein design and engineering, as well as providing insight into protein evolution.
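The sequence entropy used to compare folds can be computed per alignment column and averaged over positions. This sketch assumes Shannon entropy in natural-log units over gap-free, equal-length designed sequences; the published analysis may differ in log base, gap handling, or weighting, and the example sequences are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column):
    """Shannon entropy (nats) of the residues observed in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())

def mean_sequence_entropy(sequences):
    """Average per-position entropy over aligned, equal-length sequences."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to a common length")
    return sum(column_entropy(col) for col in zip(*sequences)) / length

designed = ["ACDE", "ACDQ", "SCDE", "ACNE"]  # hypothetical designed sequences
print(mean_sequence_entropy(designed))
```

A fully conserved column contributes 0, and a column split evenly among k residues contributes ln k, so higher averages indicate a more diverse designed ensemble.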
Affiliation(s)
- Stefan M Larson
- Chemistry Department and Biophysics Program, Stanford University, California 94305, USA
37
Abstract
It is commonly believed that similarities between the sequences of two proteins imply similarities between their structures. Sequence alignments reliably recognize pairs of proteins with similar structures provided that the percentage sequence identity between their two sequences is sufficiently high. This distinction, however, is statistically less reliable when the percentage sequence identity is lower than 30%, and little is then known about the detailed relationship between the two measures of similarity. Here, we investigate the relationship between structural similarity and sequence similarity on 12 protein structure families. We define the structure similarity between two proteins as the cRMS distance between their structures. The sequence similarity for a pair of proteins is measured as the mean distance between the sequences in the subsets of sequence space compatible with their structures. We obtain an approximation of the sequence space compatible with a protein by designing a collection of protein sequences both stable and specific to the structure of that protein. Using these measures of sequence and structure similarities, we find that structural changes within a protein family are linearly related to changes in sequence similarity.
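The sequence-similarity measure described here, a mean distance between two sets of designed sequences, can be sketched as the average distance over all cross-set pairs. The choice of fractional Hamming distance on pre-aligned, equal-length sequences is an assumption for illustration, the example sets are hypothetical, and the cRMS structural distance is not reproduced.

```python
from itertools import product

def hamming_fraction(a, b):
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def mean_set_distance(set_a, set_b):
    """Mean pairwise distance between two designed-sequence ensembles."""
    pairs = list(product(set_a, set_b))
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)

# Hypothetical designed ensembles for two related structures.
designs_1 = ["ACDEF", "ACDEY", "SCDEF"]
designs_2 = ["ACNEF", "GCNEF"]
print(mean_set_distance(designs_1, designs_2))
```

Averaging over whole ensembles, rather than comparing two single sequences, is what lets the measure reflect the region of sequence space compatible with each structure.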
Affiliation(s)
- Patrice Koehl
- Department of Structural Biology, Fairchild Building, Stanford University, Stanford, CA 94305, USA.
38
Jaramillo A, Wernisch L, Héry S, Wodak SJ. Folding free energy function selects native-like protein sequences in the core but not on the surface. Proc Natl Acad Sci U S A 2002; 99:13554-9. [PMID: 12368470 PMCID: PMC129712 DOI: 10.1073/pnas.212068599] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.6] [Indexed: 11/18/2022]
Abstract
An automatic protein design procedure is used to select amino acid sequences that optimize the folding free energy function for a given protein. The only information used in designing the sequences is a set of known backbone structures for each protein, a rotamer library, and a well-established classical empirical force field, which relies on the basic physicochemical principles that underlie molecular interactions and protein stability and has not been adjusted to yield native-like sequences. Applying the procedure to 7 different known protein folds, representing a total of 45 different native protein structures, yields ensembles of designed sequences displaying remarkable similarity to their natural counterparts in the protein core, but which are distinctly non-native on the protein surface. We show that natural and designed sequences for a given fold score significantly higher than random sequences against profiles derived from both designed and natural sequence ensembles. Furthermore, we find that designed sequence profiles can be used to retrieve the native sequences for many of the analyzed proteins using standard PSI-BLAST searches in sequence databases. These findings may have important implications for our understanding of the selection pressures operating on natural protein sequences and hold promise for improving fold recognition.
Affiliation(s)
- Alfonso Jaramillo
- Unité de Conformation de Macromolécules Biologiques, CP160/16, Université Libre de Bruxelles, 50 Avenue F. D. Roosevelt, 1050 Brussels, Belgium
39
Xia Y, Levitt M. Roles of mutation and recombination in the evolution of protein thermodynamics. Proc Natl Acad Sci U S A 2002; 99:10382-7. [PMID: 12149452 PMCID: PMC124923 DOI: 10.1073/pnas.162097799] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.5] [Received: 02/16/2002] [Indexed: 11/18/2022]
Abstract
We present a comprehensive study of the evolutionary origin of the thermodynamic behavior of proteins. With the use of a simplified model, we exhaustively enumerate the space of all sequences and the space of all structures, simulate the evolutionary relationship between sequences and structures, and characterize the steady-state sequence distribution for all structures in terms of several thermodynamic variables. We assess the effects of two major forces of evolution: mutation and recombination. Three simplifications are made. First, a two-dimensional lattice model is used to represent protein sequences and structures. Second, proteins undergo neutral evolution, so that the fitness landscape has a flat allowed region inside of which all sequences are equally fit. Third, we ignore otherwise important factors such as finite population size and evolutionary time. Two scenarios emerge from our study. The first occurs when evolution is dominated by mutation events. Even though the prototype sequence that is most mutationally robust is preferred by evolution, the preference is not strong enough to offset the huge size of sequence space. Most native sequences are located near the boundary of the fitness region and are marginally compatible with the native structure. The second scenario occurs when evolution is dominated by recombination events. Now the evolutionary preference for the prototype sequence is strong enough to overcome the size of sequence space, so that most native sequences are located near the center of sequence-structure compatibility. We conclude that the relative frequency of mutation and recombination events is a major determinant of how optimal protein sequences are for their structures.
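Why mutation-dominated neutral evolution leaves most sequences near the boundary can be seen with a toy calculation. It rests on assumptions that differ from the paper's 2D lattice model: a binary (H/P) alphabet of length L, an allowed region defined hypothetically as all sequences within Hamming distance r of the prototype, and mutations proposed by flipping one random site and rejected if they leave the region. Because that proposal is symmetric, the walk has a uniform stationary distribution over the allowed region, so the steady-state fraction at distance d is C(L, d) divided by the sum of C(L, k) for k ≤ r:

```python
from math import comb


def distance_distribution(length, radius):
    """Steady-state fraction of sequences at Hamming distance d = 0..radius from
    the prototype, assuming a binary alphabet and a uniform stationary
    distribution over a hypothetical Hamming-ball allowed region."""
    counts = [comb(length, d) for d in range(radius + 1)]  # sequences at distance d
    total = sum(counts)
    return [c / total for c in counts]


dist = distance_distribution(20, 5)
for d, frac in enumerate(dist):
    print(f"d={d}: {frac:.4f}")
```

With L = 20 and r = 5, about 71% of the steady-state population sits exactly on the boundary, because C(20, 5) dominates the sum; the prototype itself holds 1/21700 of it. Recombination, which the paper finds pulls sequences toward the prototype, is not modeled in this sketch.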
Affiliation(s)
- Yu Xia
- Department of Structural Biology, Stanford University School of Medicine, Stanford, CA 94305, USA.
40
Koehl P, Levitt M. Protein topology and stability define the space of allowed sequences. Proc Natl Acad Sci U S A 2002; 99:1280-5. [PMID: 11805293 PMCID: PMC122181 DOI: 10.1073/pnas.032405199] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.3] [Indexed: 11/18/2022]
Abstract
We describe a new approach to explore and quantify the sequence space associated with a given protein structure. A set of sequences are optimized for a given target structure, using all-atom models and a physical energy function. Specificity of the sequence for its target is ensured by using the random energy model, which keeps the amino acid composition of the sequence constant. The designed sequences provide a multiple sequence alignment that describes the sequence space compatible with the structure of interest; here the size of this space is estimated by using an information entropy measure. In parallel, multiple alignments of naturally occurring sequences can be derived by using either sequence or structure alignments. We compared these 3 independent multiple sequence alignments for 10 different proteins, ranging in size from 56 to 310 residues. We observed that the subset of the sequence space derived by using our design procedure is similar in size to the sequence spaces observed in nature. These results suggest that the volume of sequence space compatible with a given protein fold is defined by the length of the protein as well as by the topology (i.e., geometry of the polypeptide chain) and the stability (i.e., free energy of denaturation) of the fold.
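The abstract keeps amino acid composition fixed so that designed sequences stay specific to the target. One simple way to enforce a fixed composition during a search, shown here as an assumption rather than the authors' procedure, is Metropolis Monte Carlo with swap moves that exchange the residues at two positions. The toy energy (rewarding hydrophobic residues at hypothetical buried positions) stands in for the paper's all-atom physical energy function, and all names are illustrative:

```python
import math
import random

BURIED = {0, 2, 4}            # hypothetical buried positions
HYDROPHOBIC = set("AFILMVW")


def toy_energy(seq):
    """Hypothetical energy: -1 per buried hydrophobic, +1 per exposed hydrophobic."""
    return sum((-1 if pos in BURIED else 1)
               for pos, aa in enumerate(seq) if aa in HYDROPHOBIC)


def design_fixed_composition(start, energy, steps=5000, temperature=1.0, seed=0):
    """Metropolis Monte Carlo using swap moves, which permute residues between
    two positions and therefore never change the amino acid composition."""
    rng = random.Random(seed)
    seq = list(start)
    e = energy(seq)
    best, best_e = seq[:], e
    for _ in range(steps):
        i, j = rng.sample(range(len(seq)), 2)
        if seq[i] == seq[j]:
            continue  # swapping identical residues changes nothing
        seq[i], seq[j] = seq[j], seq[i]
        e_new = energy(seq)
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            e = e_new  # accept the swap
            if e < best_e:
                best, best_e = seq[:], e
        else:
            seq[i], seq[j] = seq[j], seq[i]  # reject: undo the swap
    return "".join(best), best_e


best_seq, best_e = design_fixed_composition("KLEVDI", toy_energy)
print(best_seq, best_e)
```

Because every move is a permutation, the composition of `best_seq` always equals that of the starting sequence; only the placement of residues is optimized.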
Affiliation(s)
- Patrice Koehl
- Department of Structural Biology, Fairchild Building, D109, Stanford University, Stanford, CA 94305, USA.
41
Koehl P, Levitt M. Improved recognition of native-like protein structures using a family of designed sequences. Proc Natl Acad Sci U S A 2002; 99:691-6. [PMID: 11782533 PMCID: PMC117367 DOI: 10.1073/pnas.022408799] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.7] [Received: 08/03/2001] [Indexed: 11/18/2022]
Abstract
The goal of the inverse protein folding problem is to identify amino acid sequences that stabilize a given target protein conformation. Methods that attempt to solve this problem have proven useful for protein sequence design. Here we show that the same methods can provide valuable information for protein fold recognition and for ab initio protein structure prediction. We present a measure of the compatibility of a test sequence with a target model structure, based on computational protein design. The model structure is used as input to design a family of low free energy sequences, and these sequences are compared with the test sequence by using a metric in sequence space based on nearest-neighbor connectivity. We find that this measure is able to recognize the native fold of a myoglobin sequence among different globin folds. It is also powerful enough to recognize near-native protein structures among non-native models.
Affiliation(s)
- Patrice Koehl
- Department of Structural Biology, Fairchild Building, Stanford University, Stanford, CA 94305, USA.
42
Abstract
We have developed a new method for the prediction of peptide sequences that bind to a protein, given a three-dimensional structure of the protein in complex with a peptide. By applying a recently developed sequence prediction algorithm and a novel ensemble averaging calculation, we generate a diverse collection of peptide sequences that are predicted to have significant affinity for the protein. Using output from the simulations, we create position-specific scoring matrices, or virtual interaction profiles (VIPs). Comparison of VIPs for a collection of binding motifs to sequences determined experimentally indicates that the prediction algorithm is accurate and applicable to a diverse range of structures. With these VIPs, one can scan protein sequence databases rapidly to seek binding partners of potential biological significance. Overall, this method can significantly enhance the information contained within a protein-peptide crystal structure, and enrich the data obtained by experimental selection methods such as phage display.
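The virtual interaction profiles are position-specific scoring matrices built from the simulated peptides. A minimal sketch of building one and scanning a sequence with it, assuming equal-length peptides, a uniform background over the 20 amino acids, and a pseudocount of 1; these choices, the function names, and the example peptides are illustrative and not taken from the paper:

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(peptides, pseudocount=1.0):
    """Log-odds PSSM from equal-length peptides against a uniform background."""
    length = len(peptides[0])
    background = 1.0 / len(AMINO_ACIDS)
    total = len(peptides) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(p[col] for p in peptides)
        pssm.append({aa: math.log((counts[aa] + pseudocount) / total / background)
                     for aa in AMINO_ACIDS})
    return pssm


def scan(sequence, pssm):
    """Score every window of `sequence`; return (start, score) pairs, best first."""
    width = len(pssm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Non-standard residues get -inf so they never rank as hits.
        score = sum(pssm[i].get(aa, -math.inf) for i, aa in enumerate(window))
        hits.append((start, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical designed binders for a proline-rich peptide pocket
designed_peptides = ["PPLPR", "PPLPK", "PALPR", "PPVPR"]
pssm = build_pssm(designed_peptides)
print(scan("GGSPPLPRGG", pssm)[:3])
```

Positive scores mark residues that the designed ensemble uses more often than the background; summing them over a window gives a quick database-scan score.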
Affiliation(s)
- A M Wollacott
- Department of Chemistry, Pennsylvania State University, 406 Chandlee Laboratory, PA 16802, USA
43
Ma B, Wolfson HJ, Nussinov R. Protein functional epitopes: hot spots, dynamics and combinatorial libraries. Curr Opin Struct Biol 2001; 11:364-9. [PMID: 11406388 DOI: 10.1016/s0959-440x(00)00216-5] [Citation(s) in RCA: 90] [Impact Index Per Article: 3.9] [Indexed: 10/27/2022]
Abstract
Recent studies increasingly point to the importance of structural flexibility and plasticity in proteins, highlighting the evolutionary advantage. There are an increasing number of cases in which given, presumably specific, binding sites have been shown to bind a range of ligands with different compositions and shapes. These studies have also revealed that evolution tends to find convergent solutions for stable intermolecular associations, largely via conservation of polar residues as hot spots of binding energy. On the other hand, the ability to bind multiple ligands at a given site is largely derived from hinge-based motions. The consideration of these two factors in functional epitopes allows more realism and robustness in the description of protein binding surfaces and, as such, in applications to mutants, modeled structures and design. Efficient multiple structure comparison and hinge-bending structure comparison tools enable the construction of combinatorial binding epitope libraries.
Affiliation(s)
- B Ma
- Laboratory of Experimental and Computational Biology, National Cancer Institute-Frederick, Building 469, Room 151, Frederick, MD 21702, USA
44
Abstract
Protein design has become a powerful approach for understanding the relationship between amino acid sequence and 3-dimensional structure. In the past 5 years, there have been many breakthroughs in the development of computational methods that allow the selection of novel sequences given the structure of a protein backbone. Successful design of protein scaffolds has now paved the way for new endeavors to design function. The ability to design sequences compatible with a fold may also be useful in structural and functional genomics by expanding the range of proteins used for fold recognition and for the identification of functionally important domains from multiple sequence alignments.
Affiliation(s)
- N Pokala
- Department of Molecular and Cell Biology, University of California, 229 Stanley Hall, Berkeley, California 94720, USA
45
Abstract
The aim of this work was to study the relationship between structure conservation and sequence divergence in protein evolution. To this end, we developed a model of structurally constrained protein evolution (SCPE) in which trial sequences, generated by random mutations at gene level, are selected against departure from a reference three-dimensional structure. Since at the mutational level SCPE is completely unbiased, any emergent sequence pattern will be due exclusively to structural constraints. In this first report, it is shown that SCPE correctly predicts the characteristic hexapeptide motif of the left-handed parallel beta helix (LbetaH) domain of UDP-N-acetylglucosamine acyltransferases (LpxA).
Affiliation(s)
- G Parisi
- Universidad Nacional de Quilmes, Bernal, Argentina
46
Looger LL, Hellinga HW. Generalized dead-end elimination algorithms make large-scale protein side-chain structure prediction tractable: implications for protein design and structural genomics. J Mol Biol 2001; 307:429-45. [PMID: 11243829 DOI: 10.1006/jmbi.2000.4424] [Citation(s) in RCA: 155] [Impact Index Per Article: 6.7] [Indexed: 11/22/2022]
Abstract
The dead-end elimination (DEE) theorems are powerful tools for the combinatorial optimization of protein side-chain placement in protein design and homology modeling. In order to reach their full potential, the theorems must be extended to handle very hard problems. We present a suite of new algorithms within the DEE paradigm that significantly extend its range of convergence and reduce run time. As a demonstration, we show that a total protein design problem of 10^115 combinations, a hydrophobic core design problem of 10^244 combinations, and a side-chain placement problem of 10^1044 combinations are solved in less than two weeks, a day and a half, and an hour of CPU time, respectively. This extends the range of the method by approximately 53, 144 and 851 log-units, respectively, using modest computational resources. Small to average-sized protein domains can now be designed automatically, and side-chain placement calculations can be solved for nearly all sizes of proteins and protein complexes in the growing field of structural genomics.
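For reference, the original single-rotamer DEE criterion that these algorithms generalize can be sketched as follows: rotamer r at position i is eliminated if some other rotamer t at i is better even when r gets its most favorable partners and t its least favorable ones. This is only the basic criterion, not the paper's generalized suite; the random energy tables are hypothetical inputs, and a brute-force search is included to check that the global minimum survives:

```python
import random
from itertools import product


def pair(pair_e, i, r, j, s):
    """Symmetric lookup into pair tables stored only for i < j."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]


def dee_singles(self_e, pair_e):
    """Iterate the original single-rotamer dead-end elimination criterion.

    self_e[i][r]         : rotamer/backbone energy of rotamer r at position i
    pair_e[(i, j)][r][s] : rotamer-rotamer energy for every pair i < j
    Returns the set of surviving rotamer indices at each position.
    """
    n = len(self_e)
    alive = [set(range(len(rots))) for rots in self_e]

    def bound(i, r, pick):
        # pick=min: most favourable neighbours; pick=max: least favourable ones.
        return self_e[i][r] + sum(
            pick(pair(pair_e, i, r, j, s) for s in alive[j]) for j in range(n) if j != i)

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                best_case_r = bound(i, r, min)
                if any(best_case_r > bound(i, t, max) for t in alive[i] if t != r):
                    alive[i].discard(r)  # r cannot be part of the global minimum
                    changed = True
    return alive


def total_energy(assign, self_e, pair_e):
    e = sum(self_e[i][r] for i, r in enumerate(assign))
    return e + sum(t[assign[i]][assign[j]] for (i, j), t in pair_e.items())


def brute_force_gmec(self_e, pair_e):
    choices = [range(len(rots)) for rots in self_e]
    return min(product(*choices), key=lambda a: total_energy(a, self_e, pair_e))


rng = random.Random(1)
n_pos, n_rot = 4, 3
self_e = [[rng.uniform(-2, 2) for _ in range(n_rot)] for _ in range(n_pos)]
pair_e = {(i, j): [[rng.uniform(-0.5, 0.5) for _ in range(n_rot)] for _ in range(n_rot)]
          for i in range(n_pos) for j in range(i + 1, n_pos)}
alive = dee_singles(self_e, pair_e)
gmec = brute_force_gmec(self_e, pair_e)
print("surviving rotamers:", [sorted(a) for a in alive], "GMEC:", gmec)
```

The strict inequality guarantees safety: if a rotamer of the global minimum were eliminated by t, swapping it for t would strictly lower the total energy, which is a contradiction.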
Affiliation(s)
- L L Looger
- Department of Biochemistry, Duke University Medical Center, Box 3711, Durham, NC 27710, USA
47
Kono H, Saven JG. Statistical theory for protein combinatorial libraries. Packing interactions, backbone flexibility, and the sequence variability of a main-chain structure. J Mol Biol 2001; 306:607-28. [PMID: 11178917 DOI: 10.1006/jmbi.2000.4422] [Citation(s) in RCA: 107] [Impact Index Per Article: 4.7] [Indexed: 11/22/2022]
Abstract
Combinatorial experiments provide new ways to probe the determinants of protein folding and to identify novel folding amino acid sequences. These types of experiments, however, are complicated both by enormous conformational complexity and by large numbers of possible sequences. Therefore, a quantitative computational theory would be helpful in designing and interpreting these types of experiments. Here, we present and apply a statistically based, computational approach for identifying the properties of sequences compatible with a given main-chain structure. Protein side-chain conformations are included in an atom-based fashion. Calculations are performed for a variety of similar backbone structures to identify sequence properties that are robust with respect to minor changes in main-chain structure. Rather than specific sequences, the method yields the likelihood of each of the amino acids at preselected positions in a given protein structure. The theory may be used to quantify the characteristics of sequence space for a chosen structure without explicitly tabulating sequences. To account for hydrophobic effects, we introduce an environmental energy that is consistent with other simple hydrophobicity scales and show that it is effective for side-chain modeling. We apply the method to calculate the identity probabilities of selected positions of the immunoglobulin light chain-binding domain of protein L, for which many variant folding sequences are available. The calculations compare favorably with the experimentally observed identity probabilities.
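The method yields per-position amino acid probabilities rather than explicit sequences. A simplified stand-in, not the authors' exact formulation, is a damped self-consistent mean-field iteration in which each site's probabilities follow a Boltzmann distribution over its own energy plus the probability-weighted pair energies with other sites. The energy tables, `beta`, `damping`, and the function name are assumptions for illustration:

```python
import math


def mean_field_probabilities(self_e, pair_e, beta=1.0, iterations=200, damping=0.5):
    """Damped self-consistent mean-field estimate of site identity probabilities.

    self_e[i][a]         : energy of amino acid a at position i
    pair_e[(i, j)][a][b] : pair energy for positions i < j (missing pairs count as 0)
    """
    n = len(self_e)
    probs = [[1.0 / len(row)] * len(row) for row in self_e]  # start uniform

    def pair(i, a, j, b):
        if (i, j) in pair_e:
            return pair_e[(i, j)][a][b]
        if (j, i) in pair_e:
            return pair_e[(j, i)][b][a]
        return 0.0

    for _ in range(iterations):
        new_probs = []
        for i in range(n):
            # Effective field: own energy plus probability-weighted pair energies.
            field = [self_e[i][a] + sum(probs[j][b] * pair(i, a, j, b)
                                        for j in range(n) if j != i
                                        for b in range(len(self_e[j])))
                     for a in range(len(self_e[i]))]
            lowest = min(field)  # shift for numerical stability
            weights = [math.exp(-beta * (f - lowest)) for f in field]
            z = sum(weights)
            new_probs.append([w / z for w in weights])
        # Mix old and new estimates to damp oscillations.
        probs = [[damping * old + (1 - damping) * new for old, new in zip(row_o, row_n)]
                 for row_o, row_n in zip(probs, new_probs)]
    return probs


self_e = [[0.0, 1.0], [0.5, 0.0]]
pair_e = {(0, 1): [[-1.0, 0.0], [0.0, -1.0]]}  # hypothetical: favours matching choices
for row in mean_field_probabilities(self_e, pair_e):
    print([round(p, 3) for p in row])
```

With no pair energies the iteration reduces to an independent Boltzmann distribution at each site, which is a convenient sanity check.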
Affiliation(s)
- H Kono
- Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104, USA
48
Kuhlman B, Baker D. Native protein sequences are close to optimal for their structures. Proc Natl Acad Sci U S A 2000; 97:10383-8. [PMID: 10984534 PMCID: PMC27033 DOI: 10.1073/pnas.97.19.10383] [Citation(s) in RCA: 628] [Impact Index Per Article: 26.2] [Indexed: 11/18/2022]
Abstract
How large is the volume of sequence space that is compatible with a given protein structure? Starting from random sequences, low free energy sequences were generated for 108 protein backbone structures by using a Monte Carlo optimization procedure and a free energy function based primarily on Lennard-Jones packing interactions and the Lazaridis-Karplus implicit solvation model. Remarkably, in the designed sequences 51% of the core residues and 27% of all residues were identical to the amino acids in the corresponding positions in the native sequences. The lowest free energy sequences obtained for ensembles of native-like backbone structures were also similar to the native sequence. Furthermore, both the individual residue frequencies and the covariances between pairs of positions observed in the very large SH3 domain family were recapitulated in core sequences designed for SH3 domain structures. Taken together, these results suggest that the volume of sequence space optimal for a protein structure is surprisingly restricted to a region around the native sequence.
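The identity percentages quoted above are native sequence recovery: the fraction of positions at which the designed residue equals the native one, computed over all positions or only over core positions. A minimal sketch; how core positions are chosen (for example, by solvent accessibility) is an assumption, and the sequences and core indices are made up:

```python
def sequence_recovery(designed, native, positions=None):
    """Fraction of positions (all, or only `positions`) where designed == native."""
    if len(designed) != len(native):
        raise ValueError("designed and native sequences must be aligned")
    idx = list(range(len(native)) if positions is None else positions)
    if not idx:
        raise ValueError("no positions to compare")
    return sum(designed[k] == native[k] for k in idx) / len(idx)


native = "MKVLAGIEQW"      # made-up native sequence
designed = "MEVLSGIKAW"    # made-up designed sequence
core = [2, 3, 6]           # hypothetical core positions
print(sequence_recovery(designed, native), sequence_recovery(designed, native, core))
```

Reporting core and overall recovery separately, as the abstract does, shows whether agreement with the native sequence is concentrated in the buried positions.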
Affiliation(s)
- B Kuhlman
- Department of Biochemistry and Howard Hughes Medical Institute, University of Washington School of Medicine, Seattle, WA 98195, USA
49
Wernisch L, Hery S, Wodak SJ. Automatic protein design with all atom force-fields by exact and heuristic optimization. J Mol Biol 2000; 301:713-36. [PMID: 10966779 DOI: 10.1006/jmbi.2000.3984] [Citation(s) in RCA: 108] [Impact Index Per Article: 4.5] [Indexed: 11/22/2022]
Abstract
A fully automatic procedure for predicting the amino acid sequences compatible with a given target structure is described. It is based on the CHARMM package, and uses an all-atom force-field and rotamer libraries to describe and evaluate side-chain types and conformations. Sequences are ranked by a quantity akin to the free energy of folding, which incorporates hydration effects. Exact (branch-and-bound) and heuristic optimisation procedures are used to identify high-scoring sequences from an astronomical number of possibilities. These sequences include the minimum free energy sequence, as well as all amino acid sequences whose free energy lies within a specified window from the minimum. Several applications of our procedure are illustrated. Prediction of side-chain conformations for a set of ten proteins yields results comparable to those of established side-chain placement programs. Applications to sequence optimisation comprise the redesign of the protein cores of the c-Crk SH3 domain, the B1 domain of protein G and ubiquitin, and of surface residues of the SH3 domain. In all calculations, no restrictions are imposed on the amino acid composition, and identical parameter settings are used for core and surface residues. The best-scoring sequences for the protein cores are virtually identical to wild-type. They feature no more than one to three mutations in a total of 11-16 variable positions. Tests suggest that this is due to the balance between various contributions in the force-field rather than to an overwhelming influence from packing constraints. The effectiveness of our force-field is further supported by the sequence predictions for surface residues of the SH3 domain. More mutations are predicted than in the core, seemingly in order to optimise the network of complementary interactions between polar and charged groups. This appears to be an important energetic requirement in the absence of the partner molecules with which the SH3 domain interacts, which were not included in the calculations. Finally, a detailed comparison between the sequences generated by the heuristic and exact optimisation algorithms calls for caution concerning the efficiency of heuristic procedures in exploring sequence space.
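The exact search described above, which returns the minimum and every sequence within a window of it, can be illustrated with a small depth-first branch and bound over a pairwise energy model. This sketch is not the CHARMM-based implementation from the paper: the lower bound (optimistic per-position minima for unassigned sites), the dense random energy tables, and the function name are assumptions:

```python
import random
from itertools import product


def enumerate_within_window(self_e, pair_e, window):
    """Depth-first branch and bound over a pairwise energy model.

    self_e[i][a]         : one-body energy of choice a at position i
    pair_e[(i, j)][a][b] : two-body energy for every pair i < j
    Returns (energy, assignment) for every assignment whose energy lies within
    `window` of the global minimum, sorted by energy.
    """
    n = len(self_e)

    def pair(i, a, j, b):
        return pair_e[(i, j)][a][b] if i < j else pair_e[(j, i)][b][a]

    def lower_bound(assigned):
        # Optimistic bound on the energy still to be added by unassigned sites:
        # each site takes its best choice, and each unassigned-unassigned pair
        # (counted once, from the lower index) takes its most favourable partner.
        depth = len(assigned)
        return sum(
            min(self_e[k][a]
                + sum(pair(k, a, j, assigned[j]) for j in range(depth))
                + sum(min(pair(k, a, l, b) for b in range(len(self_e[l])))
                      for l in range(k + 1, n))
                for a in range(len(self_e[k])))
            for k in range(depth, n))

    found, best = [], [float("inf")]

    def search(assigned, partial):
        if partial + lower_bound(assigned) > best[0] + window:
            return  # no completion can land inside the window
        if len(assigned) == n:
            found.append((partial, tuple(assigned)))
            best[0] = min(best[0], partial)
            return
        i = len(assigned)
        for a in range(len(self_e[i])):
            delta = self_e[i][a] + sum(pair(i, a, j, assigned[j]) for j in range(i))
            search(assigned + [a], partial + delta)

    search([], 0.0)
    cutoff = best[0] + window
    return sorted(hit for hit in found if hit[0] <= cutoff)


rng = random.Random(7)
n_pos, n_choice = 4, 3
self_e = [[rng.uniform(-2, 2) for _ in range(n_choice)] for _ in range(n_pos)]
pair_e = {(i, j): [[rng.uniform(-0.5, 0.5) for _ in range(n_choice)]
                   for _ in range(n_choice)]
          for i in range(n_pos) for j in range(i + 1, n_pos)}
for energy, assignment in enumerate_within_window(self_e, pair_e, window=0.75):
    print(f"{energy:8.3f}  {assignment}")
```

Pruning against the current best plus the window is safe because the best energy only decreases during the search, so any branch cut off could never reach the final window.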
Affiliation(s)
- L Wernisch
- European Bioinformatics Institute, Hinxton, CB10 1SD, England
50
Raha K, Wollacott AM, Italia MJ, Desjarlais JR. Prediction of amino acid sequence from structure. Protein Sci 2000; 9:1106-19. [PMID: 10892804 PMCID: PMC2144664 DOI: 10.1110/ps.9.6.1106] [Citation(s) in RCA: 61] [Impact Index Per Article: 2.5] [Indexed: 10/21/2022]
Abstract
We have developed a method for the prediction of an amino acid sequence that is compatible with a three-dimensional backbone structure. Using only a backbone structure of a protein as input, the algorithm is capable of designing sequences that closely resemble natural members of the protein family to which the template structure belongs. In general, the predicted sequences are shown to have multiple sequence profile scores that are dramatically higher than those of random sequences, and sometimes better than some of the natural sequences that make up the superfamily. As anticipated, highly conserved but poorly predicted residues are often those that contribute to the functional rather than structural properties of the protein. Overall, our analysis suggests that statistical profile scores of designed sequences are a novel and valuable figure of merit for assessing and improving protein design algorithms.
Affiliation(s)
- K Raha
- Integrative Biosciences Program, Pennsylvania State University, University Park, Pennsylvania 16803, USA