551
|
Chakrabarti R, Klibanov AM, Friesner RA. Sequence optimization and designability of enzyme active sites. Proc Natl Acad Sci U S A 2005; 102:12035-40. [PMID: 16103370 PMCID: PMC1189337 DOI: 10.1073/pnas.0505397102] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.1] [Received: 04/08/2005] [Indexed: 11/18/2022] Open
Abstract
We recently found that many residues in enzyme active sites can be computationally predicted by the optimization of scoring functions based on substrate binding affinity, subject to constraints on the geometry of catalytic residues and protein stability. Here, we explore the generality of this surprising observation. First, the impact of hydrogen-bonding networks necessary for catalysis on the accuracy of sequence optimization is assessed; incorporation of these networks, where relevant, into the set of catalytic constraints is found to be essential. Next, the impact of multiple substrate selectivity on sequence optimization is probed by carrying out independent calculations for complexes of deoxyribonucleoside kinases with various cognate ligands, revealing how simultaneous selection pressures determined active-site sequences of these enzymes. Including previous calculations on simpler enzymes, computational sequence optimization correctly predicts 76% of all active-site residues tested (86% correct, with 93% similar, for naturally conserved residues). In these studies, the ligand is fixed in its native conformation. To assess the applicability of these methods to de novo active-site design, the effect of small ligand motions around the native pose is also examined. Robustness of sequence accuracy for topologically similar poses is demonstrated for selected kinases, but not for a model peptidase. Based on these observations, we introduce the notion of the designability of an enzyme active site, a metric that may be used to guide the search for protein scaffolds suitable for the introduction of de novo activity for a desired chemical reaction.
Affiliation(s)
- Raj Chakrabarti
- Department of Chemistry and Center for Biomolecular Simulation, Columbia University, New York, NY 10027, USA
|
552
|
Chakrabarti R, Klibanov AM, Friesner RA. Computational prediction of native protein ligand-binding and enzyme active site sequences. Proc Natl Acad Sci U S A 2005; 102:10153-8. [PMID: 15998733 PMCID: PMC1177389 DOI: 10.1073/pnas.0504023102] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Indexed: 11/18/2022] Open
Abstract
Recent studies reveal that the core sequences of many proteins were nearly optimized for stability by natural evolution. Surface residues, by contrast, are not so optimized, presumably because protein function is mediated through surface interactions with other molecules. Here, we sought to determine the extent to which the sequences of protein ligand-binding and enzyme active sites could be predicted by optimization of scoring functions based on protein ligand-binding affinity rather than structural stability. Optimization of binding affinity under constraints on the folding free energy correctly predicted 83% of amino acid residues (94% similar) in the binding sites of two model receptor-ligand complexes, streptavidin-biotin and glucose-binding protein. To explore the applicability of this methodology to enzymes, we applied an identical algorithm to the active sites of diverse enzymes from the peptidase, beta-gal, and nucleotide synthase families. Although simple optimization of binding affinity reproduced the sequences of some enzyme active sites with high precision, imposition of additional, geometric constraints on side-chain conformations based on the catalytic mechanism was required in other cases. With these modifications, our sequence optimization algorithm correctly predicted 78% of residues from all of the enzymes, with 83% similar to native (90% correct, with 95% similar, excluding residues with high variability in multiple sequence alignments). Furthermore, the conformations of the selected side chains were often correctly predicted within crystallographic error. These findings suggest that simple selection pressures may have played a predominant role in determining the sequences of ligand-binding and active sites in proteins.
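The "correct" and "similar" percentages quoted in this abstract are sequence-recovery statistics. A minimal sketch of how such figures could be computed is shown below. The similarity classes are an assumption made for illustration only, since the abstract does not state how "similar" residues were defined, and the function names are hypothetical.

```python
# Hypothetical similarity classes, chosen only for illustration; the
# definition of "similar" used in the paper is not given in the abstract.
SIMILARITY_CLASSES = ["AVLIM", "FYW", "ST", "DE", "NQ", "KRH", "G", "P", "C"]


def _class_of(residue):
    """Return the index of the similarity class containing a one-letter residue."""
    for index, members in enumerate(SIMILARITY_CLASSES):
        if residue in members:
            return index
    raise ValueError(f"unknown residue: {residue!r}")


def sequence_recovery(native, designed):
    """Return (fraction identical, fraction similar) over aligned positions."""
    if not native or len(native) != len(designed):
        raise ValueError("sequences must be non-empty and of equal length")
    identical = sum(n == d for n, d in zip(native, designed))
    similar = sum(_class_of(n) == _class_of(d) for n, d in zip(native, designed))
    return identical / len(native), similar / len(native)
```

Identical residues always fall in the same class, so the similar fraction can never be lower than the identical fraction, which matches the way the abstract reports the two numbers.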
Affiliation(s)
- Raj Chakrabarti
- Department of Chemistry and Center for Biomolecular Simulation, Columbia University, New York, NY 10027, USA
|
553
|
Kovalenko OV, Metcalf DG, DeGrado WF, Hemler ME. Structural organization and interactions of transmembrane domains in tetraspanin proteins. BMC Struct Biol 2005; 5:11. [PMID: 15985154 PMCID: PMC1190194 DOI: 10.1186/1472-6807-5-11] [Citation(s) in RCA: 82] [Impact Index Per Article: 4.3] [Received: 03/29/2005] [Accepted: 06/28/2005] [Indexed: 11/22/2022]
Abstract
Background: Proteins of the tetraspanin family contain four transmembrane domains (TM1-4) linked by two extracellular loops and a short intracellular loop, and have short intracellular N- and C-termini. While structure and function analysis of the larger extracellular loop has been performed, the organization and role of transmembrane domains have not been systematically assessed. Results: Among 28 human tetraspanin proteins, the TM1-3 sequences display a distinct heptad repeat motif (abcdefg)n. In TM1, position a is occupied by structurally conserved bulky residues and position d contains highly conserved Asn and Gly residues. In TM2, position a is occupied by conserved small residues (Gly/Ala/Thr), and position d has a conserved Gly and two bulky aliphatic residues. In TM3, three a positions of the heptad repeat are filled by two leucines and a glutamate/glutamine residue, and two d positions are occupied by either Phe/Tyr or Val/Ile/Leu residues. No heptad motif is apparent in TM4 sequences. Mutations of conserved glycines in human CD9 (Gly25 and Gly32 in TM1; Gly67 and Gly74 in TM2) caused aggregation of mutant proteins inside the cell. Modeling of the TM1-TM2 interface in CD9, using a novel algorithm, predicts tight packing of conserved bulky residues against conserved Gly residues along the two helices. The homodimeric interface of CD9 was mapped, by disulfide cross-linking of single-cysteine mutants, to the vicinity of residues Leu14 and Phe17 in TM1 (positions g and c) and Gly77, Gly80 and Ala81 in TM2 (positions d, g and a, respectively). Mutations of a and d residues in both TM1 and TM2 (Gly25, Gly32, Gly67 and Gly74), involved in intramolecular TM1-TM2 interaction, also strongly diminished intermolecular interaction, as assessed by cross-linking of Cys80. Conclusion: Our results suggest that tetraspanin intra- and intermolecular interactions are mediated by conserved residues in adjacent, but distinct regions of TM1 and TM2. A key structural element that defines TM1-TM2 interaction in tetraspanins is the specific packing of bulky residues against small residues.
Affiliation(s)
- Oleg V Kovalenko
- Department of Cancer Immunology and AIDS, Dana-Farber Cancer Institute and Department of Pathology, Harvard Medical School, Boston, USA
- Douglas G Metcalf
- Department of Biochemistry and Biophysics, School of Medicine, University of Pennsylvania, Philadelphia, USA
- William F DeGrado
- Department of Biochemistry and Biophysics, School of Medicine, University of Pennsylvania, Philadelphia, USA
- Martin E Hemler
- Department of Cancer Immunology and AIDS, Dana-Farber Cancer Institute and Department of Pathology, Harvard Medical School, Boston, USA
- Dana-Farber Cancer Institute, D-1430, 44 Binney Street, Boston, MA 02115, USA
|
554
|
Ma XH, Li CH, Shen LZ, Gong XQ, Chen WZ, Wang CX. Biologically enhanced sampling geometric docking and backbone flexibility treatment with multiconformational superposition. Proteins 2005; 60:319-23. [PMID: 15981260 DOI: 10.1002/prot.20577] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.6] [Indexed: 11/09/2022]
Abstract
An efficient biologically enhanced sampling geometric docking method is presented based on the FTDock algorithm to predict the protein-protein binding modes. The active site data from different sources, such as biochemical and biophysical experiments or theoretical analyses of sequence data, can be incorporated in the rotation-translation scan. When discretizing a protein onto a 3-dimensional (3D) grid, a zero value is given to grid points outside a sphere centered on the geometric center of specified residues. In this way, docking solutions are biased toward modes where the interface region is inside the sphere. We also adopt a multiconformational superposition scheme to represent backbone flexibility in the proteins. When these procedures were applied to the targets of CAPRI, a larger number of hits and smaller ligand root-mean-square deviations (RMSDs) were obtained at the conformational search stage in all cases, and especially Target 19. With Target 18, only 1 near-native structure was retained by the biologically enhanced sampling geometric docking method, but this number increased to 53 and the least ligand RMSD decreased from 8.1 Å to 2.9 Å after performing multiconformational superposition. These results were obtained after the CAPRI prediction deadlines.
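The grid-masking step described in this abstract can be sketched as follows. This is an illustrative reading of the single sentence about zeroing grid points outside a sphere, not the authors' implementation. The function names, the dictionary-based grid, and the `origin`, `spacing` and `radius` parameters are all assumptions.

```python
import math


def geometric_center(coords):
    """Unweighted mean position of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[axis] for c in coords) / n for axis in range(3))


def sphere_biased_grid(grid, origin, spacing, residue_coords, radius):
    """Return a copy of `grid` with every point outside the sphere set to zero.

    `grid` maps integer (i, j, k) indices to values (a hypothetical layout).
    The sphere is centered on the geometric center of `residue_coords`.
    """
    center = geometric_center(residue_coords)
    biased = {}
    for (i, j, k), value in grid.items():
        point = (
            origin[0] + i * spacing,
            origin[1] + j * spacing,
            origin[2] + k * spacing,
        )
        inside = math.dist(point, center) <= radius
        biased[(i, j, k)] = value if inside else 0.0
    return biased
```

Because zeroed grid points contribute nothing to a grid-correlation score, a scan over rotations and translations would then favor poses whose interface falls inside the sphere, which is the bias the abstract describes.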
Affiliation(s)
- Xiao Hui Ma
- College of Life Science and Bioengineering, Beijing University of Technology, Beijing, China
|
555
|
Abstract
RosettaNMR combines the Rosetta de novo structure prediction method with limited NMR experimental data for rapid estimation of protein structure. The de novo Rosetta algorithm predicts protein three-dimensional structures using only sequence information by combining short fragments selected from known protein structures on the basis of local sequence similarity. These fragments are assembled using a Monte Carlo strategy to generate models that reproduce empirical statistics describing nonlocal protein structure such as overall compactness, hydrophobic burial, and beta-strand pairing. By incorporating chemical shift, nuclear Overhauser enhancement, and/or residual dipolar coupling restraints that are insufficient on their own to determine the protein global fold, the RosettaNMR method correctly estimates the global fold of a variety of different proteins, generating models that are generally 4 Å or better Cα root-mean-square deviation to the high-resolution experimental structures. Here we review the capabilities of the RosettaNMR approach, describe the underlying methods, and provide practical tips for applying the technique to structure estimation problems.
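The Monte Carlo fragment assembly mentioned in this abstract can be illustrated with a toy sketch. This is not Rosetta code. The conformation is reduced to a list of torsion angles, and the fragment library, energy function and temperature are hypothetical stand-ins. The only technique shown is fragment insertion followed by a standard Metropolis acceptance test.

```python
import math
import random

# Hypothetical three-residue fragments of torsion angles (degrees).
FRAGMENTS = [(-60.0, -60.0, -60.0), (-120.0, -120.0, -120.0), (60.0, 60.0, 60.0)]


def toy_energy(angles):
    """Toy score: squared deviation from -60 degrees (an assumption, not a force field)."""
    return sum((a + 60.0) ** 2 for a in angles)


def metropolis_accept(delta_e, temperature, rng):
    """Always accept downhill moves; accept uphill moves with Boltzmann probability."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def assemble(start, fragments, energy, steps, temperature, rng):
    """Insert random fragments at random positions and return the best state seen."""
    current = list(start)
    current_e = energy(current)
    best, best_e = list(current), current_e
    for _ in range(steps):
        frag = rng.choice(fragments)
        pos = rng.randrange(len(current) - len(frag) + 1)
        trial = current[:pos] + list(frag) + current[pos + len(frag):]
        trial_e = energy(trial)
        if metropolis_accept(trial_e - current_e, temperature, rng):
            current, current_e = trial, trial_e
            if current_e < best_e:
                best, best_e = list(current), current_e
    return best, best_e
```

In a restraint-driven variant such as the one the abstract describes, experimental restraint penalties would presumably enter through the `energy` argument. Here that function is left as a plain callable so the sketch stays independent of any particular scoring scheme.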
Affiliation(s)
- Carol A Rohl
- Department of Biomolecular Engineering, University of California, Santa Cruz 95064, USA
|
556
|
Abstract
Naturally occurring proteins comprise a special subset of all plausible sequences and structures selected through evolution. Simulating protein evolution with simplified and all-atom models has shed light on the evolutionary dynamics of protein populations, the nature of evolved sequences and structures, and the extent to which today's proteins are shaped by selection pressures on folding, structure and function. Extensive mapping of the native structure, stability and folding rate in sequence space using lattice proteins has revealed organizational principles of the sequence/structure map important for evolutionary dynamics. Evolutionary simulations with lattice proteins have highlighted the importance of fitness landscapes, evolutionary mechanisms, population dynamics and sequence space entropy in shaping the generic properties of proteins. Finally, evolutionary-like simulations with all-atom models, in particular computational protein design, have helped identify the dominant selection pressures on naturally occurring protein sequences and structures.
Affiliation(s)
- Yu Xia
- Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, New Haven, CT 06520, USA
|
557
|
Abstract
Thermostabilizing an enzyme while maintaining its activity for industrial or biomedical applications can be difficult with traditional selection methods. We describe a rapid computational approach that identified three mutations within a model enzyme that produced a 10 °C increase in apparent melting temperature (Tm) and a 30-fold increase in half-life at 50 °C, with no reduction in catalytic efficiency. The effects of the mutations were synergistic, giving an increase in excess of the sum of their individual effects. The redesigned enzyme induced an increased, temperature-dependent bacterial growth rate under conditions that required its activity, thereby coupling molecular and metabolic engineering.
Affiliation(s)
- Aaron Korkegian
- Division of Basic Sciences, Fred Hutchinson Cancer Research Center (FHCRC), 1100 Fairview Avenue North, Seattle, WA 98109, USA
|
558
|
Abstract
Success in high-resolution protein-protein docking requires accurate modeling of side-chain conformations at the interface. Most current methods either leave side chains fixed in the conformations observed in the unbound protein structures or allow the side chains to sample a set of discrete rotamer conformations. Here we describe a rapid and efficient method for sampling off-rotamer side-chain conformations by torsion space minimization during protein-protein docking starting from discrete rotamer libraries supplemented with side-chain conformations taken from the unbound structures, and show that the new method improves side-chain modeling and increases the energetic discrimination between good and bad models. Analysis of the distribution of side-chain interaction energies within and between the two protein partners shows that the new method leads to more native-like distributions of interaction energies and that the neglect of side-chain entropy produces a small but measurable increase in the number of residues whose interaction energy cannot compensate for the entropic cost of side-chain freezing at the interface. The power of the method is highlighted by a number of predictions of unprecedented accuracy in the recent CAPRI (Critical Assessment of PRedicted Interactions) blind test of protein-protein docking methods.
Affiliation(s)
- Chu Wang
- Department of Biochemistry, Box 357350, University of Washington, Seattle, WA 98195, USA
|
559
|
Pokala N, Handel TM. Energy Functions for Protein Design: Adjustment with Protein–Protein Complex Affinities, Models for the Unfolded State, and Negative Design of Solubility and Specificity. J Mol Biol 2005; 347:203-27. [PMID: 15733929 DOI: 10.1016/j.jmb.2004.12.019] [Citation(s) in RCA: 157] [Impact Index Per Article: 8.3] [Received: 07/13/2004] [Revised: 12/05/2004] [Accepted: 12/09/2004] [Indexed: 11/16/2022]
Abstract
The development of the EGAD program and energy function for protein design is described. In contrast to most protein design methods, which require several empirical parameters or heuristics such as patterning of residues or rotamers, EGAD has a minimalist philosophy; it uses very few empirical factors to account for inaccuracies resulting from the use of fixed backbones and discrete rotamers in protein design calculations, and describes the unfolded state, aggregates, and alternative conformers explicitly with physical models instead of fitted parameters. This approach unveils important issues in protein design that are often camouflaged by heuristic-emphasizing methods. Inter-atom energies are modeled with the OPLS-AA all-atom forcefield, electrostatics with the generalized Born continuum model, and the hydrophobic effect with a solvent-accessible surface area-dependent term. Experimental characterization of proteins designed with an unmodified version of the energy function revealed problems with under-packing, stability, aggregation, and structural specificity. Under-packing was addressed by modifying the van der Waals function. By optimizing only three parameters, the effects of >400 mutations on protein-protein complex formation were predicted to within 1.0 kcal mol⁻¹. As an independent test, this modified energy function was used to predict the stabilities of >1500 mutants to within 1.0 kcal mol⁻¹; this required a physical model of the unfolded state that includes more interactions than traditional tripeptide-based models. Solubility and structural specificity were addressed with simple physical approximations of aggregation and conformational equilibria. The complete energy function can design protein sequences that have high levels of identity with their natural counterparts, and have predicted structural properties more consistent with soluble and uniquely folded proteins than the initial designs.
Affiliation(s)
- Navin Pokala
- Department of Molecular and Cell Biology, University of California, Berkeley, Berkeley, CA 94720, USA.
|
560
|
Misura KMS, Baker D. Progress and challenges in high-resolution refinement of protein structure models. Proteins 2005; 59:15-29. [PMID: 15690346 DOI: 10.1002/prot.20376] [Citation(s) in RCA: 132] [Impact Index Per Article: 6.9] [Indexed: 11/07/2022]
Abstract
Achieving atomic level accuracy in de novo structure prediction presents a formidable challenge even in the context of protein models with correct topologies. High-resolution refinement is a fundamental test of force field accuracy and sampling methodology, and its limited success in both comparative modeling and de novo prediction contexts highlights the limitations of current approaches. We constructed four tests to identify bottlenecks in our current approach and to guide progress in this challenging area. The first three tests showed that idealized native structures are stable under our refinement simulation conditions and that the refinement protocol can significantly decrease the root mean square deviation (RMSD) of perturbed native structures. In the fourth test we applied the refinement protocol to de novo models and showed that accurate models could be identified based on their energies, and in several cases many of the buried side chains adopted native-like conformations. We also showed that the differences in backbone and side-chain conformations between the refined de novo models and the native structures are largely localized to loop regions and regions where the native structure has unusual features such as rare rotamers or atypical hydrogen bonding between beta-strands. The refined de novo models typically have higher energies than refined idealized native structures, indicating that sampling of local backbone conformations and side-chain packing arrangements in a condensed state is a primary obstacle.
Affiliation(s)
- Kira M S Misura
- Department of Biochemistry, University of Washington Health Sciences, Seattle, Washington 98195-7350, USA
|
561
|
Park S, Kono H, Wang W, Boder ET, Saven JG. Progress in the development and application of computational methods for probabilistic protein design. Comput Chem Eng 2005. [DOI: 10.1016/j.compchemeng.2004.07.037] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
|
562
|
Stouffer AL, Nanda V, Lear JD, DeGrado WF. Sequence determinants of a transmembrane proton channel: an inverse relationship between stability and function. J Mol Biol 2005; 347:169-79. [PMID: 15733926 DOI: 10.1016/j.jmb.2005.01.023] [Citation(s) in RCA: 53] [Impact Index Per Article: 2.8] [Received: 11/12/2004] [Revised: 01/03/2005] [Accepted: 01/07/2005] [Indexed: 11/21/2022]
Abstract
The driving forces behind the folding processes of integral membrane proteins after insertion into the bilayer are currently under debate. The M2 protein from the influenza A virus is an ideal system to study lateral association of transmembrane helices. Its proton selective channel is essential for virus functioning and a target of the drug amantadine. A 25 residue transmembrane fragment of M2, M2TM, forms a four-helix bundle in vivo and in various detergents and phospholipid bilayers. Presented here are the energetic consequences for mutations made to the helix/helix interfaces of the M2TM tetramer. Analytical ultracentrifugation has been used to determine the effect of ten single-site mutations, to either alanine or phenylalanine, on the oligomeric state and the free energy of M2TM in the absence and the presence of amantadine. It was expected that many of these mutations would perturb the M2TM stability and tetrameric integrity. Interestingly, none of the mutations destabilize tetramerization. This finding suggests that M2 sacrifices stability to preserve its functions, which require rapid and specific interchange between distinct conformations involved in gating and proton conduction. Mutations might therefore restrict the full range of conformations by stabilizing a given native or non-native conformational state. In order to assess one specific conformation of the tetramer, we measured the binding of amantadine to the resting state of the channel, and examined the overall free energy of assembly of the amantadine bound tetramer. All of the mutations destabilized amantadine binding or were isoenergetic. We also find that large to small residue changes destabilize the amantadine bound tetramer whereas mutations to side-chains of similar volume stabilize this conformation. A structural model of the amantadine bound state of M2TM was generated using a novel protocol that optimizes a structure for an ensemble of neutral and disruptive mutations. The model structure is consistent with the mutational data.
Affiliation(s)
- Amanda L Stouffer
- Department of Biochemistry and Biophysics, School of Medicine, University of Pennsylvania, Philadelphia PA, 19104-6059, USA
|
563
|
Saunders CT, Baker D. Recapitulation of protein family divergence using flexible backbone protein design. J Mol Biol 2005; 346:631-44. [PMID: 15670610 DOI: 10.1016/j.jmb.2004.11.062] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.3] [Received: 06/17/2004] [Revised: 11/18/2004] [Accepted: 11/22/2004] [Indexed: 11/30/2022]
Abstract
We use flexible backbone protein design to explore the sequence and structure neighborhoods of naturally occurring proteins. The method samples sequence and structure space in the vicinity of a known sequence and structure by alternately optimizing the sequence for a fixed protein backbone using rotamer-based sequence search, and optimizing the backbone for a fixed amino acid sequence using atomic-resolution structure prediction. We find that such a flexible backbone design method better recapitulates protein family sequence variation than sequence optimization on fixed backbones or randomly perturbed backbone ensembles for ten diverse protein structures. For the SH3 domain, the backbone structure variation in the family is also better recapitulated than in randomly perturbed backbones. The potential application of this method as a model of protein family evolution is highlighted by a concerted transition to the amino acid sequence in the structural core of one SH3 domain starting from the backbone coordinates of a homologous structure.
Affiliation(s)
- Christopher T Saunders
- Department of Genome Sciences, University of Washington, Box 357730, Seattle, WA 98195, USA
|
564
|
Floudas CA. Research challenges, opportunities and synergism in systems engineering and computational biology. AIChE J 2005. [DOI: 10.1002/aic.10620] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Indexed: 11/11/2022]
|
565
|
Yang X, Saven JG. Computational methods for protein design and protein sequence variability: biased Monte Carlo and replica exchange. Chem Phys Lett 2005. [DOI: 10.1016/j.cplett.2004.10.153] [Citation(s) in RCA: 30] [Impact Index Per Article: 1.6] [Indexed: 12/01/2022]
|
566
|
Morozov AV, Kortemme T. Potential functions for hydrogen bonds in protein structure prediction and design. Adv Protein Chem 2005; 72:1-38. [PMID: 16581371 DOI: 10.1016/s0065-3233(05)72001-5] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.1] [Indexed: 01/07/2023]
Abstract
Hydrogen bonds are an important contributor to free energies of biological macromolecules and macromolecular complexes, and hence an accurate description of these interactions is important for progress in biomolecular modeling. A simple description of the hydrogen bond is based on an electrostatic dipole-dipole interaction involving hydrogen-donor and acceptor-acceptor base dipoles, but the physical nature of hydrogen bond formation is more complex. At the most fundamental level, hydrogen bonding is a quantum mechanical phenomenon with contributions from covalent effects, polarization, and charge transfer. Recent experiments and theoretical calculations suggest that both electrostatic and covalent components determine the properties of hydrogen bonds. Likely, the level of rigor required to describe hydrogen bonding will depend on the problem posed. Current approaches to modeling hydrogen bonds include knowledge-based descriptions based on surveys of hydrogen bond geometries in structural databases of proteins and small molecules, empirical molecular mechanics models, and quantum mechanics-based electronic structure calculations. Ab initio calculations of hydrogen bonding energies and geometries accurately reproduce energy landscapes obtained from the distributions of hydrogen bond geometries observed in protein structures. Orientation-dependent hydrogen bonding potentials were found to improve the quality of protein structure prediction and refinement, protein-protein docking, and protein design.
Affiliation(s)
- Alexandre V Morozov
- Center for Studies in Physics and Biology, Rockefeller University, New York, New York 10021
|
567
|
Chapter 18 Computationally Assisted Protein Design. ACTA ACUST UNITED AC 2005. [DOI: 10.1016/s1574-1400(05)01018-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0]
|
568
|
Jaramillo A, Wodak SJ. Computational protein design is a challenge for implicit solvation models. Biophys J 2005; 88:156-71. [PMID: 15377512 PMCID: PMC1304995 DOI: 10.1529/biophysj.104.042044] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.4] [Received: 03/02/2004] [Accepted: 09/07/2004] [Indexed: 11/18/2022] Open
Abstract
Increasingly complex schemes for representing solvent effects in an implicit fashion are being used in computational analyses of biological macromolecules. These schemes speed up the calculations by orders of magnitude and are assumed to compromise little on essential features of the solvation phenomenon. In this work we examine this assumption. Five implicit solvation models, a surface area-based empirical model, two models that approximate the generalized Born treatment and a finite difference Poisson-Boltzmann method are challenged in situations differing from those where these models were calibrated. These situations are encountered in automatic protein design procedures, whose job is to select sequences, which stabilize a given protein 3D structure, from a large number of alternatives. To this end we evaluate the energetic cost of burying amino acids in thousands of environments with different solvent exposures belonging, respectively, to decoys built with random sequences and to native protein crystal structures. In addition we perform actual sequence design calculations. Except for the crudest surface area-based procedure, all the tested models tend to favor the burial of polar amino acids in the protein interior over nonpolar ones, a behavior that leads to poor performance in protein design calculations. We show, on the other hand, that three of the examined models are nonetheless capable of discriminating between the native fold and many nonnative alternatives, a test commonly used to validate force fields. It is concluded that protein design is a particularly challenging test for implicit solvation models because it requires accurate estimates of the solvation contribution of individual residues. This contrasts with native recognition, which depends less on solvation and more on other nonbonded contributions.
Affiliation(s)
- Alfonso Jaramillo
- Service de Conformation de Macromolécules Biologiques et Bioinformatique, CP263 Université Libre de Bruxelles, Brussels, Belgium
|
569
|
Abstract
The relationship between monomer chirality and polymer structure has been studied using both theoretical and experimental methods. Atomistic models, such as the ones employed in computational protein folding and design, can be used to study the relationship between monomer chirality and the properties of polypeptides. Using a simulated evolution approach that combines side-chain epimerization with backbone flexibility, we recapitulate the relationship between basic forces that drive secondary structure formation and sequence homochirality. Additionally, we find heterochiral motifs including a C-terminal helix capping interaction and stable helix-reversals that result in bent helix structures. Our studies show that simulated evolution of chirality with backbone flexibility can be a powerful tool in the design of novel heteropolymers with tuned stereochemical properties.
Affiliation(s)
- Vikas Nanda
- Department of Biochemistry and Molecular Biophysics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania 19104, USA.
|
570
|
Havranek JJ, Duarte CM, Baker D. A simple physical model for the prediction and design of protein-DNA interactions. J Mol Biol 2004; 344:59-70. [PMID: 15504402 DOI: 10.1016/j.jmb.2004.09.029] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.2] [Received: 03/08/2004] [Revised: 07/28/2004] [Accepted: 09/10/2004] [Indexed: 11/29/2022]
Abstract
Protein-DNA interactions are crucial for many biological processes. Attempts to model these interactions have generally taken the form of amino acid-base recognition codes or purely sequence-based profile methods, which depend on the availability of extensive sequence and structural information for specific structural families, neglect side-chain conformational variability, and lack generality beyond the structural family used to train the model. Here, we take advantage of recent advances in rotamer-based protein design and the large number of structurally characterized protein-DNA complexes to develop and parameterize a simple physical model for protein-DNA interactions. The model shows considerable promise for redesigning amino acids at protein-DNA interfaces, as design calculations recover the amino acid residue identities and conformations at these interfaces with accuracies comparable to sequence recovery in globular proteins. The model shows promise also for predicting DNA-binding specificity for fixed protein sequences: native DNA sequences are selected correctly from pools of competing DNA substrates; however, incorporation of backbone movement will likely be required to improve performance in homology modeling applications. Interestingly, optimization of zinc finger protein amino acid sequences for high-affinity binding to specific DNA sequences results in proteins with little or no predicted specificity, suggesting that naturally occurring DNA-binding proteins are optimized for specificity rather than affinity. When combined with algorithms that optimize specificity directly, the simple computational model developed here should be useful for the engineering of proteins with novel DNA-binding specificities.
Affiliation(s)
- James J Havranek
- Howard Hughes Medical Institute and Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
|
571
|
Qian B, Ortiz AR, Baker D. Improvement of comparative model accuracy by free-energy optimization along principal components of natural structural variation. Proc Natl Acad Sci U S A 2004; 101:15346-51. [PMID: 15492216 PMCID: PMC524448 DOI: 10.1073/pnas.0404703101] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.8] [Indexed: 11/18/2022] Open
Abstract
Accurate high-resolution refinement of protein structure models is a formidable challenge because of the delicate balance of forces in the native state, the difficulty in sampling the very large number of alternative tightly packed conformations, and the inaccuracies in current force fields. Indeed, energy-based refinement of comparative models generally leads to degradation rather than improvement in model quality, and, hence, most current comparative modeling procedures omit physically based refinement. However, despite their inaccuracies, current force fields do contain information that is orthogonal to the evolutionary information on which comparative models are based, and, hence, refinement might be able to improve comparative models if the space that is sampled is restricted sufficiently so that false attractors are avoided. Here, we use the principal components of the variation of backbone structures within a homologous family to define a small number of evolutionarily favored sampling directions and show that model quality can be improved by energy-based optimization along these directions.
Affiliation(s)
- Bin Qian
- Howard Hughes Medical Institute and Department of Biochemistry, University of Washington, J-567 Health Sciences, Box 357350, Seattle, WA 98105, USA
|
572
|
Chavez LL, Onuchic JN, Clementi C. Quantifying the roughness on the free energy landscape: entropic bottlenecks and protein folding rates. J Am Chem Soc 2004; 126:8426-32. [PMID: 15237999 DOI: 10.1021/ja049510+] [Citation(s) in RCA: 190] [Impact Index Per Article: 9.5] [Indexed: 11/29/2022]
Abstract
The prediction of protein folding rates and mechanisms is currently of great interest in the protein folding community. A close comparison between theory and experiment in this area promises to advance our understanding of the physical-chemical principles governing the folding process. The delicate interplay of entropic and energetic/enthalpic factors in the protein free energy regulates the details of this complex reaction. In this article, we propose the use of topological descriptors to quantify the amount of heterogeneity in the configurational entropy contribution to the free energy. We apply the procedure to a set of 16 two-state folding proteins. The results offer a clean and simple theoretical explanation for the experimentally measured folding rates and mechanisms, in terms of the intrinsic entropic roughness along the populated folding routes on the protein free energy landscape.
Affiliation(s)
- Leslie L Chavez
- Center for Theoretical Biological Physics and Department of Physics, University of California at San Diego, La Jolla, California 92093, USA
|
573
|
Chen Y, Kortemme T, Robertson T, Baker D, Varani G. A new hydrogen-bonding potential for the design of protein-RNA interactions predicts specific contacts and discriminates decoys. Nucleic Acids Res 2004; 32:5147-62. [PMID: 15459285 PMCID: PMC521638 DOI: 10.1093/nar/gkh785] [Citation(s) in RCA: 62] [Impact Index Per Article: 3.1] [Indexed: 11/12/2022] Open
Abstract
RNA-binding proteins play many essential roles in the regulation of gene expression in the cell. Despite the significant increase in the number of structures for RNA-protein complexes in the last few years, the molecular basis of specificity remains unclear even for the best-studied protein families. We have developed a distance- and orientation-dependent hydrogen-bonding potential based on the statistical analysis of hydrogen-bonding geometries that are observed in high-resolution crystal structures of protein-DNA and protein-RNA complexes. We observe very strong geometrical preferences that reflect significant energetic constraints on the relative placement of hydrogen-bonding atom pairs at protein-nucleic acid interfaces. A scoring function based on the hydrogen-bonding potential discriminates native protein-RNA structures from incorrectly docked decoys with remarkable predictive power. By incorporating the new hydrogen-bonding potential into a physical model of protein-RNA interfaces with full atom representation, we were able to recover native amino acids at protein-RNA interfaces.
Affiliation(s)
- Yu Chen
- Department of Chemistry, University of Washington, Box 351700, Seattle, WA 98195-1700, USA
|
574
|
Lomize AL, Pogozheva ID, Mosberg HI. Quantification of helix-helix binding affinities in micelles and lipid bilayers. Protein Sci 2004; 13:2600-12. [PMID: 15340167 PMCID: PMC2286553 DOI: 10.1110/ps.04850804] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.3] [Indexed: 10/26/2022]
Abstract
A theoretical approach for estimating association free energies of alpha-helices in nonpolar media has been developed. The parameters of energy functions have been derived from ΔΔG values of mutants in water-soluble proteins and partitioning of organic solutes between water and nonpolar solvents. The proposed approach was verified successfully against three sets of published data: (1) dissociation constants of alpha-helical oligomers formed by 27 hydrophobic peptides; (2) stabilities of 22 bacteriorhodopsin mutants; and (3) protein-ligand binding affinities in aqueous solution. It has been found that coalescence of helices is driven exclusively by van der Waals interactions and H-bonds, whereas the principal destabilizing contributions are represented by side-chain conformational entropy and transfer energy of atoms from a detergent or lipid to the protein interior. Electrostatic interactions of alpha-helices were relatively weak but important for reproducing the experimental data. Immobilization free energy, which originates from restricting rotational and translational rigid-body movements of molecules during their association, was found to be less than 1 kcal/mole. The energetics of amino acid substitutions in bacteriorhodopsin was complicated by specific binding of lipid and water molecules to cavities created in certain mutants.
Affiliation(s)
- Andrei L Lomize
- College of Pharmacy, University of Michigan, 428 Church St., Ann Arbor, MI 48109-1065, USA.
|
575
|
Kim DE, Chivian D, Baker D. Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res 2004; 32:W526-31. [PMID: 15215442 PMCID: PMC441606 DOI: 10.1093/nar/gkh468] [Citation(s) in RCA: 1367] [Impact Index Per Article: 68.4] [Indexed: 11/14/2022] Open
Abstract
The Robetta server (http://robetta.bakerlab.org) provides automated tools for protein structure prediction and analysis. For structure prediction, sequences submitted to the server are parsed into putative domains and structural models are generated using either comparative modeling or de novo structure prediction methods. If a confident match to a protein of known structure is found using BLAST, PSI-BLAST, FFAS03 or 3D-Jury, it is used as a template for comparative modeling. If no match is found, structure predictions are made using the de novo Rosetta fragment insertion method. Experimental nuclear magnetic resonance (NMR) constraints data can also be submitted with a query sequence for RosettaNMR de novo structure determination. Other current capabilities include the prediction of the effects of mutations on protein-protein interactions using computational interface alanine scanning. The Rosetta protein design and protein-protein docking methodologies will soon be available through the server as well.
Affiliation(s)
- David E Kim
- Structural Genomics of Pathogenic Protozoa, Department of Biochemistry, University of Washington, Seattle WA 98195, USA
|
577
|
Rohl CA, Strauss CEM, Chivian D, Baker D. Modeling structurally variable regions in homologous proteins with rosetta. Proteins 2004; 55:656-77. [PMID: 15103629 DOI: 10.1002/prot.10629] [Citation(s) in RCA: 242] [Impact Index Per Article: 12.1] [Indexed: 12/26/2022]
Abstract
A major limitation of current comparative modeling methods is the accuracy with which regions that are structurally divergent from homologues of known structure can be modeled. Because structural differences between homologous proteins are responsible for variations in protein function and specificity, the ability to model these differences has important functional consequences. Although existing methods can provide reasonably accurate models of short loop regions, modeling longer structurally divergent regions is an unsolved problem. Here we describe a method based on the de novo structure prediction algorithm, Rosetta, for predicting conformations of structurally divergent regions in comparative models. Initial conformations for short segments are selected from the protein structure database, whereas longer segments are built up by using three- and nine-residue fragments drawn from the database and combined by using the Rosetta algorithm. A gap closure term in the potential in combination with modified Newton's method for gradient descent minimization is used to ensure continuity of the peptide backbone. Conformations of variable regions are refined in the context of a fixed template structure using Monte Carlo minimization together with rapid repacking of side-chains to iteratively optimize backbone torsion angles and side-chain rotamers. For short loops, mean accuracies of 0.69, 1.45, and 3.62 Å are obtained for 4, 8, and 12 residue loops, respectively. In addition, the method can provide reasonable models of conformations of longer protein segments: predicted conformations of 3 Å root-mean-square deviation or better were obtained for 5 of 10 examples of segments ranging from 13 to 34 residues. In combination with a sequence alignment algorithm, this method generates complete, ungapped models of protein structures, including regions both similar to and divergent from a homologous structure.
This combined method was used to make predictions for 28 protein domains in the Critical Assessment of Protein Structure 4 (CASP 4) and 59 domains in CASP 5, where the method ranked highly among comparative modeling and fold recognition methods. Model accuracy in these blind predictions is dominated by alignment quality, but in the context of accurate alignments, long protein segments can be accurately modeled. Notably, the method correctly predicted the local structure of a 39-residue insertion into a TIM barrel in CASP 5 target T0186.
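The loop accuracies quoted in this abstract are root-mean-square deviations in Å. As a reading aid, here is a minimal Python sketch of that metric for two equally sized coordinate sets. It assumes the structures are already superimposed; the abstract does not say how the authors superimposed structures or which atoms they compared, so the function name and inputs are illustrative only.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally sized lists of (x, y, z) points.

    Assumes both coordinate sets are already in the same frame; no
    superposition (e.g. Kabsch alignment) is performed here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Hypothetical two-atom example: the second atom is displaced by 2 Å along z.
model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
native = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(f"RMSD = {rmsd(model, native):.3f} Å")
```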
Affiliation(s)
- Carol A Rohl
- Department of Biomolecular Engineering, University of California, Santa Cruz 95064, USA.
|
578
|
Abstract
We have used a sequence prediction algorithm and a novel sampling method to design protein sequences for the WW domain, a small beta-sheet motif. The procedure, referred to as SPANS, designs sequences to be compatible with an ensemble of closely related polypeptide backbones, mimicking the inherent flexibility of proteins. Two designed sequences (termed SPANS-WW1 and SPANS-WW2), using only naturally occurring L-amino acids, were selected for study and the corresponding polypeptides were prepared in Escherichia coli. Circular dichroism data suggested that both purified polypeptides adopted secondary structure features related to those of the target without the aid of disulfide bridges or bound cofactors. The structure exhibited by SPANS-WW2 melted cooperatively by raising the temperature of the solution. Further analysis of this polypeptide by proton nuclear magnetic resonance spectroscopy demonstrated that at 5 °C, it folds into a structure closely resembling a natural WW domain. This achievement constitutes one of a small number of successful de novo protein designs through fully automated computational methods and highlights the feasibility of including backbone flexibility in the design strategy.
|
579
|
Abstract
Understanding the sequence determinants of protein structure, stability and folding is critical for understanding how natural proteins have evolved and how proteins can be engineered to perform novel functions. The complexity of the protein folding problem requires the ability to search large volumes of sequence space for proteins with specific structural or functional characteristics. Here we describe our efforts to identify novel proteins using a phage-display selection strategy from a 'mini-exon' shuffling library generated from the yeast genome and from completely random sequence libraries, and compare the results to recent successes in generating novel proteins using in silico protein design.
Affiliation(s)
- Alexander L Watters
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA 98195, USA
|
580
|
Scalley-Kim M, Baker D. Characterization of the Folding Energy Landscapes of Computer Generated Proteins Suggests High Folding Free Energy Barriers and Cooperativity may be Consequences of Natural Selection. J Mol Biol 2004; 338:573-83. [PMID: 15081814 DOI: 10.1016/j.jmb.2004.02.055] [Citation(s) in RCA: 83] [Impact Index Per Article: 4.2] [Received: 11/11/2003] [Revised: 02/04/2004] [Accepted: 02/04/2004] [Indexed: 11/21/2022]
Abstract
To determine the extent to which protein folding rates and free energy landscapes have been shaped by natural selection, we have examined the folding kinetics of five proteins generated using computational design methods and, hence, never exposed to natural selection. Four of these proteins are complete computer-generated redesigns of naturally occurring structures and the fifth protein, called Top7, has a computer-generated fold not yet observed in nature. We find that three of the four redesigned proteins fold much faster than their naturally occurring counterparts. While natural selection thus does not appear to operate on protein folding rates, the majority of the designed proteins unfold considerably faster than their naturally occurring counterparts, suggesting possible selection for a high free energy barrier to unfolding. In contrast to almost all naturally occurring proteins of less than 100 residues but consistent with simple computational models, the folding energy landscape for Top7 appears to be quite complex, suggesting the smooth energy landscapes and highly cooperative folding transitions observed for small naturally occurring proteins may also reflect the workings of natural selection.
Affiliation(s)
- Michelle Scalley-Kim
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA, USA
|
581
|
Pokala N, Handel TM. Energy functions for protein design I: efficient and accurate continuum electrostatics and solvation. Protein Sci 2004; 13:925-36. [PMID: 15010542 PMCID: PMC2280065 DOI: 10.1110/ps.03486104] [Citation(s) in RCA: 65] [Impact Index Per Article: 3.3] [Received: 10/21/2003] [Revised: 01/03/2004] [Accepted: 01/09/2004] [Indexed: 10/26/2022]
Abstract
Electrostatics and solvation energies are important for defining protein stability, structural specificity, and molecular recognition. Because these energies are difficult to compute quickly and accurately, they are often ignored or modeled very crudely in computational protein design. To address this problem, we have developed a simple, fast, and accurate approximation for calculating Born radii in the context of protein design calculations. When these approximate Born radii are used with the generalized Born continuum dielectric model, energies calculated by the 10^6-fold slower finite difference Poisson-Boltzmann model are faithfully reproduced. A similar approach can be used for estimating solvent-accessible surface areas (SASAs). As an independent test, we show that these approximations can be used to accurately predict the experimentally determined pKa values of >200 ionizable groups from 15 proteins.
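For readers unfamiliar with the generalized Born model named in this abstract, the following Python sketch evaluates the polarization energy from given charges and Born radii. It uses the widely cited Still-type interpolation function; whether the paper uses exactly this form is an assumption, and the paper's own contribution (the fast approximation of the Born radii themselves) is not reproduced here. The function names, the dielectric constants, and the Coulomb conversion factor are illustrative choices.

```python
import math

# Coulomb constant in kcal*Å/(mol*e^2); a conventional value, assumed here.
COULOMB = 332.06371

def f_gb(r, radius_i, radius_j):
    """Still-type interpolation between the Born (r = 0) and Coulomb (large r) limits."""
    product = radius_i * radius_j
    return math.sqrt(r * r + product * math.exp(-r * r / (4.0 * product)))

def gb_polarization_energy(charges, born_radii, coords, eps_in=1.0, eps_out=80.0):
    """Generalized Born polarization energy in kcal/mol.

    charges    : partial charges in units of e
    born_radii : effective Born radii in Å (taken as given inputs)
    coords     : (x, y, z) positions in Å
    """
    prefactor = -0.5 * COULOMB * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):  # the double sum includes the i == j self terms
            r = math.dist(coords[i], coords[j])
            total += charges[i] * charges[j] / f_gb(r, born_radii[i], born_radii[j])
    return prefactor * total

# Hypothetical single ion: the result reduces to the classical Born solvation energy.
print(gb_polarization_energy([1.0], [2.0], [(0.0, 0.0, 0.0)]))
```

With a single charge the double sum collapses to the self term, so the sketch can be checked against the closed-form Born expression.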
Affiliation(s)
- Navin Pokala
- Department of Molecular and Cell Biology, University of California, Berkeley, 237 Hildebrand Hall, Berkeley, CA 94720-3206, USA.
|
582
|
Kortemme T, Joachimiak LA, Bullock AN, Schuler AD, Stoddard BL, Baker D. Computational redesign of protein-protein interaction specificity. Nat Struct Mol Biol 2004; 11:371-9. [PMID: 15034550 DOI: 10.1038/nsmb749] [Citation(s) in RCA: 238] [Impact Index Per Article: 11.9] [Received: 08/18/2003] [Accepted: 02/23/2004] [Indexed: 11/08/2022]
Abstract
We developed a 'computational second-site suppressor' strategy to redesign specificity at a protein-protein interface and applied it to create new specifically interacting DNase-inhibitor protein pairs. We demonstrate that the designed switch in specificity holds in in vitro binding and functional assays. We also show that the designed interfaces are specific in the natural functional context in living cells, and present the first high-resolution X-ray crystallographic analysis of a computer-redesigned functional protein-protein interface with altered specificity. The approach should be applicable to the design of interacting protein pairs with novel specificities for delineating and re-engineering protein interaction networks in living cells.
Affiliation(s)
- Tanja Kortemme
- Howard Hughes Medical Institute & Department of Biochemistry, Box 357350, University of Washington, Seattle, Washington 98195-7350, USA
|
583
|
Khatun J, Khare SD, Dokholyan NV. Can Contact Potentials Reliably Predict Stability of Proteins? J Mol Biol 2004; 336:1223-38. [PMID: 15037081 DOI: 10.1016/j.jmb.2004.01.002] [Citation(s) in RCA: 57] [Impact Index Per Article: 2.9] [Received: 10/21/2003] [Revised: 01/08/2004] [Accepted: 01/08/2004] [Indexed: 11/17/2022]
Abstract
The simplest approximation of interaction potential between amino acid residues in proteins is the contact potential, which defines the effective free energy of a protein conformation by a set of amino acid contacts formed in this conformation. Finding a contact potential capable of predicting free energies of protein states across a variety of protein families will aid protein folding and engineering in silico on a computationally tractable time-scale. We test the ability of contact potentials to accurately and transferably (across various protein families) predict stability changes of proteins upon mutations. We develop a new methodology to determine the contact potentials in proteins from experimental measurements of changes in proteins' thermodynamic stabilities (ΔΔG) upon mutations. We apply our methodology to derive sets of contact interaction parameters for a hierarchy of interaction models including solvation and multi-body contact parameters. We test how well our models reproduce experimental measurements by statistical tests. We evaluate the maximum accuracy of predictions obtained by using contact potentials and the correlation between parameters derived from different datasets of experimental ΔΔG values. We argue that it is impossible to reach experimental accuracy and derive fully transferable contact parameters using the contact models of potentials. However, contact parameters may yield reliable predictions of ΔΔG for datasets of mutations confined to the same amino acid positions in the sequence of a single protein.
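The idea of a contact potential described in this abstract can be illustrated with a short Python sketch: the energy of a conformation is the sum of pairwise parameters over residue pairs that are in contact, and a crude stability change for a point mutation is the energy difference between mutant and wild type on the same structure. Everything specific below is an assumption for illustration: the 7.5 Å cutoff, the exclusion of chain neighbours, the neglect of the unfolded state, the sign convention, and the parameter values. None of it is taken from the paper.

```python
import math
from itertools import combinations

def contact_energy(sequence, coords, epsilon, cutoff=7.5):
    """Sum pairwise contact parameters over residue pairs within `cutoff` Å.

    sequence : one-letter residue codes, one per position
    coords   : one representative (x, y, z) point per residue
    epsilon  : dict mapping an alphabetically sorted residue pair to an energy
    """
    total = 0.0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:  # skip chain neighbours, which are always close in space
            continue
        if math.dist(coords[i], coords[j]) <= cutoff:
            pair = tuple(sorted((sequence[i], sequence[j])))
            total += epsilon.get(pair, 0.0)  # unlisted pairs contribute nothing
    return total

def ddg_of_mutation(sequence, coords, epsilon, position, new_residue, cutoff=7.5):
    """Mutant minus wild-type contact energy on a fixed structure.

    Ignores the unfolded state and any structural relaxation, so this is
    only a rough stand-in for an experimental stability change.
    """
    mutant = sequence[:position] + new_residue + sequence[position + 1:]
    return (contact_energy(mutant, coords, epsilon, cutoff)
            - contact_energy(sequence, coords, epsilon, cutoff))

# Hypothetical three-residue example with made-up parameters.
params = {("L", "L"): -1.0, ("A", "L"): -0.3}
points = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 0.0, 0.0)]
print(ddg_of_mutation("LAL", points, params, position=2, new_residue="A"))
```

Under these assumptions a positive value means the mutation removes favourable contacts from the folded structure.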
Affiliation(s)
- Jainab Khatun
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
|
584
|
Abstract
Protein-protein interactions are key components of all signal transduction processes, so methods to alter these interactions promise to become important tools in dissecting function of connectivities in these networks. We have developed a fast computational approach for the prediction of energetically important amino acid residues in protein-protein interfaces (available at http://robetta.bakerlab.org/alaninescan), which we, following Peter Kollman, have termed "computational alanine scanning." The input consists of a three-dimensional structure of a protein-protein complex; output is a list of "hot spots," or amino acid side chains that are predicted to significantly destabilize the interface when mutated to alanine, analogous to the results of experimental alanine-scanning mutagenesis. 79% of hot spots and 68% of neutral residues were correctly predicted in a test of 233 mutations in 19 protein-protein complexes. A single interface can be analyzed in minutes. The computational methodology has been validated by the successful design of protein interfaces with new specificity and activity, and has yielded new insights into the mechanisms of receptor specificity and promiscuity in biological systems.
Affiliation(s)
- Tanja Kortemme
- Department of Biopharmaceutical Sciences and California Institute for Quantitative Biomedical Research, University of California San Francisco, CA 94107, USA.
|
585
|
Abstract
Computational protein design strategies have been developed to reengineer protein-protein interfaces in an automated, generalizable fashion. In the past two years, these methods have been successfully applied to generate chimeric proteins and protein pairs with specificities different from naturally occurring protein-protein interactions. Although there are shortcomings in current approaches, both in the way conformational space is sampled and in the energy functions used to evaluate designed conformations, the successes suggest we are now entering an era in which computational methods can be used to modulate, reengineer and design protein-protein interaction networks in living cells.
Affiliation(s)
- Tanja Kortemme
- Howard Hughes Medical Institute and Department of Biochemistry, University of Washington, Box 357350, Seattle, WA 98195, USA
|
586
|
Chivian D, Kim DE, Malmström L, Bradley P, Robertson T, Murphy P, Strauss CEM, Bonneau R, Rohl CA, Baker D. Automated prediction of CASP-5 structures using the Robetta server. Proteins 2004; 53 Suppl 6:524-33. [PMID: 14579342 DOI: 10.1002/prot.10529] [Citation(s) in RCA: 221] [Impact Index Per Article: 11.1] [Indexed: 11/07/2022]
Abstract
Robetta is a fully automated protein structure prediction server that uses the Rosetta fragment-insertion method. It combines template-based and de novo structure prediction methods in an attempt to produce high quality models that cover every residue of a submitted sequence. The first step in the procedure is the automatic detection of the locations of domains and selection of the appropriate modeling protocol for each domain. For domains matched to a homolog with an experimentally characterized structure by PSI-BLAST or Pcons2, Robetta uses a new alignment method, called K*Sync, to align the query sequence onto the parent structure. It then models the variable regions by allowing them to explore conformational space with fragments in a fashion similar to the de novo protocol, but in the context of the template. When no structural homolog is available, domains are modeled with the Rosetta de novo protocol, which allows the full length of the domain to explore conformational space via fragment-insertion, producing a large decoy ensemble from which the final models are selected. The Robetta server produced quite reasonable predictions for targets in the recent CASP-5 and CAFASP-3 experiments, some of which were at the level of the best human predictions.
|
587
|
Bradley P, Chivian D, Meiler J, Misura KMS, Rohl CA, Schief WR, Wedemeyer WJ, Schueler-Furman O, Murphy P, Schonbrun J, Strauss CEM, Baker D. Rosetta predictions in CASP5: successes, failures, and prospects for complete automation. Proteins 2004; 53 Suppl 6:457-68. [PMID: 14579334 DOI: 10.1002/prot.10552] [Citation(s) in RCA: 140] [Impact Index Per Article: 7.0] [Indexed: 11/06/2022]
Abstract
We describe predictions of the structures of CASP5 targets using Rosetta. The Rosetta fragment insertion protocol was used to generate models for entire target domains without detectable sequence similarity to a protein of known structure and to build long loop insertions (and N- and C-terminal extensions) in cases where a structural template was available. Encouraging results were obtained both for the de novo predictions and for the long loop insertions; we describe here the successes as well as the failures in the context of current efforts to improve the Rosetta method. In particular, de novo predictions failed for large proteins that were incorrectly parsed into domains and for topologically complex (high contact order) proteins with swapping of segments between domains. However, for the remaining targets, at least one of the five submitted models had a long fragment with significant similarity to the native structure. A fully automated version of the CASP5 protocol produced results that were comparable to the human-assisted predictions for most of the targets, suggesting that automated genomic-scale, de novo protein structure prediction may soon be worthwhile. For the three targets where the human-assisted predictions were significantly closer to the native structure, we identify the steps that remain to be automated.
Affiliation(s)
- Philip Bradley
- Department of Biochemistry, University of Washington, Seattle 98195-7350, USA
|
588
|
Jain RK, Ranganathan R. Local complexity of amino acid interactions in a protein core. Proc Natl Acad Sci U S A 2004; 101:111-6. [PMID: 14684834 PMCID: PMC314147 DOI: 10.1073/pnas.2534352100] [Citation(s) in RCA: 34] [Impact Index Per Article: 1.7] [Received: 07/18/2003] [Indexed: 11/18/2022] Open
Abstract
Atomic resolution structures of proteins indicate that the core is typically well packed, suggesting a densely connected network of interactions between amino acid residues. The combinatorial complexity of energetic interactions in such a network could be enormous, a problem that limits our ability to relate structure and function. Here, we report a case study of the complexity of amino acid interactions in a localized region within the core of the GFP, a particularly stable and tightly packed molecule. Mutations at three sites within the chromophore-binding pocket display an overlapping pattern of conformational change and are thermodynamically coupled, seemingly consistent with the dense network model. However, crystallographic and energetic analyses of coupling between mutations paint a different picture; pairs of mutations couple through independent "hotspots" in the region of structural overlap. The data indicate that, even in highly stable proteins, the core contains sufficient plasticity in packing to uncouple high-order energetic interactions of residues, a property that is likely general in proteins.
Affiliation(s)
- Rajul K Jain
- Howard Hughes Medical Institute and Department of Pharmacology, University of Texas Southwestern Medical Center, 5323 Harry Hines Boulevard, Dallas, TX 75390-9050, USA
|
589
|
Affiliation(s)
- Carol A Rohl
- Department of Biochemistry and Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
|
590
|
Moore GL, Maranas CD. Computational challenges in combinatorial library design for protein engineering. AIChE J 2004. [DOI: 10.1002/aic.10025] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/06/2022]
|
591
|
Kuhlman B, Dantas G, Ireton GC, Varani G, Stoddard BL, Baker D. Design of a novel globular protein fold with atomic-level accuracy. Science 2003; 302:1364-8. [PMID: 14631033 DOI: 10.1126/science.1089427] [Citation(s) in RCA: 1131] [Impact Index Per Article: 53.9] [Indexed: 11/02/2022]
Abstract
A major challenge of computational protein design is the creation of novel proteins with arbitrarily chosen three-dimensional structures. Here, we used a general computational strategy that iterates between sequence design and structure prediction to design a 93-residue alpha/beta protein called Top7 with a novel sequence and topology. Top7 was found experimentally to be folded and extremely stable, and the x-ray crystal structure of Top7 is similar (root mean square deviation equals 1.2 angstroms) to the design model. The ability to design a new protein fold makes possible the exploration of the large regions of the protein universe not yet observed in nature.
Affiliation(s)
- Brian Kuhlman
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
|
592
|
Abstract
We have developed an effective scoring function for protein design. The atomic solvation parameters, together with the weights of energy terms, were optimized so that residues corresponding to the native sequence were predicted with low energy in the training set of 28 protein structures. The solvation energy of non-hydrogen-bonded hydrophilic atoms was considered separately and expressed in a nonlinear way. As a result, our scoring function predicted native residues as the most favorable in 59% of the total positions in 28 proteins. We then tested the scoring function by comparing the predicted stability changes for 103 T4 lysozyme mutants with the experimental values. The correlation coefficients were 0.77 for surface mutations and 0.71 for all mutations. Finally, the scoring function combined with Monte Carlo simulation was used to predict favorable sequences on a fixed backbone. The designed sequences were similar to the natural sequences of the family to which the template structure belonged. The profile of the designed sequences was helpful for identification of remote homologues of the native sequence.
Affiliation(s)
- Shide Liang
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas 75390-9050, USA
|
593
|
Di Nardo AA, Larson SM, Davidson AR. The relationship between conservation, thermodynamic stability, and function in the SH3 domain hydrophobic core. J Mol Biol 2003; 333:641-55. [PMID: 14556750 DOI: 10.1016/j.jmb.2003.08.035] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.3] [Indexed: 10/27/2022]
Abstract
To investigate the relationships between sequence conservation, protein stability, and protein function, we have measured the thermodynamic stability, folding kinetics, and in vitro peptide-binding activity of a large number of single-site substitutions in the hydrophobic core of the Fyn SH3 domain. Comparison of these data to that derived from an analysis of a large alignment of SH3 domain sequences revealed a very good correlation between the distinct pattern of conservation observed at each core position and the thermodynamic stability of mutants. Conservation was also found to correlate well with the unfolding rates of mutants, but not to the folding rates, suggesting that evolution selects more strongly for optimal native state packing interactions than for maximal folding rates. Structural analysis suggests that residue-residue core packing interactions are very similar in all SH3 domains, which provides an explanation for the correlation between conservation and mutant stability effects studied in a single SH3 domain. We also demonstrate a correlation between stability and the in vivo activity of mutants, and between conservation and activity. However, the relationship between conservation and activity was very strong only for the three most conserved hydrophobic core positions. The weaker correlation between activity and conservation seen at the other seven core positions indicates that maintenance of protein stability is the dominant selective pressure at these positions. In general, the pattern of conservation at hydrophobic core positions appears to arise from conserved packing constraints, and can be effectively utilized to predict the destabilizing effects of amino acid substitutions.
Affiliation(s)
- Ariel A Di Nardo
- Department of Biochemistry, University of Toronto, Toronto, Ont., Canada M5S 1A8
|
594
|
Tsai J, Bonneau R, Morozov AV, Kuhlman B, Rohl CA, Baker D. An improved protein decoy set for testing energy functions for protein structure prediction. Proteins 2003; 53:76-87. [PMID: 12945051 DOI: 10.1002/prot.10454] [Citation(s) in RCA: 139] [Impact Index Per Article: 6.6] [Indexed: 11/12/2022]
Abstract
We have improved the original Rosetta centroid/backbone decoy set by increasing the number of proteins and frequency of near native models and by building on sidechains and minimizing clashes. The new set consists of 1,400 model structures for 78 different and diverse protein targets and provides a challenging set for the testing and evaluation of scoring functions. We evaluated the extent to which a variety of all-atom energy functions could identify the native and close-to-native structures in the new decoy sets. Of various implicit solvent models, we found that a solvent-accessible surface area-based solvation provided the best enrichment and discrimination of close-to-native decoys. The combination of this solvation treatment with Lennard-Jones terms and the original Rosetta energy provided better enrichment and discrimination than any of the individual terms. The results also highlight the differences in accuracy of NMR and X-ray crystal structures: a large energy gap was observed between native and non-native conformations for X-ray structures but not for NMR structures.
Affiliation(s)
- Jerry Tsai
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas 77843, USA.
|
595
|
Pei J, Dokholyan NV, Shakhnovich EI, Grishin NV. Using protein design for homology detection and active site searches. Proc Natl Acad Sci U S A 2003; 100:11361-6. [PMID: 12975528 PMCID: PMC208762 DOI: 10.1073/pnas.2034878100] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.2] [Received: 03/24/2003] [Indexed: 11/18/2022] Open
Abstract
We describe a method of designing artificial sequences that resemble naturally occurring sequences in terms of their compatibility with a template structure and its functional constraints. The design procedure is a Monte Carlo simulation of amino acid substitution process. The selective fixation of substitutions is dictated by a simple scoring function derived from the template structure and a multiple alignment of its homologs. Designed sequences represent an enlargement of sequence space around native sequences. We show that the use of designed sequences improves the performance of profile-based homology detection. The difference in position-specific conservation between designed sequences and native sequences is helpful for prediction of functionally important residues. Our sequence selection criteria in evolutionary simulations introduce amino acid substitution rate variation among sites in a natural way, providing a better model to test phylogenetic methods.
Affiliation(s)
- Jimin Pei
- Department of Biochemistry and Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
|
596
|
Dantas G, Kuhlman B, Callender D, Wong M, Baker D. A large scale test of computational protein design: folding and stability of nine completely redesigned globular proteins. J Mol Biol 2003; 332:449-60. [PMID: 12948494 DOI: 10.1016/s0022-2836(03)00888-x] [Citation(s) in RCA: 232] [Impact Index Per Article: 11.0] [Indexed: 01/03/2023]
Abstract
A previously developed computer program for protein design, RosettaDesign, was used to predict low free energy sequences for nine naturally occurring protein backbones. RosettaDesign had no knowledge of the naturally occurring sequences and on average 65% of the residues in the designed sequences differ from wild-type. Synthetic genes for ten completely redesigned proteins were generated, and the proteins were expressed, purified, and then characterized using circular dichroism, chemical and temperature denaturation and NMR experiments. Although high-resolution structures have not yet been determined, eight of these proteins appear to be folded and their circular dichroism spectra are similar to those of their wild-type counterparts. Six of the proteins have stabilities equal to or up to 7 kcal/mol greater than their wild-type counterparts, and four of the proteins have NMR spectra consistent with a well-packed, rigid structure. These encouraging results indicate that the computational protein design methods can, with significant reliability, identify amino acid sequences compatible with a target protein backbone.
Affiliation(s)
- Gautam Dantas
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
|
597
|
Larson SM, Pande VS. Sequence optimization for native state stability determines the evolution and folding kinetics of a small protein. J Mol Biol 2003; 332:275-86. [PMID: 12946364 DOI: 10.1016/s0022-2836(03)00832-5] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.5] [Indexed: 11/21/2022]
Abstract
Investigating the relative importance of protein stability, function, and folding kinetics in driving protein evolution has long been hindered by the fact that we can only compare modern natural proteins, the products of the very process we seek to understand, to each other, with no external references or baselines. Through a large-scale all-atom simulation of protein evolution, we have created a large diverse alignment of SH3 domain sequences which have been selected only for native state stability, with no other influencing factors. Although the average pairwise identity between computationally evolved and natural sequences is only 17%, the residue frequency distributions of the computationally evolved sequences are similar to natural SH3 sequences at 86% of the positions in the domain, suggesting that optimization for the native state structure has dominated the evolution of natural SH3 domains. Additionally, the positions which play a consistent role in the transition state of three well-characterized SH3 domains (by phi-value analysis) are structurally optimized for the native state, and vice versa. Indeed, we see a specific and significant correlation between sequence optimization for native state stability and conservation of transition state structure.
Affiliation(s)
- Stefan M Larson
- Department of Chemistry and Biophysics Program, Stanford University, Stanford, CA 94305-5080, USA
|
598
|
Gray JJ, Moughon S, Wang C, Schueler-Furman O, Kuhlman B, Rohl CA, Baker D. Protein-protein docking with simultaneous optimization of rigid-body displacement and side-chain conformations. J Mol Biol 2003; 331:281-99. [PMID: 12875852 DOI: 10.1016/s0022-2836(03)00670-3] [Citation(s) in RCA: 820] [Impact Index Per Article: 39.0] [Indexed: 11/23/2022]
Abstract
Protein-protein docking algorithms provide a means to elucidate structural details for presently unknown complexes. Here, we present and evaluate a new method to predict protein-protein complexes from the coordinates of the unbound monomer components. The method employs a low-resolution, rigid-body, Monte Carlo search followed by simultaneous optimization of backbone displacement and side-chain conformations using Monte Carlo minimization. Up to 10^5 independent simulations are carried out, and the resulting "decoys" are ranked using an energy function dominated by van der Waals interactions, an implicit solvation model, and an orientation-dependent hydrogen bonding potential. Top-ranking decoys are clustered to select the final predictions. Small-perturbation studies reveal the formation of binding funnels in 42 of 54 cases using coordinates derived from the bound complexes and in 32 of 54 cases using independently determined coordinates of one or both monomers. Experimental binding affinities correlate with the calculated score function and explain the predictive success or failure of many targets. Global searches using one or both unbound components predict at least 25% of the native residue-residue contacts in 28 of the 32 cases where binding funnels exist. The results suggest that the method may soon be useful for generating models of biologically important complexes from the structures of the isolated components, but they also highlight the challenges that must be met to achieve consistent and accurate prediction of protein-protein interactions.
Affiliation(s)
- Jeffrey J Gray
- Howard Hughes Medical Institute and Department of Biochemistry, University of Washington, J-567 Health Sciences, Box 357350, Seattle, WA 98195, USA
|
599
|
Gray JJ, Moughon SE, Kortemme T, Schueler-Furman O, Misura KMS, Morozov AV, Baker D. Protein-protein docking predictions for the CAPRI experiment. Proteins 2003; 52:118-22. [PMID: 12784377 DOI: 10.1002/prot.10384] [Citation(s) in RCA: 88] [Impact Index Per Article: 4.2] [Indexed: 11/07/2022]
Abstract
We predicted structures for all seven targets in the CAPRI experiment using a new method in development at the time of the challenge. The technique includes a low-resolution rigid body Monte Carlo search followed by high-resolution refinement with side-chain conformational changes and rigid body minimization. Decoys (approximately 10^6 per target) were discriminated using a scoring function including van der Waals and solvation interactions, hydrogen bonding, residue-residue pair statistics, and rotamer probabilities. Decoys were ranked, clustered, manually inspected, and selected. The top ranked model for target 6 predicted the experimental structure to 1.5 Å RMSD and included 48 of 65 correct residue-residue contacts. Target 7 was predicted at 5.3 Å RMSD with 22 of 37 correct residue-residue contacts using a homology model from a known complex structure. Using a preliminary version of the protocol in round 1, target 1 was predicted within 8.8 Å although few contacts were correct. For targets 2 and 3, the interface locations and a small fraction of the contacts were correctly identified.
MESH Headings
- Algorithms
- Amino Acid Sequence
- Antibodies/chemistry
- Antibodies/immunology
- Antigens, Viral
- Bacterial Proteins/chemistry
- Bacterial Proteins/metabolism
- Binding Sites
- Capsid Proteins/chemistry
- Capsid Proteins/immunology
- Exotoxins/chemistry
- Exotoxins/metabolism
- Hemagglutinin Glycoproteins, Influenza Virus/chemistry
- Hemagglutinin Glycoproteins, Influenza Virus/immunology
- Macromolecular Substances
- Membrane Proteins/chemistry
- Membrane Proteins/metabolism
- Models, Molecular
- Molecular Sequence Data
- Monte Carlo Method
- Phosphoenolpyruvate Sugar Phosphotransferase System/chemistry
- Phosphoenolpyruvate Sugar Phosphotransferase System/metabolism
- Protein Interaction Mapping
- Protein Serine-Threonine Kinases/chemistry
- Protein Serine-Threonine Kinases/metabolism
- Proteins/chemistry
- Proteins/metabolism
- Receptors, Antigen, T-Cell, alpha-beta/chemistry
- Receptors, Antigen, T-Cell, alpha-beta/metabolism
- Sequence Alignment
- alpha-Amylases/chemistry
- alpha-Amylases/metabolism
Affiliation(s)
- Jeffrey J Gray
- Howard Hughes Medical Institute and Department of Biochemistry, University of Washington, Seattle, Washington, USA
|
600
|
Bolon DN, Marcus JS, Ross SA, Mayo SL. Prudent modeling of core polar residues in computational protein design. J Mol Biol 2003; 329:611-22. [PMID: 12767838 DOI: 10.1016/s0022-2836(03)00423-6] [Citation(s) in RCA: 37] [Impact Index Per Article: 1.8] [Indexed: 11/23/2022]
Abstract
Hydrogen bond interactions were surveyed in a set of protein structures. Compared to surface positions, polar side-chains at core positions form a greater number of intra-molecular hydrogen bonds. Furthermore, the majority of polar side-chains at core positions form at least one hydrogen bond to main-chain atoms that are not involved in hydrogen bonds to other main-chain atoms. Based on this structural survey, hydrogen bond rules were generated for each polar amino acid for use in protein core design. In the context of protein core design, these prudent polar rules were used to eliminate from consideration polar amino acid rotamers that do not form a minimum number of hydrogen bonds. As an initial test, the core of Escherichia coli thioredoxin was selected as a design target. For this target, the prudent polar strategy resulted in a minor increase in computational complexity compared to a strategy that did not allow polar residues. Dead-end elimination was used to identify global minimum energy conformations for the prudent polar and no polar strategies. The prudent polar strategy identified a protein sequence that was thermodynamically stabilized by 2.5 kcal/mol relative to wild-type thioredoxin and 2.2 kcal/mol relative to a thioredoxin variant whose core was designed without polar residues.
Affiliation(s)
- Daniel N Bolon
- Biochemistry and Molecular Biophysics Option, California Institute of Technology, Mail Code 114-96, Pasadena, CA 91125, USA
|