101
Kingsford CL, Chazelle B, Singh M. Solving and analyzing side-chain positioning problems using linear and integer programming. Bioinformatics 2004; 21:1028-36. [PMID: 15546935 DOI: 10.1093/bioinformatics/bti144] [Citation(s) in RCA: 117] [Impact Index Per Article: 5.9] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Side-chain positioning is a central component of homology modeling and protein design. In a common formulation of the problem, the backbone is fixed, side-chain conformations come from a rotamer library, and a pairwise energy function is optimized. It is NP-complete to find even a reasonable approximate solution to this problem. We seek to put this hardness result into practical context. RESULTS: We present an integer linear programming (ILP) formulation of side-chain positioning that allows us to tackle large problem sizes. We relax the integrality constraint to give a polynomial-time linear programming (LP) heuristic. We apply LP to position side chains on native and homologous backbones and to choose side chains for protein design. Surprisingly, when positioning side chains on native and homologous backbones, optimal solutions using a simple, biologically relevant energy function can usually be found using LP. On the other hand, the design problem often cannot be solved using LP directly; however, optimal solutions for large instances can still be found using the computationally more expensive ILP procedure. While different energy functions also affect the difficulty of the problem, the LP/ILP approach is able to find optimal solutions. Our analysis is the first large-scale demonstration that LP-based approaches are highly effective in finding optimal (and successive near-optimal) solutions for the side-chain positioning problem.
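For orientation, one common way to write such an ILP is sketched below. The notation is an assumption made here for illustration and is not quoted from the paper: $R_i$ is the rotamer set at position $i$, $E_{iu}$ is the self energy of rotamer $u$ at position $i$, $E_{iu,jv}$ is the pairwise energy, and $x_{iu}$, $x_{iu,jv}$ are indicator variables for choosing a rotamer and a rotamer pair.

```latex
\begin{aligned}
\min_{x}\quad & \sum_{i}\sum_{u\in R_i} E_{iu}\,x_{iu}
  \;+\; \sum_{i<j}\sum_{u\in R_i}\sum_{v\in R_j} E_{iu,jv}\,x_{iu,jv} \\
\text{s.t.}\quad & \sum_{u\in R_i} x_{iu} = 1 && \forall i, \\
& \sum_{v\in R_j} x_{iu,jv} = x_{iu} && \forall i,\ \forall u\in R_i,\ \forall j\neq i, \\
& x_{iu},\ x_{iu,jv}\in\{0,1\}, \qquad x_{iu,jv}=x_{jv,iu}.
\end{aligned}
```

Replacing the last line with $0 \le x_{iu},\ x_{iu,jv} \le 1$ gives the LP relaxation. When the LP optimum happens to be integral it is also optimal for the ILP, which is consistent with the abstract's observation that LP alone often suffices on native and homologous backbones.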
Affiliation(s)
- Carleton L Kingsford
- Department of Computer Science and the Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
102
Abstract
We have developed a process that significantly reduces the number of rotamers in computational protein design calculations. This process, which we call Vegas, results in dramatic computational performance increases when used with algorithms based on the dead-end elimination (DEE) theorem. Vegas estimates the energy of each rotamer at each position by fixing each rotamer in turn and utilizing various search algorithms to optimize the remaining positions. Algorithms used for this context specific optimization can include Monte Carlo, self-consistent mean field, and the evaluation of an expression that generates a lower bound energy for the fixed rotamer. Rotamers with energies above a user-defined cutoff value are eliminated. We found that using Vegas to preprocess rotamers significantly reduced the calculation time of subsequent DEE-based algorithms while retaining the global minimum energy conformation. For a full boundary design of a 51 amino acid fragment of engrailed homeodomain, the total calculation time was reduced by 12-fold.
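The pruning idea in this abstract can be illustrated with a toy sketch. Everything below is hypothetical: the function names, the energy tables, and the choice to measure the cutoff relative to the best score are assumptions, and exhaustive enumeration stands in for the Monte Carlo, mean-field, or lower-bound estimators the abstract mentions.

```python
from itertools import product

def total_energy(assign, self_e, pair_e):
    """Energy of one full assignment: self terms plus all pairwise terms."""
    n = len(assign)
    e = sum(self_e[i][assign[i]] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e += pair_e[(i, j)][assign[i]][assign[j]]
    return e

def best_energy_with_fixed(pos, rot, self_e, pair_e):
    """Lowest total energy reachable with rotamer `rot` fixed at `pos`.

    Assumption: brute-force enumeration replaces the stochastic or
    bounding searches named in the abstract.
    """
    choices = [range(len(s)) if i != pos else [rot]
               for i, s in enumerate(self_e)]
    return min(total_energy(a, self_e, pair_e) for a in product(*choices))

def prune_rotamers(self_e, pair_e, cutoff):
    """Keep (position, rotamer) pairs scoring within `cutoff` of the best."""
    scores = {(i, r): best_energy_with_fixed(i, r, self_e, pair_e)
              for i, s in enumerate(self_e) for r in range(len(s))}
    best = min(scores.values())
    return {key for key, e in scores.items() if e - best <= cutoff}

# Hypothetical toy problem: 3 positions, 2 rotamers each.
self_e = [[0.0, 1.0], [0.0, 5.0], [0.0, 0.0]]
pair_e = {(0, 1): [[0.0, 0.0], [0.0, 0.0]],
          (0, 2): [[0.0, 2.0], [2.0, 0.0]],
          (1, 2): [[0.0, 0.0], [0.0, 0.0]]}
kept = prune_rotamers(self_e, pair_e, cutoff=2.0)
```

With exact scoring, every rotamer of the global minimum scores exactly the best value and so survives any non-negative cutoff; with approximate estimators that guarantee depends on how the cutoff is chosen.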
Affiliation(s)
- Premal S Shah
- Biochemistry and Molecular Biophysics Option, Division of Biology, California Institute of Technology, 114-96, 1200 E. California Blvd., Pasadena, California 91125, USA
103
Peterson RW, Dutton PL, Wand AJ. Improved side-chain prediction accuracy using an ab initio potential energy function and a very large rotamer library. Protein Sci 2004; 13:735-51. [PMID: 14978310 PMCID: PMC2286725 DOI: 10.1110/ps.03250104] [Citation(s) in RCA: 52] [Impact Index Per Article: 2.6] [Indexed: 10/26/2022]
Abstract
Accurate prediction of the placement and conformations of protein side chains given only the backbone trace has a wide range of uses in protein design, structure prediction, and functional analysis. Prediction has most often relied on discrete rotamer libraries so that rapid fitness of side-chain rotamers can be assessed against some scoring function. Scoring functions are generally based on experimental parameters from small-molecule studies or empirical parameters based on determined protein structures. Here, we describe the NCN algorithm for predicting the placement of side chains. A predominantly first-principles approach was taken to develop the potential energy function incorporating van der Waals and electrostatics based on the OPLS parameters, and a hydrogen bonding term. The only empirical knowledge used is the frequency of rotameric states from the PDB. The rotamer library includes nearly 50,000 rotamers, and is the most extensive discrete library used to date. Although the computational time tends to be longer than most other algorithms, the overall accuracy exceeds all algorithms in the literature when placing rotamers on an accurate backbone trace. Considering only the most buried residues, 80% of the total residues tested, the placement accuracy reaches 92% for chi(1), and 83% for chi(1 + 2), and an overall RMS deviation of 1 Å. Additionally, we show that if information is available to restrict chi(1) to one rotamer well, then this algorithm can generate structures with an average RMS deviation of 1.0 Å for all heavy side-chain atoms and a corresponding overall chi(1 + 2) accuracy of 85.0%.
Affiliation(s)
- Ronald W Peterson
- The Johnson Research Foundation, Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, PA 19104, USA
104
Abstract
Computational protein design continues to experience a variety of methodological advances. Several improvements have been suggested for the objective functions used to quantify sequence/structure compatibility. Disparate design strategies based upon dead-end elimination, simulated annealing and statistical design have each recently yielded striking successes involving de novo designed proteins with sizes on the order of 100 residues or greater. Such methods may be used to design new proteins, as well as to redesign natural proteins to facilitate structural and biophysical studies.
Affiliation(s)
- Sheldon Park
- Makineni Theoretical Laboratories and Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, Pennsylvania 19104, USA
105
Dwyer MA, Hellinga HW. Periplasmic binding proteins: a versatile superfamily for protein engineering. Curr Opin Struct Biol 2004; 14:495-504. [PMID: 15313245 DOI: 10.1016/j.sbi.2004.07.004] [Citation(s) in RCA: 257] [Impact Index Per Article: 12.9] [Indexed: 11/16/2022]
Abstract
The diversity of biological function, ligand binding, conformational changes and structural adaptability of the periplasmic binding protein superfamily have been exploited to engineer biosensors, allosteric control elements, biologically active receptors and enzymes using a combination of techniques, including computational design. Extensively redesigned periplasmic binding proteins have been re-introduced into bacteria to function in synthetic signal transduction pathways that respond to extracellular ligands and as biologically active enzymes.
Affiliation(s)
- Mary A Dwyer
- Department of Biochemistry, Box 3711, Duke University Medical Center, Durham, North Carolina 27710, USA
106
Allert M, Rizk SS, Looger LL, Hellinga HW. Computational design of receptors for an organophosphate surrogate of the nerve agent soman. Proc Natl Acad Sci U S A 2004; 101:7907-12. [PMID: 15148405 PMCID: PMC419530 DOI: 10.1073/pnas.0401309101] [Citation(s) in RCA: 84] [Impact Index Per Article: 4.2] [Indexed: 11/18/2022]
Abstract
We report the computational design of soluble protein receptors for pinacolyl methyl phosphonic acid (PMPA), the predominant hydrolytic product of the nerve agent soman. Using recently developed computational protein design techniques, the ligand-binding pockets of two periplasmic binding proteins, glucose-binding protein and ribose-binding protein, were converted to bind PMPA instead of their cognate sugars. The designs introduce 9-12 mutations in the parent proteins. Twelve of 20 designs tested exhibited PMPA-dependent changes in emission intensity of a fluorescent reporter with affinities between 45 nM and 10 µM. The contributions to ligand binding by individual residues were determined in two designs by alanine-scanning mutagenesis, and are consistent with the molecular models. These results demonstrate that designed receptors with radically altered binding specificities and affinities that rival or exceed those of the parent proteins can be successfully predicted. The designs vary in parent scaffold, sequence diversity, and orientation of docked ligand, suggesting that the number of possible solutions to the design problem is large and degenerate. This observation has implications for the genesis of biological function by random processes. The designed receptors reported here may have utility in the development of fluorescent biosensors for monitoring nerve agents.
Affiliation(s)
- Malin Allert
- Departments of Biochemistry and Pharmacology and Molecular Cancer Biology, Box 3711, Duke University Medical Center, Durham, NC 27710, USA
107
Abstract
To facilitate the process of protein design and learn the basic rules that control the structure and stability of proteins, combinatorial methods have been developed to select or screen proteins with desired properties from libraries of mutants. One such method uses phage-display and proteolysis to select stably folded proteins. This method does not rely on specific properties of proteins for selection. Therefore, in principle it can be applied to any protein. Since its first demonstration in 1998, the method has been used to create hyperthermophilic proteins, to evolve novel folded domains from a library generated by combinatorial shuffling of polypeptide segments and to convert a partially unfolded structure to a fully folded protein.
Affiliation(s)
- Yawen Bai
- Laboratory of Biochemistry, National Cancer Institute, NIH, Bethesda, MD 20892, USA.
108
Abstract
Why do proteins adopt the conformations that they do, and what determines their stabilities? While we have come to some understanding of the forces that underlie protein architecture, a precise, predictive, physicochemical explanation is still elusive. Two obstacles to addressing these questions are the unfathomable vastness of protein sequence space, and the difficulty in making direct physical measurements on large numbers of protein variants. Here, we review combinatorial methods that have been applied to problems in protein biophysics over the last 15 years. The effects of hydrophobic core composition, the most important determinant of structure and stability, are still poorly understood. Particular attention is given to core composition as addressed by library methods. Increasingly useful screens and selections, in combination with modern high-throughput approaches borrowed from genomics and proteomics efforts, are making the empirical, statistical correlation between sequence and structure a tractable problem for the coming years.
Affiliation(s)
- Thomas J Magliery
- Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT, USA
109
Loose C, Klepeis JL, Floudas CA. A new pairwise folding potential based on improved decoy generation and side-chain packing. Proteins 2004; 54:303-14. [PMID: 14696192 DOI: 10.1002/prot.10521] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.1] [Indexed: 11/11/2022]
Abstract
A new force field for pairwise residue interactions as a function of C(alpha) to C(alpha) distances is presented. The force field was developed through the solution of a linear programming formulation with large sets of constraints. The constraints are based on the construction of >80,000 low-energy decoys for a set of proteins and requiring the decoy energies for each protein system to be higher than the native conformation of that particular protein. The generation of a robust force field was facilitated by the use of a novel decoy generation process, which involved the rational selection of proteins to add to the training set and included a significant energy minimization of the decoys. The force field was tested on a large set of decoys for various proteins not included in the training set and shown to perform well compared with a leading force field in identifying the native conformation for these proteins.
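To see why such constraints fit a linear program, consider the following sketch. The abstract states only that decoy energies are required to exceed the native energy; the distance binning, the slack variables, the margin $\varepsilon$, and the objective below are assumptions added here for illustration. Let $n_{ab,d}(c)$ count residue pairs of types $a$ and $b$ whose C(alpha) to C(alpha) distance falls in bin $d$ for conformation $c$, and let $\theta_{ab,d}$ be the unknown energy parameters.

```latex
\begin{aligned}
E(c;\theta) &= \sum_{a\le b}\sum_{d} \theta_{ab,d}\; n_{ab,d}(c), \\[4pt]
\min_{\theta,\,s}\quad & \sum_{p}\sum_{k} s_{pk} \\
\text{s.t.}\quad & E(c^{p}_{k};\theta) - E(c^{p}_{\mathrm{nat}};\theta) \;\ge\; \varepsilon - s_{pk}
   && \forall \text{ proteins } p,\ \forall \text{ decoys } k, \\
& s_{pk} \ge 0 .
\end{aligned}
```

Because $E$ is linear in $\theta$, each decoy contributes one linear inequality, so a training set of more than 80,000 decoys yields a large but still linear constraint set.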
Affiliation(s)
- C Loose
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08540, USA
110
Comparative Protein Structure Modeling and its Applications to Drug Discovery. Annual Reports in Medicinal Chemistry 2004. [DOI: 10.1016/s0065-7743(04)39020-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/16/2023]
111
Moore GL, Maranas CD. Computational challenges in combinatorial library design for protein engineering. AIChE J 2004. [DOI: 10.1002/aic.10025] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.2] [Indexed: 11/06/2022]
112
Kaźmierkiewicz R, Liwo A, Scheraga HA. Addition of side chains to a known backbone with defined side-chain centroids. Biophys Chem 2003; 100:261-80. [PMID: 12646370 DOI: 10.1016/s0301-4622(02)00285-5] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.5] [Indexed: 11/22/2022]
Abstract
An automatic procedure is proposed for adding side chains to a protein backbone; it is based on optimization of a simplified energy function for peptide side chains, given its backbone and positions of side-chain centroids. The energy is expressed as a sum of the energies of interaction between side chains, and a harmonic penalty function accounting for the preservation of the positions of the C(alpha) atoms and the side-chain centroids. The energy of side-chain interactions is calculated with the soft-sphere ECEPP/3 potential. A Monte Carlo search is carried out to explore all possible side-chain orientations within a fixed backbone and side-chain centroid positions. The initial, usually extended, side-chain conformations are taken directly from the ECEPP/3 database. The procedure was tested on six experimental (X-ray or NMR) structures: immunoglobulin binding protein (PDB code 1IGD, an alpha+beta-protein); transcription factor PML (PDB code 1BOR, a 49-104 fragment of the ring finger domain, predominantly beta-protein); bovine pancreatic trypsin inhibitor (crystal form II) (PDB code 1BPI, an alpha+beta-protein); the monomer of human deoxyhemoglobin (PDB code 1BZ0, an alpha-helical structure); chain A of alcohol dehydrogenase from Drosophila lebanonensis (PDB code 1A4U); as well as on the 10-55 portion of the B domain of staphylococcal protein A (PDB code 1BDD). In all cases except 1BPI, the data for the algorithm (i.e. the backbone or C(alpha) coordinates and the positions of side-chain centroids) were taken from the experimental structures. For protein A, the C(alpha) coordinates and positions of side-chain centroids were also taken from the 1.9-Å-resolution model predicted by the UNRES force field. In all comparisons with experimental structures, complete side-chain geometry was reconstructed with a root-mean-square (RMS) deviation of approximately 0.6-0.9 Å from the heavy atoms when complete backbone and side-chain-centroid coordinates were used in reconstruction, or approximately 1.0 Å when the C(alpha) and centroid coordinates were used.
Affiliation(s)
- Rajmund Kaźmierkiewicz
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, NY 14853-1301, USA
113
Contreras-Moreira B, Jonsson PF, Bates PA. Structural context of exons in protein domains: implications for protein modelling and design. J Mol Biol 2003; 333:1045-59. [PMID: 14583198 DOI: 10.1016/j.jmb.2003.09.023] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.4] [Indexed: 10/26/2022]
Abstract
Intron boundaries were extracted from genomic data and mapped onto single-domain human and murine protein structures taken from the Protein Data Bank. A first analysis of this set of proteins shows that intron boundaries prefer to be in non-regular secondary structure elements, while avoiding alpha-helices and beta-strands. This fact alone suggests an evolutionary model in which introns are constrained by protein structure, particularly by tertiary structure contacts. In addition, in silico recombination experiments of a subset of these proteins together with their homologues, including those in different species, show that introns have a tendency to occur away from artificial crossover hot spots. Altogether, these findings support a model in which genes can preferentially harbour introns in less constrained regions of the protein fold they code for. In the light of these findings, we discuss some implications for protein modelling and design.
Affiliation(s)
- Bruno Contreras-Moreira
- Biomolecular Modelling Laboratory, Cancer Research UK, London Research Institute, Lincoln's Inn Fields Laboratories, 44 Lincoln's Inn Fields, London WC2A 3PX, UK
114
Dwyer MA, Looger LL, Hellinga HW. Computational design of a Zn2+ receptor that controls bacterial gene expression. Proc Natl Acad Sci U S A 2003; 100:11255-60. [PMID: 14500902 PMCID: PMC208744 DOI: 10.1073/pnas.2032284100] [Citation(s) in RCA: 56] [Impact Index Per Article: 2.7] [Received: 04/17/2003] [Indexed: 11/18/2022]
Abstract
The control of cellular physiology and gene expression in response to extracellular signals is a basic property of living systems. We have constructed a synthetic bacterial signal transduction pathway in which gene expression is controlled by extracellular Zn2+. In this system a computationally designed Zn2+-binding periplasmic receptor senses the extracellular solute and triggers a two-component signal transduction pathway via a chimeric transmembrane protein, resulting in transcriptional up-regulation of a beta-galactosidase reporter gene. The Zn2+-binding site in the designed receptor is based on a four-coordinate, tetrahedral primary coordination sphere consisting of histidines and glutamates. In addition, mutations were introduced in a secondary coordination sphere to satisfy the residual hydrogen-bonding potential of the histidines coordinated to the metal. The importance of the secondary shell interactions is demonstrated by their effect on metal affinity and selectivity, as well as protein stability. Three designed protein sequences, comprising two distinct metal-binding positions, were all shown to bind Zn2+ and to function in the cell-based assay, indicating the generality of the design methodology. These experiments demonstrate that biological systems can be manipulated with computationally designed proteins that have drastically altered ligand-binding specificities, thereby extending the repertoire of genetic control by extracellular signals.
Affiliation(s)
- M A Dwyer
- Department of Biochemistry, Box 3711, Duke University, Durham, NC 27710, USA
115
Dantas G, Kuhlman B, Callender D, Wong M, Baker D. A large scale test of computational protein design: folding and stability of nine completely redesigned globular proteins. J Mol Biol 2003; 332:449-60. [PMID: 12948494 DOI: 10.1016/s0022-2836(03)00888-x] [Citation(s) in RCA: 232] [Impact Index Per Article: 11.0] [Indexed: 01/03/2023]
Abstract
A previously developed computer program for protein design, RosettaDesign, was used to predict low free energy sequences for nine naturally occurring protein backbones. RosettaDesign had no knowledge of the naturally occurring sequences and on average 65% of the residues in the designed sequences differ from wild-type. Synthetic genes for ten completely redesigned proteins were generated, and the proteins were expressed, purified, and then characterized using circular dichroism, chemical and temperature denaturation and NMR experiments. Although high-resolution structures have not yet been determined, eight of these proteins appear to be folded and their circular dichroism spectra are similar to those of their wild-type counterparts. Six of the proteins have stabilities equal to or up to 7 kcal/mol greater than their wild-type counterparts, and four of the proteins have NMR spectra consistent with a well-packed, rigid structure. These encouraging results indicate that the computational protein design methods can, with significant reliability, identify amino acid sequences compatible with a target protein backbone.
Affiliation(s)
- Gautam Dantas
- Department of Biochemistry, University of Washington, Seattle, WA 98195, USA
116
Abstract
The success of structural genomics initiatives requires the development and application of tools for structure analysis, prediction, and annotation. In this paper we review recent developments in these areas; specifically structure alignment, the detection of remote homologs and analogs, homology modeling and the use of structures to predict function. We also discuss various rationales for structural genomics initiatives. These include the structure-based clustering of sequence space and genome-wide function assignment. It is also argued that structural genomics can be integrated into more traditional biological research if specific biological questions are included in target selection strategies.
Affiliation(s)
- Sharon Goldsmith-Fischman
- Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10032, USA
117
Canutescu AA, Shelenkov AA, Dunbrack RL. A graph-theory algorithm for rapid protein side-chain prediction. Protein Sci 2003; 12:2001-14. [PMID: 12930999 PMCID: PMC2323997 DOI: 10.1110/ps.03154503] [Citation(s) in RCA: 743] [Impact Index Per Article: 35.4] [Indexed: 10/27/2022]
Abstract
Fast and accurate side-chain conformation prediction is important for homology modeling, ab initio protein structure prediction, and protein design applications. Many methods have been presented, although only a few computer programs are publicly available. The SCWRL program is one such method and is widely used because of its speed, accuracy, and ease of use. A new algorithm for SCWRL is presented that uses results from graph theory to solve the combinatorial problem encountered in the side-chain prediction problem. In this method, side chains are represented as vertices in an undirected graph. Any two residues that have rotamers with nonzero interaction energies are considered to have an edge in the graph. The resulting graph can be partitioned into connected subgraphs with no edges between them. These subgraphs can in turn be broken into biconnected components, which are graphs that cannot be disconnected by removal of a single vertex. The combinatorial problem is reduced to finding the minimum energy of these small biconnected components and combining the results to identify the global minimum energy conformation. This algorithm is able to complete predictions on a set of 180 proteins with 34342 side chains in <7 min of computer time. The total chi(1) and chi(1 + 2) dihedral angle accuracies are 82.6% and 73.7% using a simple energy function based on the backbone-dependent rotamer library and a linear repulsive steric energy. The new algorithm will allow for use of SCWRL in more demanding applications such as sequence design and ab initio structure prediction, as well as addition of a more complex energy function and conformational flexibility, leading to increased accuracy.
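The first decomposition step described above, splitting the residue interaction graph into connected subgraphs and solving each independently, can be sketched as follows. This is not SCWRL code: the function names and toy energies are hypothetical, each component is solved by brute force, and the further split into biconnected components is deliberately omitted.

```python
from itertools import product

def connected_components(n, edges):
    """Connected components of an undirected graph on vertices 0..n-1."""
    adj = {i: set() for i in range(n)}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    seen, comps = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(sorted(comp))
    return comps

def solve_by_components(self_e, pair_e):
    """Minimize total energy one connected component at a time."""
    n = len(self_e)
    # Two residues share an edge if any rotamer pair interacts.
    edges = [ij for ij, m in pair_e.items()
             if any(x != 0 for row in m for x in row)]
    best, total = [None] * n, 0.0
    for comp in connected_components(n, edges):
        def energy(assign):
            rot = dict(zip(comp, assign))
            e = sum(self_e[i][rot[i]] for i in comp)
            for (i, j), m in pair_e.items():
                if i in rot and j in rot:
                    e += m[rot[i]][rot[j]]
            return e
        options = product(*(range(len(self_e[i])) for i in comp))
        choice = min(options, key=energy)
        total += energy(choice)
        for i, r in zip(comp, choice):
            best[i] = r
    return best, total

# Hypothetical toy problem: residues 0-1 interact, residues 2-3 interact.
self_e = [[0, 1], [2, 0], [0, 3], [1, 0]]
pair_e = {(0, 1): [[0, 4], [0, 0]],
          (2, 3): [[0, 5], [0, 0]],
          (0, 2): [[0, 0], [0, 0]]}
rotamers, energy_min = solve_by_components(self_e, pair_e)
```

Because residues in different components have zero interaction energy by construction, the sum of the per-component minima equals the global minimum, while the search cost drops from the product of all rotamer counts to a sum over components.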
Affiliation(s)
- Adrian A Canutescu
- Institute for Cancer Research, Fox Chase Cancer Center, Philadelphia, Pennsylvania 19111, USA
118
Looger LL, Dwyer MA, Smith JJ, Hellinga HW. Computational design of receptor and sensor proteins with novel functions. Nature 2003; 423:185-90. [PMID: 12736688 DOI: 10.1038/nature01556] [Citation(s) in RCA: 461] [Impact Index Per Article: 22.0] [Received: 09/04/2002] [Accepted: 03/10/2003] [Indexed: 11/08/2022]
Abstract
The formation of complexes between proteins and ligands is fundamental to biological processes at the molecular level. Manipulation of molecular recognition between ligands and proteins is therefore important for basic biological studies and has many biotechnological applications, including the construction of enzymes, biosensors, genetic circuits, signal transduction pathways and chiral separations. The systematic manipulation of binding sites remains a major challenge. Computational design offers enormous generality for engineering protein structure and function. Here we present a structure-based computational method that can drastically redesign protein ligand-binding specificities. This method was used to construct soluble receptors that bind trinitrotoluene, l-lactate or serotonin with high selectivity and affinity. These engineered receptors can function as biosensors for their new ligands; we also incorporated them into synthetic bacterial signal transduction pathways, regulating gene expression in response to extracellular trinitrotoluene or l-lactate. The use of various ligands and proteins shows that a high degree of control over biomolecular recognition has been established computationally. The biological and biosensing activities of the designed receptors illustrate potential applications of computational design.
Affiliation(s)
- Loren L Looger
- Department of Biochemistry, Duke University Medical Center, Durham, North Carolina 27710, USA
119
Abstract
Computational methods play a central role in the rational design of novel proteins. The present work describes a new hybrid exact rotamer optimization (HERO) method that builds on previous dead-end elimination algorithms to yield dramatic performance enhancements. Measured on experimentally validated physical models, these improvements make it possible to perform previously intractable designs of entire protein core, surface, or boundary regions. Computational demonstrations include a full core design of the variable domains of the light and heavy chains of catalytic antibody 48G7 FAB with 74 residues and 10^128 conformations, a full core/boundary design of the beta1 domain of protein G with 25 residues and 10^53 conformations, and a full surface design of the beta1 domain of protein G with 27 residues and 10^60 conformations. In addition, a full sequence design of the beta1 domain of protein G is used to demonstrate the strong dependence of algorithm performance on the exact form of the potential function and the fidelity of the rotamer library. These results emphasize that search algorithm performance for protein design can only be meaningfully evaluated on physical models that have been subjected to experimental scrutiny. The new algorithm greatly facilitates ongoing efforts to engineer increasingly complex protein features.
Affiliation(s)
- D Benjamin Gordon
- Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142, USA
120
Adcock SA. Peptide backbone reconstruction using dead-end elimination and a knowledge-based forcefield. J Comput Chem 2003; 25:16-27. [PMID: 14634990 DOI: 10.1002/jcc.10314] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.7] [Indexed: 11/10/2022]
Abstract
A novel, yet simple and automated, protocol for reconstruction of complete peptide backbones from C(alpha) coordinates only is described, validated, and benchmarked. The described method collates a set of possible backbone conformations for each set of residue triads from a structural library derived from the PDB. The optimal permutation of these three residue segments of backbone conformations is determined using the dead-end elimination (DEE) algorithm. Putative conformations are evaluated using a pairwise-additive knowledge-based forcefield term and a fragment overlap term. The protocol described in this report is able to restore the full backbone coordinates to within 0.2-0.6 Å of the actual crystal structure from C(alpha) coordinates only. In addition, it is insensitive to errors in the input C(alpha) coordinates with RMSDs of 3.0 Å, and this is illustrated through application to deliberately distorted C(alpha) traces. The entire process, as described, is rapid, requiring of the order of a few minutes for a typical protein on a typical desktop PC. Approximations enable this to be reduced to a few seconds, although this is at the expense of prediction accuracy. This compares very favorably to previously published methods, being sufficiently fast for general use and being one of the most accurate methods. Because the method is not restricted to the reconstruction from only C(alpha) coordinates, reconstruction based on C(beta) coordinates is also demonstrated.
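As background on the DEE step, the sketch below applies the classic single-candidate elimination criterion to a generic pairwise-additive problem: a candidate r at position i is discarded if some alternative t at the same position has a worst-case energy below r's best-case energy. The code is illustrative only and is not the paper's protocol; the function names and toy energies are hypothetical, and "candidate" may stand for a rotamer or, as in this abstract, a backbone fragment conformation.

```python
def pair(pair_e, i, r, j, s):
    """Pairwise energy lookup; each unordered position pair is stored once."""
    return pair_e[(i, j)][r][s] if i < j else pair_e[(j, i)][s][r]

def dee_prune(self_e, pair_e):
    """Iterate the basic DEE criterion until no candidate can be removed."""
    n = len(self_e)
    alive = [set(range(len(s))) for s in self_e]
    changed = True
    while changed:
        changed = False
        for i in range(n):
            others = [j for j in range(n) if j != i]

            def low(r):
                # Best case for r: most favorable partner at every position.
                return self_e[i][r] + sum(
                    min(pair(pair_e, i, r, j, s) for s in alive[j])
                    for j in others)

            def high(t):
                # Worst case for t: least favorable partner at every position.
                return self_e[i][t] + sum(
                    max(pair(pair_e, i, t, j, s) for s in alive[j])
                    for j in others)

            dead = {r for r in alive[i]
                    if any(low(r) > high(t) for t in alive[i] if t != r)}
            if dead:
                alive[i] -= dead
                changed = True
    return alive

# Hypothetical toy problem: 3 positions, 2 candidates each.
self_e = [[0.0, 10.0], [0.0, 1.0], [0.0, 1.0]]
pair_e = {(0, 1): [[0, 1], [1, 0]],
          (0, 2): [[0, 1], [1, 0]],
          (1, 2): [[0, 2], [2, 0]]}
survivors = dee_prune(self_e, pair_e)
```

In this toy case only candidate 1 at position 0 is provably dead. The criterion never removes a member of the global minimum, but it may leave several candidates per position, in which case the survivors still have to be searched by other means.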
Affiliation(s)
- Stewart A Adcock
- Department of Chemistry and Biochemistry, University of California-San Diego, 4234 Urey Hall, 9500 Gilman Drive, La Jolla, California 92093-0365, USA.
121
Abstract
Biologists working in the area of computational protein design have never doubted the seriousness of the algorithmic challenges that face them in attempting in silico sequence selection. It turns out that in the language of the computer science community, this discrete optimization problem is NP-hard. The purpose of this paper is to explain the context of this observation, to provide a simple illustrative proof and to discuss the implications for future progress on algorithms for computational protein design.
Affiliation(s)
- Niles A Pierce
- Applied and Computational Mathematics, California Institute of Technology, Pasadena, CA 91125, USA.
122
Pettersson PL, Johansson AS, Mannervik B. Transmutation of human glutathione transferase A2-2 with peroxidase activity into an efficient steroid isomerase. J Biol Chem 2002; 277:30019-22. [PMID: 12023294 DOI: 10.1074/jbc.m204485200] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.2] [Indexed: 11/06/2022]
Abstract
A major goal in protein engineering is the tailor-making of enzymes for specified chemical reactions. Successful attempts have frequently been based on directed molecular evolution involving libraries of random mutants in which variants with desired properties were identified. For the engineering of enzymes with novel functions, it would be of great value if the necessary changes of the active site could be predicted and implemented. Such attempts based on the comparison of similar structures with different substrate selectivities have previously met with limited success. However, the present work shows that the knowledge-based redesign restricted to substrate-binding residues in human glutathione transferase A2-2 can introduce high steroid double-bond isomerase activity into the enzyme originally characterized by glutathione peroxidase activity. Both the catalytic center activity (k(cat)) and catalytic efficiency (k(cat)/K(m)) match the values of the naturally evolved glutathione transferase A3-3, the most active steroid isomerase known in human tissues. The substrate selectivity of the mutated glutathione transferase was changed 7000-fold by five point mutations. This example demonstrates the functional plasticity of the glutathione transferase scaffold as well as the potential of rational active-site directed mutagenesis as a complement to DNA shuffling and other stochastic methods for the redesign of proteins with novel functions.
Affiliation(s)
- Par L Pettersson
- Department of Biochemistry, Uppsala University, Biomedical Center, Box 576, SE-751 23 Uppsala, Sweden
|
123
|
Abstract
Rotamer libraries are widely used in protein structure prediction, protein design, and structure refinement. As the size of the structure data base has increased rapidly in recent years, it has become possible to derive well-refined rotamer libraries using strict criteria for data inclusion and for studying dependence of rotamer populations and dihedral angles on local structural features.
Affiliation(s)
- Roland L Dunbrack
- Institute for Cancer Research, Fox Chase Cancer Center, 7701 Burholme Avenue, Philadelphia PA 19111, USA.
|
124
|
Abstract
The progress achieved by several groups in the field of computational protein design shows that successful design methods include two major features: efficient algorithms to deal with the combinatorial exploration of sequence space and optimal energy functions to rank sequences according to their fitness for the given fold.
Affiliation(s)
- Joaquim Mendes
- European Molecular Biology Laboratory, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
|
125
|
Yang JM, Tsai CH, Hwang MJ, Tsai HK, Hwang JK, Kao CY. GEM: a Gaussian Evolutionary Method for predicting protein side-chain conformations. Protein Sci 2002; 11:1897-907. [PMID: 12142444 PMCID: PMC2373689 DOI: 10.1110/ps.4940102] [Citation(s) in RCA: 19] [Impact Index Per Article: 0.9] [Indexed: 10/27/2022]
Abstract
We have developed an evolutionary approach to predicting protein side-chain conformations. This approach, referred to as the Gaussian Evolutionary Method (GEM), combines both discrete and continuous global search mechanisms. The former helps speed up convergence by reducing the size of rotamer space, whereas the latter, integrating decreasing-based Gaussian mutations and self-adaptive Gaussian mutations, continuously adapts dihedrals to optimal conformations. We tested our approach on 38 proteins ranging in size from 46 to 325 residues and showed that the results were comparable to those using other methods. The average accuracies of our predictions were 80% for chi(1), 66% for chi(1 + 2), and 1.36 Å for the root mean square deviation of side-chain positions. We found that if our scoring function was perfect, the prediction accuracy was also essentially perfect. However, perfect prediction could not be achieved if only a discrete search mechanism was applied. These results suggest that GEM is robust and can be used to examine the factors limiting the accuracy of protein side-chain prediction methods. Furthermore, it can be used to systematically evaluate and thus improve scoring functions.
Affiliation(s)
- Jinn-Moon Yang
- Department of Biological Science and Technology and Institute of Bioinformatics, National Chiao Tung University, Hsinchu, 30050, Taiwan.
|
126
|
Desmet J, Spriet J, Lasters I. Fast and accurate side-chain topology and energy refinement (FASTER) as a new method for protein structure optimization. Proteins 2002; 48:31-43. [PMID: 12012335 DOI: 10.1002/prot.10131] [Citation(s) in RCA: 90] [Impact Index Per Article: 4.1] [Indexed: 11/06/2022]
Abstract
We have developed an original method for global optimization of protein side-chain conformations, called the Fast and Accurate Side-Chain Topology and Energy Refinement (FASTER) method. The method operates by systematically overcoming local minima of increasing order. Comparison of the FASTER results with those of the dead-end elimination (DEE) algorithm showed that both methods produce nearly identical results, but the FASTER algorithm is 100-1000 times faster than the DEE method and scales in a stable and favorable way as a function of protein size. We also show that low-order local minima may be almost as accurate as the global minimum when evaluated against experimentally determined structures. In addition, the new algorithm provides significant information about the conformational flexibility of individual side-chains. We observed that strictly rigid side-chains are concentrated mainly in the core of the protein, whereas highly flexible side-chains are found almost exclusively among solvent-oriented residues.
|
127
|
Abstract
Modeling side-chain conformations on a fixed protein backbone has a wide application in structure prediction and molecular design. Each effort in this field requires decisions about a rotamer set, scoring function, and search strategy. We have developed a new and simple scoring function, which operates on side-chain rotamers and consists of the following energy terms: contact surface, volume overlap, backbone dependency, electrostatic interactions, and desolvation energy. The weights of these energy terms were optimized to achieve the minimal average root mean square (rms) deviation between the lowest energy rotamer and real side-chain conformation on a training set of high-resolution protein structures. In the course of optimization, for every residue, its side chain was replaced by varying rotamers, whereas conformations for all other residues were kept as they appeared in the crystal structure. We obtained prediction accuracy of 90.4% for chi(1), 78.3% for chi(1 + 2), and 1.18 Å overall rms deviation. Furthermore, the derived scoring function combined with a Monte Carlo search algorithm was used to place all side chains onto a protein backbone simultaneously. The average prediction accuracy was 87.9% for chi(1), 73.2% for chi(1 + 2), and 1.34 Å rms deviation for 30 protein structures. Our approach was compared with available side-chain construction methods and showed improvement over the best among them: 4.4% for chi(1), 4.7% for chi(1 + 2), and 0.21 Å for rms deviation. We hypothesize that the scoring function instead of the search strategy is the main obstacle in side-chain modeling. Additionally, we show that a more detailed rotamer library is expected to increase chi(1 + 2) prediction accuracy but may have little effect on chi(1) prediction accuracy.
Affiliation(s)
- Shide Liang
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, Texas 75390, USA
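The abstract for this entry says a Monte Carlo search was combined with the scoring function to place all side chains simultaneously, but gives no details of the move set or acceptance rule. The sketch below is therefore only a generic Metropolis search over rotamer indices with a pairwise-additive toy energy. The names (`monte_carlo_rotamers`, `total_energy`), the fixed temperature, the step count, and the energy tables are assumptions for illustration, not the published protocol.

```python
import math
import random

# Toy energies, invented for illustration only.
# SELF_E[i][r]: self energy of rotamer r at residue i.
# PAIR_E[(i, j)][r][s]: pair energy between rotamer r at i and s at j (i < j).
SELF_E = [[0.4, 0.0, 1.2], [0.0, 0.7, 0.3], [0.9, 0.1, 0.0]]
PAIR_E = {
    (0, 1): [[0.0, 0.5, 1.0], [2.0, 0.0, 0.2], [0.3, 0.3, 0.0]],
    (1, 2): [[1.5, 0.0, 0.4], [0.0, 0.6, 0.6], [0.2, 0.0, 1.1]],
}


def total_energy(assign, self_e, pair_e):
    """Pairwise-additive energy of one full rotamer assignment."""
    energy = sum(self_e[i][r] for i, r in enumerate(assign))
    for (i, j), table in pair_e.items():
        energy += table[assign[i]][assign[j]]
    return energy


def monte_carlo_rotamers(self_e, pair_e, steps=2000, temperature=1.0, seed=0):
    """Metropolis search; returns the best assignment seen and its energy."""
    rng = random.Random(seed)          # seeded for reproducible runs
    current = [0] * len(self_e)        # start from rotamer 0 everywhere
    e_cur = total_energy(current, self_e, pair_e)
    best, e_best = list(current), e_cur
    for _ in range(steps):
        # Propose a new rotamer at one randomly chosen residue.
        i = rng.randrange(len(self_e))
        trial = list(current)
        trial[i] = rng.randrange(len(self_e[i]))
        e_trial = total_energy(trial, self_e, pair_e)
        delta = e_trial - e_cur
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_cur = trial, e_trial
            if e_cur < e_best:
                best, e_best = list(current), e_cur
    return best, e_best
```

A call such as `monte_carlo_rotamers(SELF_E, PAIR_E, steps=500, seed=1)` returns the lowest-energy assignment visited. Being stochastic, the search carries no guarantee of reaching the global minimum.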
|
128
|
Glick M, Rayan A, Goldblum A. A stochastic algorithm for global optimization and for best populations: a test case of side chains in proteins. Proc Natl Acad Sci U S A 2002; 99:703-8. [PMID: 11792838 PMCID: PMC117369 DOI: 10.1073/pnas.022418199] [Citation(s) in RCA: 42] [Impact Index Per Article: 1.9] [Indexed: 11/18/2022] Open
Abstract
The problem of global optimization is pivotal in a variety of scientific fields. Here, we present a robust stochastic search method that is able to find the global minimum for a given cost function, as well as, in most cases, any number of best solutions for very large combinatorial "explosive" systems. The algorithm iteratively eliminates variable values that contribute consistently to the highest end of a cost function's spectrum of values for the full system. Values that have not been eliminated are retained for a full, exhaustive search, allowing the creation of an ordered population of best solutions, which includes the global minimum. We demonstrate the ability of the algorithm to explore the conformational space of side chains in eight proteins, with 54 to 263 residues, to reproduce a population of their low energy conformations. The 1,000 lowest energy solutions are identical in the stochastic (with two different seed numbers) and full, exhaustive searches for six of eight proteins. The others retain the lowest 141 and 213 (of 1,000) conformations, depending on the seed number, and the maximal difference between stochastic and exhaustive is only about 0.15 kcal/mol. The energy gap between the lowest and highest of the 1,000 low-energy conformers in eight proteins is between 0.55 and 3.64 kcal/mol. This algorithm offers real opportunities for solving problems of high complexity in structural biology and in other fields of science and technology.
Affiliation(s)
- Meir Glick
- Department of Medicinal Chemistry and the David R. Bloom Center for Pharmacy, School of Pharmacy, Hebrew University of Jerusalem, Jerusalem 91120, Israel
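The two-stage idea in the abstract for this entry (stochastically eliminate variable values tied to the high-cost end of the spectrum, then search the remainder exhaustively to get an ordered population) can be sketched loosely as follows. The elimination rule here is an assumption chosen for simplicity: a value is dropped when it occurs in the worst-scoring sampled assignments but never in the best-scoring ones. The published criterion may differ, and every name, parameter, and energy value below is hypothetical.

```python
import random
from itertools import product

# Toy energies generated by an arbitrary formula, for illustration only.
N_POS, N_VAL = 4, 4
SELF_E = [[((3 * i + 7 * r) % 5) * 0.5 for r in range(N_VAL)] for i in range(N_POS)]
PAIR_E = {
    (i, j): [[((i + 2 * j + r * s) % 3) * 0.3 for s in range(N_VAL)]
             for r in range(N_VAL)]
    for i in range(N_POS) for j in range(i + 1, N_POS)
}


def cost(assign, self_e, pair_e):
    """Pairwise-additive cost of one full assignment."""
    total = sum(self_e[i][v] for i, v in enumerate(assign))
    for (i, j), table in pair_e.items():
        total += table[assign[i]][assign[j]]
    return total


def eliminate_then_enumerate(self_e, pair_e, n_samples=200, tail=0.2,
                             max_combos=64, k_best=5, max_rounds=20, seed=0):
    """Prune values by sampling, then rank the survivors exhaustively."""
    rng = random.Random(seed)
    alive = [list(range(len(e))) for e in self_e]

    def n_combos():
        total = 1
        for vals in alive:
            total *= len(vals)
        return total

    for _ in range(max_rounds):
        if n_combos() <= max_combos:
            break  # small enough for the exhaustive stage
        samples = [[rng.choice(vals) for vals in alive] for _ in range(n_samples)]
        samples.sort(key=lambda a: cost(a, self_e, pair_e))
        cut = max(1, int(tail * n_samples))
        low, high = samples[:cut], samples[-cut:]
        removed = False
        for i in range(len(alive)):
            in_low = {a[i] for a in low}
            in_high = {a[i] for a in high}
            doomed = in_high - in_low  # seen only among the worst samples
            if doomed:
                alive[i] = [v for v in alive[i] if v not in doomed]
                removed = True
        if not removed:
            break  # nothing left to prune with this rule

    # Exhaustive stage: ordered population of the k lowest-cost assignments.
    ranked = sorted(product(*alive), key=lambda a: cost(a, self_e, pair_e))
    best = [(list(a), cost(a, self_e, pair_e)) for a in ranked[:k_best]]
    return alive, best
```

Unlike dead-end elimination, this pruning rule is heuristic, so the sketch gives no guarantee that the global minimum survives. Every position does keep at least one value, because any value present in the low-cost tail is never removed.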
|
129
|
Meguro T, Yamato I. New Structure Deformation Algorithm for Monte Carlo Simulation of Protein Folding. Journal of Computer Chemistry, Japan 2002. [DOI: 10.2477/jccj.1.9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.0] [Indexed: 11/05/2022]
|
130
|
Abstract
Recent advances in massively parallel experimental and computational technologies are leading to radically new approaches to the early phases of the drug production pipeline. The revolution in DNA microarray technologies and the imminent emergence of its analogue for proteins, along with machine learning algorithms, promise rapid acceleration in the identification of potential drug targets, and in high-throughput screens for subpopulation-specific toxicity. Similarly, advances in structural genomics in conjunction with in vitro and in silico evolutionary methods will rapidly accelerate the number of lead drug candidates and substantially augment their target specificity. Taken collectively, these advances will usher in an era of predictive medicine, which will move medical practice from reactive therapy after disease onset, to proactive prevention.
Affiliation(s)
- Zhiping Weng
- Biomedical Engineering Department and Bioinformatics Program, Boston University, Boston MA 02215, USA.
|
131
|
Abstract
The field of computational protein design is reaching its adolescence. Protein design algorithms have been applied to design or engineer proteins that fold, fold faster, catalyze, catalyze faster, signal, and adopt preferred conformational states. Further developments of scoring functions, sampling strategies, and optimization methods will expand the range of applicability of computational protein design to larger and more varied systems, with greater incidence of success. Developments in this field are beginning to have significant impact on biotechnology and chemical biology.
Affiliation(s)
- C M Kraemer-Pecore
- The Pennsylvania State University, Department of Chemistry, Chandlee Laboratory, University Park, PA 16802, USA
|
132
|
Tian F, Valafar H, Prestegard JH. A dipolar coupling based strategy for simultaneous resonance assignment and structure determination of protein backbones. J Am Chem Soc 2001; 123:11791-6. [PMID: 11716736 DOI: 10.1021/ja011806h] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.1] [Indexed: 11/28/2022]
Abstract
A new approach for simultaneous protein backbone resonance assignment and structure determination by NMR is introduced. This approach relies on recent advances in high-resolution NMR spectroscopy that allow observation of anisotropic interactions, such as dipolar couplings, from proteins partially aligned in field ordered media. Residual dipolar couplings are used for both geometric information and a filter in the assembly of residues in a sequential manner. Experimental data were collected in less than one week on a small redox protein, rubredoxin, that was 15N enriched but not enriched above 1% natural abundance in 13C. Given the acceleration possible with partial 13C enrichment, the protocol described should provide a very rapid route to protein structure determination. This is critical for the structural genomics initiative where protein expression and structural determination in a high-throughput manner will be needed.
Affiliation(s)
- F Tian
- Southeast Collaboratory for Structural Genomics, University of Georgia, Athens, Georgia 30602-4712, USA
|
133
|
Long SB, Hancock PJ, Kral AM, Hellinga HW, Beese LS. The crystal structure of human protein farnesyltransferase reveals the basis for inhibition by CaaX tetrapeptides and their mimetics. Proc Natl Acad Sci U S A 2001; 98:12948-53. [PMID: 11687658 PMCID: PMC60805 DOI: 10.1073/pnas.241407898] [Citation(s) in RCA: 88] [Impact Index Per Article: 3.8] [Indexed: 11/18/2022] Open
Abstract
Protein farnesyltransferase (FTase) catalyzes the attachment of a farnesyl lipid group to the cysteine residue located in the C-terminal tetrapeptide of many essential signal transduction proteins, including members of the Ras superfamily. Farnesylation is essential both for normal functioning of these proteins, and for the transforming activity of oncogenic mutants. Consequently FTase is an important target for anti-cancer therapeutics. Several FTase inhibitors are currently undergoing clinical trials for cancer treatment. Here, we present the crystal structure of human FTase, as well as ternary complexes with the TKCVFM hexapeptide substrate, CVFM non-substrate tetrapeptide, and L-739,750 peptidomimetic with either farnesyl diphosphate (FPP), or a nonreactive analogue. These structures reveal the structural mechanism of FTase inhibition. Some CaaX tetrapeptide inhibitors are not farnesylated, and are more effective inhibitors than farnesylated CaaX tetrapeptides. CVFM and L-739,750 are not farnesylated, because these inhibitors bind in a conformation that is distinct from the TKCVFM hexapeptide substrate. This non-substrate binding mode is stabilized by an ion pair between the peptide N terminus and the alpha-phosphate of the FPP substrate. Conformational mapping calculations reveal the basis for the sequence specificity in the third position of the CaaX motif that determines whether a tetrapeptide is a substrate or non-substrate. The presence of beta-branched amino acids in this position prevents formation of the non-substrate conformation; all other aliphatic amino acids in this position are predicted to form the non-substrate conformation, provided their N terminus is available to bind to the FPP alpha-phosphate. These results may facilitate further development of FTase inhibitors.
Affiliation(s)
- S B Long
- Department of Biochemistry, Duke University Medical Center, Durham, NC 27710, USA
|
134
|
Abstract
Current techniques for the prediction of side-chain conformations on a fixed backbone have an accuracy limit of about 1.0-1.5 Å rmsd for core residues. We have carried out a detailed and systematic analysis of the factors that influence the prediction of side-chain conformation and, on this basis, have succeeded in extending the limits of side-chain prediction for core residues to about 0.7 Å rmsd from native, and 94% and 89% of chi(1) and chi(1+2) dihedral angles correctly predicted to within 20 degrees of native, respectively. These results are obtained using a force-field that accounts for only van der Waals interactions and torsional potentials. Prediction accuracy is strongly dependent on the rotamer library used. That is, a complete and detailed rotamer library is essential. The greatest accuracy was obtained with an extensive rotamer library, containing over 7560 members, in which bond lengths and bond angles were taken from the database rather than simply assuming idealized values. Perhaps the most surprising finding is that the combinatorial problem normally associated with the prediction of the side-chain conformation does not appear to be important. This conclusion is based on the fact that the prediction of the conformation of a single side-chain with all others fixed in their native conformations is only slightly more accurate than the simultaneous prediction of all side-chain dihedral angles.
Affiliation(s)
- Z Xiang
- Department of Biochemistry and Molecular Biophysics BB221, Columbia University, New York, NY 10032, USA
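The chi(1) and chi(1+2) accuracies quoted in the abstract for this entry count a dihedral as correct when it lies within 20 degrees of the native value. A minimal sketch of that bookkeeping is given below. The function names, the input layout, and the inclusive tolerance are assumptions. The handling of symmetric side chains (for example, equivalent chi(2) values in aromatic rings) is deliberately left out.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = (a - b) % 360.0
    return min(d, 360.0 - d)


def chi_accuracy(predicted, native, tol=20.0):
    """Fraction of residues with chi1 correct, and with chi1 and chi2 correct.

    predicted, native: lists of (chi1, chi2) tuples in degrees, one per
    residue; chi2 is None for residues that have no second dihedral.
    """
    n1 = n12 = ok1_count = ok12_count = 0
    for (p1, p2), (q1, q2) in zip(predicted, native):
        n1 += 1
        ok1 = angle_diff(p1, q1) <= tol
        ok1_count += ok1
        if p2 is not None and q2 is not None:
            n12 += 1
            # chi(1+2) is correct only if both dihedrals are within tolerance.
            ok12_count += ok1 and angle_diff(p2, q2) <= tol
    chi1_acc = ok1_count / n1
    chi12_acc = ok12_count / n12 if n12 else float("nan")
    return chi1_acc, chi12_acc


# Made-up example values, for illustration only.
PREDICTED = [(60.0, 180.0), (-170.0, None), (65.0, 90.0)]
NATIVE = [(55.0, 175.0), (175.0, None), (-60.0, 85.0)]
```

The modular difference matters near the ±180 degree wrap: -170 and 175 degrees are 15 degrees apart, not 345, so the second residue in the example counts as correct for chi(1).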
|
135
|
Current Awareness on Comparative and Functional Genomics. Comp Funct Genomics 2001. [PMCID: PMC2447222 DOI: 10.1002/cfg.60] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/02/2022] Open
|