101
Saven JG. Computational protein design: Advances in the design and redesign of biomolecular nanostructures. Curr Opin Colloid Interface Sci 2010; 15:13-17. [PMID: 21544231 DOI: 10.1016/j.cocis.2009.06.002] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]
Abstract
Computational protein design facilitates the continued development of methods for the design of biomolecular structure, sequence and function. Recent applications include the design of novel protein sequences and structures, proteins incorporating nonbiological components, protein assemblies, soluble variants of membrane proteins, and proteins that modulate membrane function.
Affiliation(s)
- Jeffery G Saven
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, PA 19104, United States
102
McAllister SR, Floudas CA. An improved hybrid global optimization method for protein tertiary structure prediction. Computational Optimization and Applications 2010; 45:377-413. [PMID: 20357906 PMCID: PMC2847311 DOI: 10.1007/s10589-009-9277-y] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Indexed: 05/29/2023]
Abstract
First principles approaches to the protein structure prediction problem must search through an enormous conformational space to identify low-energy, near-native structures. In this paper, we describe the formulation of the tertiary structure prediction problem as a nonlinear constrained minimization problem, where the goal is to minimize the energy of a protein conformation subject to constraints on torsion angles and interatomic distances. The core of the proposed algorithm is a hybrid global optimization method that combines the benefits of the αBB deterministic global optimization approach with conformational space annealing. These global optimization techniques employ a local minimization strategy that combines torsion angle dynamics and rotamer optimization to identify and improve the selection of initial conformations and then applies a sequential quadratic programming approach to further minimize the energy of the protein conformations subject to constraints. The proposed algorithm demonstrates the ability to identify both lower energy protein structures, as well as larger ensembles of low-energy conformations.
103
Abstract
As the field of protein structure prediction continues to expand at an exponential rate, the bench-biologist might feel overwhelmed by the sheer range of available applications. This review presents the three main approaches in computational structure prediction from a non-bioinformatician's point of view and makes a selection of tools and servers freely available. These tools are evaluated from several aspects, such as number of citations, ease of usage and quality of the results. Finally, the applications of models generated by computational structure prediction are discussed.
104
Subramani A, DiMaggio PA, Floudas CA. Selecting high quality protein structures from diverse conformational ensembles. Biophys J 2009; 97:1728-36. [PMID: 19751678 DOI: 10.1016/j.bpj.2009.06.046] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Received: 04/13/2009] [Revised: 06/15/2009] [Accepted: 06/30/2009] [Indexed: 01/01/2023] Open
Abstract
Protein structure prediction encompasses two major challenges: 1), the generation of a large ensemble of high resolution structures for a given amino-acid sequence; and 2), the identification of the structure closest to the native structure for a blind prediction. In this article, we address the second challenge, by proposing what is, to our knowledge, a novel iterative traveling-salesman problem-based clustering method to identify the structures of a protein, in a given ensemble, which are closest to the native structure. The method consists of an iterative procedure, which aims at eliminating clusters of structures at each iteration, which are unlikely to be of similar fold to the native, based on a statistical analysis of cluster density and average spherical radius. The method, denoted as ICON, has been tested on four data sets: 1), 1400 proteins with high resolution decoys; 2), medium-to-low resolution decoys from Decoys 'R' Us; 3), medium-to-low resolution decoys from the first-principles approach, ASTRO-FOLD; and 4), selected targets from CASP8. The extensive tests demonstrate that ICON can identify high-quality structures in each ensemble, regardless of the resolution of conformers. In a total of 1454 proteins, with an average of 1051 conformers per protein, the conformers selected by ICON are, on an average, in the top 3.5% of the conformers in the ensemble.
Affiliation(s)
- Ashwin Subramani
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey, USA
105
Jha AN, Ananthasuresh GK, Vishveshwara S. A search for energy minimized sequences of proteins. PLoS One 2009; 4:e6684. [PMID: 19690619 PMCID: PMC2724685 DOI: 10.1371/journal.pone.0006684] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 02/03/2009] [Accepted: 07/23/2009] [Indexed: 11/21/2022] Open
Abstract
In this paper, we present numerical evidence that supports the notion of minimization in the sequence space of proteins for a target conformation. We use the conformations of the real proteins in the Protein Data Bank (PDB) and present computationally efficient methods to identify the sequences with minimum energy. We use edge-weighted connectivity graph for ranking the residue sites with reduced amino acid alphabet and then use continuous optimization to obtain the energy-minimizing sequences. Our methods enable the computation of a lower bound as well as a tight upper bound for the energy of a given conformation. We validate our results by using three different inter-residue energy matrices for five proteins from protein data bank (PDB), and by comparing our energy-minimizing sequences with 80 million diverse sequences that are generated based on different considerations in each case. When we submitted some of our chosen energy-minimizing sequences to Basic Local Alignment Search Tool (BLAST), we obtained some sequences from non-redundant protein sequence database that are similar to ours with an E-value of the order of 10⁻⁷. In summary, we conclude that proteins show a trend towards minimizing energy in the sequence space but do not seem to adopt the global energy-minimizing sequence. The reason for this could be either that the existing energy matrices are not able to accurately represent the inter-residue interactions in the context of the protein environment or that Nature does not push the optimization in the sequence space, once it is able to perform the function.
Affiliation(s)
- Anupam Nath Jha
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- G. K. Ananthasuresh
- Department of Mechanical Engineering, Indian Institute of Science, Bangalore, India
- * E-mail: (SV); (GKA)
- Saraswathi Vishveshwara
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
- * E-mail: (SV); (GKA)
106
Quirk S, Zhong S, Hernandez R. De novo identification of binding sequences for antibody replacement molecules. Proteins 2009; 76:693-705. [DOI: 10.1002/prot.22382] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
107
Cohen M, Potapov V, Schreiber G. Four distances between pairs of amino acids provide a precise description of their interaction. PLoS Comput Biol 2009; 5:e1000470. [PMID: 19680437 PMCID: PMC2715887 DOI: 10.1371/journal.pcbi.1000470] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.6] [Received: 05/06/2009] [Accepted: 07/15/2009] [Indexed: 11/18/2022] Open
Abstract
The three-dimensional structures of proteins are stabilized by the interactions between amino acid residues. Here we report a method where four distances are calculated between any two side chains to provide an exact spatial definition of their bonds. The data were binned into a four-dimensional grid and compared to a random model, from which the preference for specific four-distances was calculated. A clear relation between the quality of the experimental data and the tightness of the distance distribution was observed, with crystal structure data providing far tighter distance distributions than NMR data. Since the four-distance data have higher information content than classical bond descriptions, we were able to identify many unique inter-residue features not found previously in proteins. For example, we found that the side chains of Arg, Glu, Val and Leu are not symmetrical in respect to the interactions of their head groups. The described method may be developed into a function, which computationally models accurately protein structures.
Affiliation(s)
- Mati Cohen
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Vladimir Potapov
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
- Gideon Schreiber
- Department of Biological Chemistry, Weizmann Institute of Science, Rehovot, Israel
108
Lippi M, Frasconi P. Prediction of protein beta-residue contacts by Markov logic networks with grounding-specific weights. Bioinformatics 2009; 25:2326-33. [PMID: 19592394 DOI: 10.1093/bioinformatics/btp421] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.5] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Accurate prediction of contacts between beta-strand residues can significantly contribute towards ab initio prediction of the 3D structure of many proteins. Contacts in the same protein are highly interdependent. Therefore, significant improvements can be expected by applying statistical relational learners that overcome the usual machine learning assumption that examples are independent and identically distributed. Furthermore, the dependencies among beta-residue contacts are subject to strong regularities, many of which are known a priori. In this article, we take advantage of Markov logic, a statistical relational learning framework that is able to capture dependencies between contacts, and constrain the solution according to domain knowledge expressed by means of weighted rules in a logical language. RESULTS: We introduce a novel hybrid architecture based on neural and Markov logic networks with grounding-specific weights. On a non-redundant dataset, our method achieves 44.9% F1 measure, with 47.3% precision and 42.7% recall, which is significantly better (P < 0.01) than previously reported performance obtained by 2D recursive neural networks. Our approach also significantly improves the number of chains for which beta-strands are nearly perfectly paired (36% of the chains are predicted with F1 ≥ 70% on coarse map). It also outperforms more general contact predictors on recent CASP 2008 targets.
Affiliation(s)
- Marco Lippi
- Dipartimento di Sistemi e Informatica, Università degli Studi di Firenze, Firenze, Italy.
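As a quick consistency check on the figures quoted in the abstract above, the F1 measure is the harmonic mean of precision and recall. The sketch below recomputes it from the reported 47.3% precision and 42.7% recall; the function name `f1_score` is an illustrative choice and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract above
reported_f1 = f1_score(0.473, 0.427)
print(f"{reported_f1:.1%}")  # prints 44.9%
```

The recomputed value agrees with the 44.9% F1 stated in the abstract.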
109
Wu JC, Gardner DP, Ozer S, Gutell RR, Ren P. Correlation of RNA secondary structure statistics with thermodynamic stability and applications to folding. J Mol Biol 2009; 391:769-83. [PMID: 19540243 DOI: 10.1016/j.jmb.2009.06.036] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 03/16/2009] [Revised: 06/05/2009] [Accepted: 06/12/2009] [Indexed: 11/15/2022]
Abstract
The accurate prediction of the secondary and tertiary structure of an RNA with different folding algorithms is dependent on several factors, including the energy functions. However, an RNA higher-order structure cannot be predicted accurately from its sequence based on a limited set of energy parameters. The inter- and intramolecular forces between this RNA and other small molecules and macromolecules, in addition to other factors in the cell such as pH, ionic strength, and temperature, influence the complex dynamics associated with transition of a single stranded RNA to its secondary and tertiary structure. Since all of the factors that affect the formation of an RNA's 3D structure cannot be determined experimentally, statistically derived potential energy has been used in the prediction of protein structure. In the current work, we evaluate the statistical free energy of various secondary structure motifs, including base-pair stacks, hairpin loops, and internal loops, using their statistical frequency obtained from the comparative analysis of more than 50,000 RNA sequences stored in the RNA Comparative Analysis Database (rCAD) at the Comparative RNA Web (CRW) Site. Statistical energy was computed from the structural statistics for several datasets. While the statistical energy for a base-pair stack correlates with experimentally derived free energy values, suggesting a Boltzmann-like distribution, variation is observed between different molecules and their location on the phylogenetic tree of life. Our statistical energy values calculated for several structural elements were utilized in the Mfold RNA-folding algorithm. The combined statistical energy values for base-pair stacks, hairpins and internal loop flanks result in a significant improvement in the accuracy of secondary structure prediction; the hairpin flanks contribute the most.
Affiliation(s)
- Johnny C Wu
- Department of Biomedical Engineering, University of Texas at Austin, 78712-1062, USA
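The abstract above does not give the formula used to turn motif frequencies into statistical energies. A common way to do this under a Boltzmann-like assumption is the inverse-Boltzmann relation, energy = -RT ln(f_observed / f_reference). The sketch below illustrates that relation only; the motif labels, counts, uniform reference state, and temperature are hypothetical and are not taken from the paper.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def statistical_energy(observed_count, total_count, reference_freq, temperature=310.15):
    """Inverse-Boltzmann estimate: -RT * ln(f_observed / f_reference), in kcal/mol."""
    if observed_count <= 0 or total_count <= 0 or reference_freq <= 0:
        raise ValueError("counts and reference frequency must be positive")
    f_observed = observed_count / total_count
    return -R_KCAL * temperature * math.log(f_observed / reference_freq)


# Hypothetical motif counts, for illustration only
counts = {"GC/CG": 600, "AU/UA": 150, "GU/UG": 50}
total = sum(counts.values())
uniform_reference = 1 / len(counts)  # assumed reference state: all motifs equally likely

energies = {
    motif: statistical_energy(count, total, uniform_reference)
    for motif, count in counts.items()
}
for motif, energy in energies.items():
    print(f"{motif}: {energy:+.2f} kcal/mol")
```

Under this convention, motifs observed more often than the reference state receive negative (favorable) energies, and rarer motifs receive positive ones. The choice of reference state strongly affects the absolute values.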
110
Björkholm P, Daniluk P, Kryshtafovych A, Fidelis K, Andersson R, Hvidsten TR. Using multi-data hidden Markov models trained on local neighborhoods of protein structure to predict residue-residue contacts. Bioinformatics 2009; 25:1264-70. [PMID: 19289446 DOI: 10.1093/bioinformatics/btp149] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Correct prediction of residue-residue contacts in proteins that lack good templates with known structure would take ab initio protein structure prediction a large step forward. The lack of correct contacts, and in particular long-range contacts, is considered the main reason why these methods often fail. RESULTS: We propose a novel hidden Markov model (HMM)-based method for predicting residue-residue contacts from protein sequences using as training data homologous sequences, predicted secondary structure and a library of local neighborhoods (local descriptors of protein structure). The library consists of recurring structural entities incorporating short-, medium- and long-range interactions and is general enough to reassemble the cores of nearly all proteins in the PDB. The method is tested on an external test set of 606 domains with no significant sequence similarity to the training set as well as 151 domains with SCOP folds not present in the training set. Considering the top 0.2 × L predictions (L = sequence length), our HMMs obtained an accuracy of 22.8% for long-range interactions in new fold targets, and an average accuracy of 28.6% for long-, medium- and short-range contacts. This is a significant performance increase over currently available methods when comparing against results published in the literature. AVAILABILITY: http://predictioncenter.org/Services/FragHMMent/.
Affiliation(s)
- Patrik Björkholm
- The Linnaeus Centre for Bioinformatics, Uppsala University, Uppsala, Sweden
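The "top 0.2 × L" accuracy quoted in the abstract above is a standard way of scoring contact predictors: rank the predicted residue pairs by score, keep the best 0.2 × L of them, and report the fraction that are true contacts. The sketch below shows that evaluation on toy data; the function name, data layout, and example values are assumptions made for illustration and do not reproduce the authors' evaluation code.

```python
def top_fraction_accuracy(scored_pairs, true_contacts, seq_len, fraction=0.2):
    """Fraction of the top `fraction * seq_len` scored residue pairs that are true contacts.

    scored_pairs  -- iterable of ((i, j), score) tuples, higher score = more confident
    true_contacts -- set of (i, j) pairs that are in contact in the native structure
    """
    n_top = max(1, int(fraction * seq_len))
    ranked = sorted(scored_pairs, key=lambda item: item[1], reverse=True)[:n_top]
    if not ranked:
        return 0.0
    hits = sum(1 for pair, _score in ranked if pair in true_contacts)
    return hits / len(ranked)


# Toy example: a 20-residue chain, so the top 0.2 * 20 = 4 predictions are scored
predictions = [
    ((1, 15), 0.9),
    ((2, 18), 0.8),
    ((3, 10), 0.7),
    ((4, 12), 0.6),
    ((5, 19), 0.5),  # ranked fifth, so it is not counted
]
native_contacts = {(1, 15), (3, 10), (5, 19)}

accuracy = top_fraction_accuracy(predictions, native_contacts, seq_len=20)
print(accuracy)  # prints 0.5
```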
111
Rajgaria R, McAllister SR, Floudas CA. Towards accurate residue-residue hydrophobic contact prediction for alpha helical proteins via integer linear optimization. Proteins 2009; 74:929-47. [PMID: 18767158 DOI: 10.1002/prot.22202] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 12/22/2022]
Abstract
A new optimization-based method is presented to predict the hydrophobic residue contacts in alpha-helical proteins. The proposed approach uses a high resolution distance dependent force field to calculate the interaction energy between different residues of a protein. The formulation predicts the hydrophobic contacts by minimizing the sum of these contact energies. These residue contacts are highly useful in narrowing down the conformational space searched by protein structure prediction algorithms. The proposed algorithm also offers the algorithmic advantage of producing a rank ordered list of the best contact sets. This model was tested on four independent alpha-helical protein test sets and was found to perform very well. The average accuracy of the predictions (separated by at least six residues) obtained using the presented method was approximately 66% for single domain proteins. The average true positive and false positive distances were also calculated for each protein test set and they are 8.87 and 14.67 Å, respectively.
Affiliation(s)
- R Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
112
Yuzlenko O, Kieć-Kononowicz K. Molecular modeling of A1 and A2A adenosine receptors: comparison of rhodopsin- and beta2-adrenergic-based homology models through the docking studies. J Comput Chem 2008; 30:14-32. [PMID: 18496794 DOI: 10.1002/jcc.21001] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 02/06/2023]
Abstract
Adenosine receptors (ARs) are members of the superfamily of G protein-coupled receptors. The homology models of adenosine A1 and A2A receptors were constructed. The high-resolution X-ray structure of bovine rhodopsin and crystal structure of beta2-adrenergic receptor were used as templates. The binding sites of the A1 and A2A ARs were constructed by using data obtained from mutagenesis experiments as well as docking simulations of the respective AR antagonists DPCPX and XAC. To compare rhodopsin- and beta2-adrenergic-based models, the binding mode of A1 (KW-3902, LUF-5437) and A2A (KW-6002, ZM-241385) ARs antagonists were also examined. The differences in the binding ability of both models were noted during the study. The beta2-adrenergic-based A2A AR model was much more capable to stabilize the ligand in the binding site cavity than the corresponding rhodopsin-based A2A AR model, however, such differences were not so clear in case of A1 AR models. It was suggested that for the A1 AR it is possible to use the crystal structure of rhodopsin as a template as well as beta2-adrenergic receptor, but for A2A AR, with the now available beta2-adrenergic receptor X-ray structure, docking studies should be avoided on the rhodopsin-based model. However, taking into account that the beta2AR shares about 31% of the residues with the AR in comparison to 21% in case of bRho, we suggest using beta2-adrenergic-based models for the A1 and A2A ARs for further in silico ligand screening also because of their generally better ability to stabilize ligands inside the binding pocket.
Affiliation(s)
- Olga Yuzlenko
- Department of Technology and Biotechnology of Drugs, Medical College, Jagiellonian University, Kraków, Poland
113
Woodley SM, Catlow R. Crystal structure prediction from first principles. Nature Materials 2008; 7:937-946. [PMID: 19029928 DOI: 10.1038/nmat2321] [Citation(s) in RCA: 323] [Impact Index Per Article: 20.2] [Indexed: 05/27/2023]
Abstract
The prediction of structure at the atomic level is one of the most fundamental challenges in condensed matter science. Here we survey the current status of the field and consider recent developments in methodology, paying particular attention to approaches for surveying energy landscapes. We illustrate the current state of the art in this field with topical applications to inorganic, especially microporous solids, and to molecular crystals; we also look at applications to nanoparticulate structures. Finally, we consider future directions and challenges in the field.
Affiliation(s)
- Scott M Woodley
- Department of Chemistry, University College London, Kathleen Lonsdale Building, Gower Street, London WC1E 6BT, UK.
114
Olofsson L, Söderberg P, Ankarloo J, Nicholls IA. Phage display screening in low dielectric media. J Mol Recognit 2008; 21:330-7. [PMID: 18654983 DOI: 10.1002/jmr.904] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Indexed: 01/04/2023]
Abstract
Here we report the first application of phage display screening in low dielectric media. Two series of phage clones with affinity for alpha-chymotrypsin (CT) were selected from a Ph.D.(TM)-C7C library, using either a buffer or acetonitrile in buffer (50%, v/v). The affinity of lysates, individual clones or selected cyclic peptides for the enzyme was studied by examining their influence on CT activity. Peptides displayed on phage selected in buffer provided significant protection from enzyme autolysis resulting in marked increase in CT activity (>100%). Phage selected in ACN provided some, albeit weak, protection from the detrimental influence on CT from ACN. In conclusion, the results demonstrate the potential for the application of phage display screening protocols to targets in media of low dielectricity.
Affiliation(s)
- Linus Olofsson
- Bioorganic and Biophysical Chemistry Laboratory, School of Pure and Applied Natural Sciences, University of Kalmar, Kalmar, Sweden
115
Reconstruction of protein backbones from the BriX collection of canonical protein fragments. PLoS Comput Biol 2008; 4:e1000083. [PMID: 18483555 PMCID: PMC2367438 DOI: 10.1371/journal.pcbi.1000083] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.4] [Received: 09/21/2007] [Accepted: 04/07/2008] [Indexed: 12/23/2022] Open
Abstract
As modeling of changes in backbone conformation still lacks a computationally efficient solution, we developed a discretisation of the conformational states accessible to the protein backbone similar to the successful rotamer approach in side chains. The BriX fragment database, consisting of fragments from 4 to 14 residues long, was realized through identification of recurrent backbone fragments from a non-redundant set of high-resolution protein structures. BriX contains an alphabet of more than 1,000 frequently observed conformations per peptide length for 6 different variation levels. Analysis of the performance of BriX revealed an average structural coverage of protein structures of more than 99% within a root mean square distance (RMSD) of 1 Angstrom. Globally, we are able to reconstruct protein structures with an average accuracy of 0.48 Angstrom RMSD. As expected, regular structures are well covered, but, interestingly, many loop regions that appear irregular at first glance are also found to form a recurrent structural motif, albeit with lower frequency of occurrence than regular secondary structures. Larger loop regions could be completely reconstructed from smaller recurrent elements, between 4 and 8 residues long. Finally, we observed that a significant amount of short sequences tend to display strong structural ambiguity between alpha helix and extended conformations. When the sequence length increases, this so-called sequence plasticity is no longer observed, illustrating the context dependency of polypeptide structures. Large-scale DNA sequencing efforts produce large amounts of protein sequence data. However, in order to understand the function of a protein, its tertiary three-dimensional structure is required. Despite worldwide efforts in structural biology, experimental protein structures are determined at a significantly slower pace. As a result, computational methods for protein structure prediction receive significant attention. 
A large part of the structure prediction problem lies in the enormous size of the problem: proteins seem to occur in an infinite variety of shapes. Here, we propose that this huge complexity may be overcome by identifying recurrent protein fragments, which are frequently reused as building blocks to construct proteins that were hitherto thought to be unrelated. The BriX database is the outcome of identifying about 2,000 canonical shapes among 1,261 protein structures. We show any given protein can be reconstructed from this library of building blocks at a very high resolution, suggesting that the modelling of protein backbones may be greatly aided by our database.
116
Rajgaria R, McAllister SR, Floudas CA. Distance dependent centroid to centroid force fields using high resolution decoys. Proteins 2008; 70:950-70. [PMID: 17847088 DOI: 10.1002/prot.21561] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.6] [Indexed: 11/06/2022]
Abstract
Simplified force fields play an important role in protein structure prediction and de novo protein design by requiring less computational effort than detailed atomistic potentials. A side chain centroid based, distance dependent pairwise interaction potential has been developed. A linear programming based formulation was used in which non-native "decoy" conformers are forced to take a higher energy compared with the corresponding native structure. This model was trained on an enhanced and diverse protein set. High quality decoy structures were generated for approximately 1400 nonhomologous proteins using torsion angle dynamics along with restricted variations of the hydrophobic cores of the native structure. The resulting decoy set was used to train the model yielding two different side chain centroid based force fields that differ in the way distance dependence has been used to calculate energy parameters. These force fields were tested on an independent set of 148 test proteins with 500 decoy structures for each protein. The side chain centroid force fields were successful in correctly identifying approximately 86% native structures. The Z-scores produced by the proposed centroid-centroid distance dependent force fields improved compared with other distance dependent Cα-Cα or side chain based force fields.
Affiliation(s)
- R Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
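The abstract above evaluates force fields by whether the native structure is ranked below all decoys and by the Z-score of the native energy. The abstract does not spell out the Z-score definition; a widely used one is the native energy minus the mean decoy energy, divided by the standard deviation of the decoy energies. The sketch below uses that definition as an assumption, with hypothetical energy values and function names.

```python
import statistics


def native_z_score(native_energy, decoy_energies):
    """(E_native - mean(E_decoys)) / std(E_decoys); more negative = better separation."""
    mean_decoy = statistics.fmean(decoy_energies)
    std_decoy = statistics.pstdev(decoy_energies)  # population standard deviation
    if std_decoy == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mean_decoy) / std_decoy


def native_ranked_first(native_energy, decoy_energies):
    """True when the native structure has a lower energy than every decoy."""
    return native_energy < min(decoy_energies)


# Hypothetical energies in arbitrary units, for illustration only
decoys = [-8.0, -10.0, -12.0, -9.0, -11.0]
native = -15.0

z = native_z_score(native, decoys)
print(f"Z-score: {z:.2f}")  # prints Z-score: -3.54
print(native_ranked_first(native, decoys))  # prints True
```

Sign conventions vary between papers; some report the negated value so that larger Z-scores are better.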
117
Chen E, Van Vranken V, Kliger DS. The Folding Kinetics of the SDS-Induced Molten Globule Form of Reduced Cytochrome c. Biochemistry 2008; 47:5450-9. [DOI: 10.1021/bi702452u] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 11/28/2022]
Affiliation(s)
- Eefei Chen
- Department of Chemistry and Biochemistry, University of California, Santa Cruz, California 95064
- Vanessa Van Vranken
- Department of Chemistry and Biochemistry, University of California, Santa Cruz, California 95064
- David S. Kliger
- Department of Chemistry and Biochemistry, University of California, Santa Cruz, California 95064
118
Sadowski MI, Jones DT. Benchmarking template selection and model quality assessment for high-resolution comparative modeling. Proteins 2007; 69:476-85. [PMID: 17623860 DOI: 10.1002/prot.21531] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 01/01/2023]
Abstract
Comparative modeling is presently the most accurate method of protein structure prediction. Previous experiments have shown the selection of the correct template to be of paramount importance to the quality of the final model. We have derived a set of 732 targets for which a choice of ten or more templates exist with 30-80% sequence identity and used this set to compare a number of possible methods for template selection: BLAST, PSI-BLAST, profile-profile alignment, HHpred HMM-HMM comparison, global sequence alignment, and the use of a model quality assessment program (MQAP). In addition, we have investigated the question of whether any structurally defined subset of the sequence could be used to predict template quality better than overall sequence similarity. We find that template selection by BLAST is sufficient in 75% of cases but that there are examples in which improvement (global RMSD 0.5 Å or more) could be made. No significant improvement is found for any of the more sophisticated sequence-based methods of template selection at high sequence identities. A subset of 118 targets extending to the lowest levels of sequence similarity was examined and the HHpred and MQAP methods were found to improve ranking when available templates had 35-40% maximum sequence identity. Structurally defined subsets in general are found to be less discriminative than overall sequence similarity, with the coil residue subset performing equivalently to sequence similarity. Finally, we demonstrate that if models are built and model quality is assessed in combination with the sequence-template sequence similarity that an extra 7% of "best" models can be found.
Affiliation(s)
- M I Sadowski
- Bioinformatics Unit, Department of Computer Science, University College London, London WC1E 6BT, United Kingdom
119
Noy E, Tabakman T, Goldblum A. Constructing ensembles of flexible fragments in native proteins by iterative stochastic elimination is relevant to protein-protein interfaces. Proteins 2007; 68:702-11. [PMID: 17510963 DOI: 10.1002/prot.21437] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 01/03/2023]
Abstract
We investigate the extent to which ensembles of flexible fragments (FF), generated by our loop conformational search method, include conformations that are near experimental and reflect conformational changes that these FFs undergo when binary protein-protein complexes are formed. Twenty-eight FFs, which are located in protein-protein interfaces and have different conformations in the bound structure (BS) and unbound structure (UbS) were extracted. The conformational space of these fragments in the BS and UbS was explored with our method which is based on the iterative stochastic elimination (ISE) algorithm. Conformational search of BSs generated bound ensembles and conformational search of UbSs produced unbound ensembles. ISE samples conformations near experimental (less than 1.05 Å root mean square deviation, RMSD) for 51 out of the 56 examined fragments in the bound and unbound ensembles. In 14 out of the 28 unbound fragments, it also samples conformations within 1.05 Å from the BS in the unbound ensemble. Sampling the bound conformation in the unbound ensemble demonstrates the potential biological relevance of the predicted ensemble. The 10 lowest energy conformations are the best choice for docking experiments, compared with any other 10 conformations of the ensembles. We conclude that generating conformational ensembles for FFs with ISE is relevant to FF conformations in the UbS and BS. Forming ensembles of the isolated proteins with our method prior to docking represents more comprehensively their inherent flexibility and is expected to improve docking experiments compared with results obtained by docking only UbSs.
Affiliation(s)
- Efrat Noy
- Department of Medicinal Chemistry and the David R. Bloom Center for Pharmacy, School of Pharmacy, The Hebrew University of Jerusalem, Israel 91120
|
120
|
Abstract
This review presents the advances in protein structure prediction from the computational methods perspective. The approaches are classified into four major categories: comparative modeling, fold recognition, first principles methods that employ database information, and first principles methods without database information. Important advances along with current limitations and challenges are presented.
Affiliation(s)
- C A Floudas
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA.
|
121
|
Fung HK, Floudas CA, Taylor MS, Zhang L, Morikis D. Toward full-sequence de novo protein design with flexible templates for human beta-defensin-2. Biophys J 2007; 94:584-99. [PMID: 17827237 PMCID: PMC2157230 DOI: 10.1529/biophysj.107.110627] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.4] [Indexed: 01/12/2023] Open
Abstract
In this article, we introduce and apply our de novo protein design framework, which observes true backbone flexibility, to the redesign of human beta-defensin-2, a 41-residue cationic antimicrobial peptide of the innate immune system. The flexible design templates are generated using molecular dynamics simulations with both Generalized Born implicit solvation and explicit water molecules. These backbone templates were employed in addition to the x-ray crystal structure for designing human beta-defensin-2. The computational efficiency of our framework was demonstrated with the full-sequence design of the peptide with flexible backbone templates, corresponding to the mutation of all positions except the native cysteines.
Affiliation(s)
- Ho Ki Fung
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey, USA
|
122
|
Kang SG, Saven JG. Computational protein design: structure, function and combinatorial diversity. Curr Opin Chem Biol 2007; 11:329-34. [PMID: 17524729 DOI: 10.1016/j.cbpa.2007.05.006] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 04/11/2007] [Accepted: 05/10/2007] [Indexed: 11/26/2022]
Abstract
Computational protein design has blossomed with the development of methods for addressing the complexities involved in specifying the structure, sequence and function of proteins. Recent applications include the design of novel functional membrane and soluble proteins, proteins incorporating non-biological components and protein combinatorial libraries.
Affiliation(s)
- Seung-gu Kang
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, PA 19104, USA
|
123
|
Wu S, Skolnick J, Zhang Y. Ab initio modeling of small proteins by iterative TASSER simulations. BMC Biol 2007; 5:17. [PMID: 17488521 PMCID: PMC1878469 DOI: 10.1186/1741-7007-5-17] [Citation(s) in RCA: 341] [Impact Index Per Article: 20.1] [Received: 12/09/2006] [Accepted: 05/08/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Predicting 3-dimensional protein structures from amino-acid sequences is an important unsolved problem in computational structural biology. The problem becomes relatively easier if close homologous proteins have been solved, as high-resolution models can be built by aligning target sequences to the solved homologous structures. However, for sequences without similar folds in the Protein Data Bank (PDB) library, the models have to be predicted from scratch. Progress in ab initio structure modeling is slow. The aim of this study was to extend the TASSER (threading/assembly/refinement) method to ab initio modeling and examine systematically its ability to fold small single-domain proteins. RESULTS: We developed I-TASSER by iteratively implementing the TASSER method, which is used in the folding test of three benchmarks of small proteins. First, data on 16 small proteins (< 90 residues) were used to generate I-TASSER models, which had an average Cα root mean square deviation (RMSD) of 3.8 Å, with 6 of them having a Cα-RMSD < 2.5 Å. The overall result was comparable with the all-atomic ROSETTA simulation, but the central processing unit (CPU) time used by I-TASSER was much shorter (5 CPU hours vs. 150 CPU days for ROSETTA). Second, data on 20 small proteins (< 120 residues) were used. I-TASSER folded four of them with a Cα-RMSD < 2.5 Å. The average Cα-RMSD of the I-TASSER models was 3.9 Å, whereas it was 5.9 Å using TOUCHSTONE-II software. Finally, 20 non-homologous small proteins (< 120 residues) were taken from the PDB library. An average Cα-RMSD of 3.9 Å was obtained for the third benchmark, with seven cases having a Cα-RMSD < 2.5 Å. CONCLUSION: Our simulation results show that I-TASSER can consistently predict the correct folds and sometimes high-resolution models for small single-domain proteins. Compared with other ab initio modeling methods such as ROSETTA and TOUCHSTONE II, the average performance of I-TASSER is either much better or similar within a lower computational time. These data, together with the significant performance of the automated I-TASSER server (the Zhang-Server) in the 'free modeling' section of the recent Critical Assessment of Structure Prediction (CASP)7 experiment, demonstrate new progress in automated ab initio model generation. The I-TASSER server is freely available for academic users at http://zhang.bioinformatics.ku.edu/I-TASSER.
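The Cα-RMSD values quoted in this abstract measure how far model Cα atoms lie from their native positions. The following is a minimal sketch of that arithmetic only, assuming the two structures have already been superimposed; reported values are normally computed after an optimal superposition, which this sketch omits. The function name `ca_rmsd` and the toy coordinates are illustrative assumptions, not taken from the paper.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) C-alpha coordinates.

    Assumption: the structures are already superimposed; no fitting is done.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared displacement of one atom pair.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example (made-up coordinates in angstroms): every atom is shifted by 1 A.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
native = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
print(ca_rmsd(model, native))
```

A threshold such as the 2.5 Å used in the abstract would then be a simple comparison against the returned value.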
Affiliation(s)
- Sitao Wu
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA
- Jeffrey Skolnick
- Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, 250 14th Street NW, Atlanta, GA 30318, USA
- Yang Zhang
- Center for Bioinformatics and Department of Molecular Bioscience, University of Kansas, 2030 Becker Dr, Lawrence, KS 66047, USA
|
124
|
McAllister SR, Mickus BE, Klepeis JL, Floudas CA. Novel approach for alpha-helical topology prediction in globular proteins: generation of interhelical restraints. Proteins 2007; 65:930-52. [PMID: 17029234 DOI: 10.1002/prot.21095] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Indexed: 11/08/2022]
Abstract
The protein folding problem represents one of the most challenging problems in computational biology. Distance constraints and topology predictions can be highly useful for the folding problem in reducing the conformational space that must be searched by deterministic algorithms to find a protein structure of minimum conformational energy. We present a novel optimization framework for predicting topological contacts and generating interhelical distance restraints between hydrophobic residues in alpha-helical globular proteins. It should be emphasized that since the model does not make assumptions about the form of the helices, it is applicable to all alpha-helical proteins, including helices with kinks and irregular helices. This model aims at enhancing the ASTRO-FOLD protein folding approach of Klepeis and Floudas (Journal of Computational Chemistry 2003;24:191-208), which finds the structure of global minimum conformational energy via a constrained nonlinear optimization problem. The proposed topology prediction model was evaluated on 26 alpha-helical proteins ranging from 2 to 8 helices and 35 to 159 residues, and the best identified average interhelical distances corresponding to the predicted contacts fell below 11 Å in all 26 of these systems. Given the positive results of applying the model to several protein systems, the importance of interhelical hydrophobic-to-hydrophobic contacts in determining the folding of alpha-helical globular proteins is highlighted.
Affiliation(s)
- S R McAllister
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
|
125
|
Liwo A, Khalili M, Czaplewski C, Kalinowski S, Ołdziej S, Wachucik K, Scheraga HA. Modification and optimization of the united-residue (UNRES) potential energy function for canonical simulations. I. Temperature dependence of the effective energy function and tests of the optimization method with single training proteins. J Phys Chem B 2007; 111:260-85. [PMID: 17201450 PMCID: PMC3236617 DOI: 10.1021/jp065380a] [Citation(s) in RCA: 159] [Impact Index Per Article: 9.4] [Indexed: 11/28/2022]
Abstract
We report the modification and parametrization of the united-residue (UNRES) force field for energy-based protein structure prediction and protein folding simulations. We tested the approach on three training proteins separately: 1E0L (β), 1GAB (α), and 1E0G (α + β). Heretofore, the UNRES force field had been designed and parametrized to locate native-like structures of proteins as global minima of their effective potential energy surfaces, which largely neglected the conformational entropy because decoys composed of only lowest-energy conformations were used to optimize the force field. Recently, we developed a mesoscopic dynamics procedure for UNRES and applied it with success to simulate protein folding pathways. However, the force field turned out to be largely biased toward α-helical structures in canonical simulations because the conformational entropy had been neglected in the parametrization. We applied the hierarchical optimization method, developed in our earlier work, to optimize the force field; in this method, the conformational space of a training protein is divided into levels, each corresponding to a certain degree of native-likeness. The levels are ordered according to increasing native-likeness; level 0 corresponds to structures with no native-like elements, and the highest level corresponds to the fully native-like structures. The aim of optimization is to achieve the order of the free energies of levels, decreasing as their native-likeness increases. The procedure is iterative, and decoys of the training protein(s) generated with the energy function parameters of the preceding iteration are used to optimize the force field in a current iteration. We applied the multiplexing replica-exchange molecular dynamics (MREMD) method, recently implemented in UNRES, to generate decoys; with this modification, conformational entropy is taken into account. Moreover, we optimized the free-energy gaps between levels at temperatures corresponding to a predominance of folded or unfolded structures, as well as to structures at the putative folding-transition temperature, changing the sign of the gaps at the transition temperature. This enabled us to obtain force fields characterized by a single peak in the heat capacity at the transition temperature. Furthermore, we introduced temperature dependence to the UNRES force field; this is consistent with the fact that it is a free-energy and not a potential energy function.
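The abstract relies on replica-exchange molecular dynamics to generate decoys. As a hedged illustration of the general idea, the sketch below computes the standard Metropolis acceptance probability for swapping two replicas, min(1, exp[(β_i − β_j)(E_i − E_j)]). It assumes a temperature-independent energy; with the temperature-dependent UNRES function described above, the energies would presumably need to be re-evaluated at the swapped temperatures, which this sketch does not do. The function name and the example numbers are illustrative assumptions, not taken from the paper or from the UNRES code.

```python
import math

# Boltzmann constant in kcal/(mol*K).
K_B = 0.0019872041


def exchange_probability(energy_i, temp_i, energy_j, temp_j, k_b=K_B):
    """Metropolis probability of swapping replicas i and j.

    Assumption: energies (kcal/mol) do not depend on temperature.
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    # A non-negative exponent means the swap is always accepted.
    return 1.0 if delta >= 0 else math.exp(delta)


# Made-up example: the colder replica (300 K) holds the higher energy,
# so moving the lower-energy conformation to 300 K is always accepted.
print(exchange_probability(-90.0, 300.0, -100.0, 350.0))
# The reverse situation is accepted only with some probability below 1.
print(exchange_probability(-100.0, 300.0, -90.0, 350.0))
```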
Affiliation(s)
- Adam Liwo
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A
- Mey Khalili
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A
- Cezary Czaplewski
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A
- Sebastian Kalinowski
- Faculty of Chemistry, University of Gdańsk, Sobieskiego 18, 80-952 Gdańsk, Poland
- Stanisław Ołdziej
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A
- Katarzyna Wachucik
- Faculty of Chemistry, University of Gdańsk, Sobieskiego 18, 80-952 Gdańsk, Poland
- Harold A. Scheraga
- Baker Laboratory of Chemistry and Chemical Biology, Cornell University, Ithaca, N.Y., 14853-1301, U.S.A
|
126
|
Rajgaria R, McAllister SR, Floudas CA. A novel high resolution Cα-Cα distance dependent force field based on a high quality decoy set. Proteins 2007; 65:726-41. [PMID: 16981202 DOI: 10.1002/prot.21149] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.2] [Indexed: 11/11/2022]
Abstract
This work presents a novel Cα-Cα distance-dependent force field which is successful in selecting native structures from an ensemble of high resolution near-native conformers. An enhanced and diverse protein set, along with an improved decoy generation technique, contributes to the effectiveness of this potential. High quality decoys were generated for 1489 nonhomologous proteins and used to train an optimization based linear programming formulation. The goal in developing a set of high resolution decoys was to develop a simple, distance-dependent force field that yields the native structure as the lowest energy structure and assigns higher energies to decoy structures that are quite similar as well as those that are less similar. The model also includes a set of physical constraints that were based on experimentally observed physical behavior of the amino acids. The force field was tested on two sets of test decoys not in the training set and was found to excel on all the metrics that are widely used to measure the effectiveness of a force field. The high resolution force field was successful in correctly identifying 113 native structures out of 150 test cases and the average rank obtained for this test was 1.87. All the high resolution structures (training and testing) used for this work are available online and can be downloaded from http://titan.princeton.edu/HRDecoys.
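To make the evaluation metric concrete, the sketch below scores a structure with a binned, distance-dependent pair table and reports the rank of the native structure among decoys (rank 1 means the native has the lowest energy). The bin edges, the energy table, the residue names, and the coordinates are all hypothetical placeholders; the abstract does not give the paper's actual binning or parameters, and the linear programming training step is not shown.

```python
import math
from bisect import bisect_right

# Hypothetical bin edges in angstroms (not the paper's binning).
BIN_EDGES = [3.0, 5.0, 7.0, 9.0]


def distance_bin(distance):
    """Index of the bin containing distance, or None if it is out of range."""
    if distance < BIN_EDGES[0] or distance >= BIN_EDGES[-1]:
        return None
    return bisect_right(BIN_EDGES, distance) - 1


def pair_energy(residues, coords, table):
    """Sum table[(res_a, res_b, bin)] over all residue pairs within range."""
    energy = 0.0
    for i in range(len(residues)):
        for j in range(i + 1, len(residues)):
            b = distance_bin(math.dist(coords[i], coords[j]))
            if b is not None:
                # Sort the residue names so (A, B) and (B, A) share one entry.
                key = (*sorted((residues[i], residues[j])), b)
                energy += table.get(key, 0.0)
    return energy


def native_rank(native_energy, decoy_energies):
    """Rank of the native structure; 1 means no decoy has lower energy."""
    return 1 + sum(1 for e in decoy_energies if e < native_energy)


# Toy data: three residues, one native and one decoy conformation.
residues = ["ALA", "LEU", "LEU"]
table = {("ALA", "LEU", 0): -1.0, ("LEU", "LEU", 1): -0.5}
native = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
decoy = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (20.0, 0.0, 0.0)]

e_native = pair_energy(residues, native, table)
e_decoy = pair_energy(residues, decoy, table)
print(e_native, e_decoy, native_rank(e_native, [e_decoy]))
```

Averaging `native_rank` over many test proteins would give a figure of the same kind as the average rank quoted in the abstract.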
Affiliation(s)
- R Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, USA
|
127
|
Zloh M, Shaunak S, Balan S, Brocchini S. Identification and insertion of 3-carbon bridges in protein disulfide bonds: a computational approach. Nat Protoc 2007; 2:1070-83. [PMID: 17545999 DOI: 10.1038/nprot.2007.119] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.3] [Indexed: 11/10/2022]
Abstract
More than 42,000 3D structures of proteins are available on the Internet. We have shown that the chemical insertion of a 3-carbon bridge across the native disulfide bond of a protein or peptide can enable the site-specific conjugation of PEG to the protein without a loss of its structure or function. For success, it is necessary to select an appropriate and accessible disulfide bond in the protein for this chemical modification. We describe how to use public protein databases and molecular modeling programs to select a protein rationally and to identify the optimum disulfide bond for experimental studies. Our computational approach can substantially reduce the time required for the laboratory-based chemical modification. Identification of solvent-accessible disulfides using published structural information takes approximately 2 h. Predicting the structural effects of the disulfide-based modification can take 3 weeks.
Affiliation(s)
- Mire Zloh
- Department of Pharmaceutical and Biological Chemistry, University of London, 29/39 Brunswick Square, London WC1N 1AX, UK.
|
128
|
Zhu Y. Mixed-Integer Linear Programming Algorithm for a Computational Protein Design Problem. Ind Eng Chem Res 2006. [DOI: 10.1021/ie0605985] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/29/2022]
Affiliation(s)
- Yushan Zhu
- Department of Chemical Engineering, Tsinghua University, Beijing 100084, P.R.China
|