51
|
Currin A, Swainston N, Day PJ, Kell DB. Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently. Chem Soc Rev 2015; 44:1172-239. [PMID: 25503938 PMCID: PMC4349129 DOI: 10.1039/c4cs00351a] [Citation(s) in RCA: 256] [Impact Index Per Article: 28.4] [Received: 10/20/2014] [Indexed: 12/21/2022]
Abstract
The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biology, whereby increasingly large sequences of DNA can be synthesised de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the number of possible proteins is far too large to test individually, so we need means for navigating the 'search space' of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modelling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), especially with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the 'best' amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a number of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modelling and exploring protein landscapes. 
Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biology, offers scope for the development of novel biocatalysts that are both highly active and robust.
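The claim that the number of possible proteins is far too large to test individually can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes the 20 canonical amino acids, and the function names and the sequence lengths used in the usage note are arbitrary examples, not values taken from the review.

```python
from math import comb

AMINO_ACIDS = 20  # assumption: canonical proteinogenic amino acids only


def sequence_space_size(length: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of distinct sequences of the given length."""
    return alphabet ** length


def single_mutant_count(length: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of variants that differ from a parent at exactly one position."""
    return length * (alphabet - 1)


def double_mutant_count(length: int, alphabet: int = AMINO_ACIDS) -> int:
    """Number of variants that differ from a parent at exactly two positions."""
    return comb(length, 2) * (alphabet - 1) ** 2


if __name__ == "__main__":
    n = 100  # hypothetical protein length
    print(f"all sequences of length {n}: {sequence_space_size(n):.3e}")
    print(f"single mutants: {single_mutant_count(n)}")
    print(f"double mutants: {double_mutant_count(n)}")
```

Even for a modest hypothetical length of 100 residues, the single mutants are easy to enumerate, while the full space is far beyond exhaustive testing, which is why guided navigation of the landscape matters.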
Affiliation(s)
- Andrew Currin
  - Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester M1 7DN, UK; http://dbkgroup.org/; @dbkell; Tel: +44 (0)161 306 4492
  - School of Chemistry, The University of Manchester, Manchester M13 9PL, UK
  - Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131, Princess St, Manchester M1 7DN, UK
- Neil Swainston
  - Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester M1 7DN, UK; http://dbkgroup.org/; @dbkell; Tel: +44 (0)161 306 4492
  - Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131, Princess St, Manchester M1 7DN, UK
  - School of Computer Science, The University of Manchester, Manchester M13 9PL, UK
- Philip J. Day
  - Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester M1 7DN, UK; http://dbkgroup.org/; @dbkell; Tel: +44 (0)161 306 4492
  - Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131, Princess St, Manchester M1 7DN, UK
  - Faculty of Medical and Human Sciences, The University of Manchester, Manchester M13 9PT, UK
- Douglas B. Kell
  - Manchester Institute of Biotechnology, The University of Manchester, 131, Princess St, Manchester M1 7DN, UK; http://dbkgroup.org/; @dbkell; Tel: +44 (0)161 306 4492
  - School of Chemistry, The University of Manchester, Manchester M13 9PL, UK
  - Centre for Synthetic Biology of Fine and Speciality Chemicals (SYNBIOCHEM), The University of Manchester, 131, Princess St, Manchester M1 7DN, UK
|
52
|
Secondary and Tertiary Structure Prediction of Proteins: A Bioinformatic Approach. Complex System Modelling and Control Through Intelligent Soft Computations 2015. [DOI: 10.1007/978-3-319-12883-2_19] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/27/2022]
|
53
|
Faraggi E, Kloczkowski A. GENN: a GEneral Neural Network for learning tabulated data with examples from protein structure prediction. Methods Mol Biol 2015; 1260:165-78. [PMID: 25502381 PMCID: PMC6930076 DOI: 10.1007/978-1-4939-2239-0_10] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 01/18/2023]
Abstract
We present a GEneral Neural Network (GENN) for learning trends from existing data and making predictions of unknown information. The main novelty of GENN is in its generality, simplicity of use, and its specific handling of windowed input/output. Its main strength is its efficient handling of the input data, enabling learning from large datasets. GENN is built on a two-layered neural network and has the option to use separate input-output pairs or window-based data, using data structures to efficiently represent input-output pairs. The program was tested on predicting the accessible surface area of globular proteins, scoring proteins according to similarity to native, and predicting protein disorder, and has performed remarkably well. In this paper we describe the program and its use. Specifically, we give as an example the construction of a similarity-to-native protein scoring function that was constructed using GENN. The source code and Linux executables for GENN are available from Research and Information Systems at http://mamiris.com and from the Battelle Center for Mathematical Medicine at http://mathmed.org. Bugs and problems with the GENN program should be reported to EF.
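The abstract does not describe how GENN represents windowed input internally, so the following is only a generic sketch of window-based encoding as it is commonly used for per-residue prediction. The function name `sliding_windows`, the zero padding, and the toy feature values are assumptions made for illustration and are not part of GENN.

```python
def sliding_windows(features, half_width, pad=0.0):
    """Build one flattened input vector per position.

    features   : list of per-position feature rows (all rows the same length)
    half_width : number of neighbours taken on each side of the centre
    pad        : value used for positions that fall outside the sequence
    """
    n = len(features)
    row_width = len(features[0]) if n else 0
    padding_row = [pad] * row_width
    windows = []
    for centre in range(n):
        window = []
        for j in range(centre - half_width, centre + half_width + 1):
            # Positions before the start or after the end are padded.
            row = features[j] if 0 <= j < n else padding_row
            window.extend(row)
        windows.append(window)
    return windows


if __name__ == "__main__":
    toy = [[1.0], [2.0], [3.0]]  # hypothetical one-feature-per-residue input
    for w in sliding_windows(toy, half_width=1):
        print(w)
```

Each window would then be paired with the target value at its centre position to form one training example.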
Affiliation(s)
- Eshel Faraggi
  - Department of Biochemistry and Molecular Biology, Indiana University School of Medicine, Indianapolis, Indiana 46202, USA; Battelle Center for Mathematical Medicine, Nationwide Children’s Hospital, Columbus, Ohio 43215, USA; and Physics Division, Research and Information Systems, LLC, Carmel, Indiana, 46032, USA, phone: 317-332-0368
- Andrzej Kloczkowski
  - Battelle Center for Mathematical Medicine, Nationwide Children’s Hospital, Columbus, Ohio 43215, USA; and Department of Pediatrics, The Ohio State University, Columbus, Ohio 43215, USA
|
54
|
Spencer M, Eickholt J, Cheng J. A Deep Learning Network Approach to ab initio Protein Secondary Structure Prediction. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2015; 12:103-12. [PMID: 25750595 PMCID: PMC4348072 DOI: 10.1109/tcbb.2014.2343960] [Citation(s) in RCA: 138] [Impact Index Per Article: 15.3] [Indexed: 05/12/2023]
Abstract
Ab initio protein secondary structure (SS) predictions are utilized to generate tertiary structure predictions, which are increasingly demanded due to the rapid discovery of proteins. Although recent developments have slightly exceeded previous methods of SS prediction, accuracy has stagnated around 80 percent and many wonder if prediction cannot be advanced beyond this ceiling. Disciplines that have traditionally employed neural networks are experimenting with novel deep learning techniques in attempts to stimulate progress. Since neural networks have historically played an important role in SS prediction, we wanted to determine whether deep learning could contribute to the advancement of this field as well. We developed an SS predictor that makes use of the position-specific scoring matrix generated by PSI-BLAST and deep learning network architectures, which we call DNSS. Graphical processing units and CUDA software optimize the deep network architecture and efficiently train the deep networks. Optimal parameters for the training process were determined, and a workflow comprising three separately trained deep networks was constructed in order to make refined predictions. This deep learning network approach was used to predict SS for a fully independent test dataset of 198 proteins, achieving a Q3 accuracy of 80.7 percent and a Sov accuracy of 74.2 percent.
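For readers unfamiliar with the Q3 figure quoted above, it is the percentage of residues whose three-state label (helix, strand, or coil) is predicted correctly. The sketch below computes it for a pair of label strings. The function name and the toy strings are illustrative assumptions, and the Sov score is a segment-overlap measure that is not reproduced here.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of positions whose three-state label matches.

    Both strings use one character per residue, e.g. H (helix),
    E (strand) and C (coil).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


if __name__ == "__main__":
    # Hypothetical labels for an eight-residue fragment.
    print(q3_accuracy("HHHEECCC", "HHHEECCH"))
```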
Affiliation(s)
- Matt Spencer
  - Informatics Institute, University of Missouri, Columbia, MO 65211.
- Jesse Eickholt
  - Department of Computer Science, Central Michigan University, Mount Pleasant, MI 48859.
- Jianlin Cheng
  - Department of Computer Science, University of Missouri, Columbia, MO 65211.
|
55
|
Three-dimensional protein structure prediction: Methods and computational strategies. Comput Biol Chem 2014; 53PB:251-276. [DOI: 10.1016/j.compbiolchem.2014.10.001] [Citation(s) in RCA: 121] [Impact Index Per Article: 12.1] [Received: 06/12/2014] [Revised: 10/03/2014] [Accepted: 10/07/2014] [Indexed: 01/01/2023]
|
56
|
Shabestari MH, Wolfs CJAM, Spruijt RB, van Amerongen H, Huber M. Exploring the structure of the 100 amino-acid residue long N-terminus of the plant antenna protein CP29. Biophys J 2014; 106:1349-58. [PMID: 24655510 DOI: 10.1016/j.bpj.2013.11.4506] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/25/2013] [Revised: 11/14/2013] [Accepted: 11/27/2013] [Indexed: 12/01/2022] Open
Abstract
The structure of the unusually long (∼100 amino-acid residues) N-terminal domain of the light-harvesting protein CP29 of plants is not defined in the crystal structure of this membrane protein. We studied the N-terminus using two electron paramagnetic resonance (EPR) approaches: the rotational diffusion of spin labels at 55 residues with continuous-wave EPR, and three sets of distances with a pulsed EPR method. The N-terminus is relatively structured. Five regions that differ considerably in their dynamics are identified. Two regions have low rotational diffusion, one of which shows α-helical character suggesting contact with the protein surface. This immobile part is flanked by two highly dynamic, unstructured regions (loops) that cover residues 10-22 and 82-91. These loops may be important for the interaction with other light-harvesting proteins. The region around residue 4 also has low rotational diffusion, presumably because it attaches noncovalently to the protein. This section is close to a phosphorylation site (Thr-6) in related proteins, such as those encoded by the Lhcb4.2 gene. Phosphorylation might influence the interaction with other antenna complexes, thereby regulating the supramolecular organization in the thylakoid membrane.
Affiliation(s)
- Cor J A M Wolfs
  - Laboratory of Biophysics, Wageningen University, Wageningen, The Netherlands
- Ruud B Spruijt
  - Laboratory of Biophysics, Wageningen University, Wageningen, The Netherlands
- Martina Huber
  - Department of Molecular Physics, Leiden University, Leiden, The Netherlands.
|
57
|
Peterson LX, Kang X, Kihara D. Assessment of protein side-chain conformation prediction methods in different residue environments. Proteins 2014; 82:1971-84. [PMID: 24619909 PMCID: PMC5007623 DOI: 10.1002/prot.24552] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Received: 01/15/2014] [Revised: 03/02/2014] [Accepted: 03/07/2014] [Indexed: 11/09/2022]
Abstract
Computational prediction of side-chain conformation is an important component of protein structure prediction. Accurate side-chain prediction is crucial for practical applications of protein structure models that need atomic-detailed resolution such as protein and ligand design. We evaluated the accuracy of eight side-chain prediction methods in reproducing the side-chain conformations of experimentally solved structures deposited to the Protein Data Bank. Prediction accuracy was evaluated for a total of four different structural environments (buried, surface, interface, and membrane-spanning) in three different protein types (monomeric, multimeric, and membrane). Overall, the highest accuracy was observed for buried residues in monomeric and multimeric proteins. Notably, side-chains at protein interfaces and membrane-spanning regions were better predicted than surface residues even though the methods did not all use multimeric and membrane proteins for training. Thus, we conclude that the current methods are as practically useful for modeling protein docking interfaces and membrane-spanning regions as for modeling monomers.
Affiliation(s)
- Lenna X. Peterson
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
- Xuejiao Kang
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
- Daisuke Kihara
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, 47907, USA
  - Department of Computer Science, Purdue University, West Lafayette, IN, 47907, USA
|
58
|
Hsu PJ, Cheong SA, Lai SK. Precursory signatures of protein folding/unfolding: From time series correlation analysis to atomistic mechanisms. J Chem Phys 2014; 140:204905. [DOI: 10.1063/1.4875802] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 01/08/2023] Open
Affiliation(s)
- P J Hsu
  - Complex Liquids Laboratory, Department of Physics, National Central University, Chungli 320, Taiwan
- S A Cheong
  - Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University, 21 Nanyang Link, Singapore 637371, Republic of Singapore
- S K Lai
  - Complex Liquids Laboratory, Department of Physics, National Central University, Chungli 320, Taiwan
|
59
|
Karpel RL. The illusive search for the lowest free energy state of globular proteins and RNAs. DNA Repair (Amst) 2014; 21:158-62. [PMID: 24846762 DOI: 10.1016/j.dnarep.2014.04.014] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/12/2014] [Revised: 04/24/2014] [Accepted: 04/26/2014] [Indexed: 10/25/2022]
Abstract
As a consequence of the one-dimensional storage and transfer of genetic information, DNA→RNA→protein, the process by which globular proteins and RNAs achieve their three-dimensional structure involves folding of a linear chain. The folding process itself could create massive activation barriers that prevent the attainment of many stable protein and RNA structures. We consider several kinds of energy barriers inherent in folding that might serve as kinetic constraints to achieving the lowest energy state. Alternative approaches to forming 3D structure, where a substantial number of weak interactions would be created prior to the formation of all the peptide (or phosphodiester) bonds, might not be subjected to such high barriers. This could lead to unique 3D conformational states, potentially more stable than "native" proteins and RNAs, with new functionalities.
Affiliation(s)
- Richard L Karpel
  - Department of Chemistry and Biochemistry, University of Maryland Baltimore County, 1000 Hilltop Circle, Baltimore, MD 21250, United States.
|
60
|
Yu F, Cangelosi VM, Zastrow ML, Tegoni M, Plegaria JS, Tebo AG, Mocny CS, Ruckthong L, Qayyum H, Pecoraro VL. Protein design: toward functional metalloenzymes. Chem Rev 2014; 114:3495-578. [PMID: 24661096 PMCID: PMC4300145 DOI: 10.1021/cr400458x] [Citation(s) in RCA: 332] [Impact Index Per Article: 33.2] [Indexed: 12/12/2022]
Affiliation(s)
- Fangting Yu
  - University of Michigan, Ann Arbor, Michigan 48109, United States
- Alison G. Tebo
  - University of Michigan, Ann Arbor, Michigan 48109, United States
- Leela Ruckthong
  - University of Michigan, Ann Arbor, Michigan 48109, United States
- Hira Qayyum
  - University of Michigan, Ann Arbor, Michigan 48109, United States
|
61
|
Feng Y, Lin H, Luo L. Prediction of protein secondary structure using feature selection and analysis approach. Acta Biotheor 2014; 62:1-14. [PMID: 24052343 DOI: 10.1007/s10441-013-9203-7] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 12/29/2012] [Accepted: 08/24/2013] [Indexed: 01/09/2023]
Abstract
The prediction of the secondary structure of a protein from its amino acid sequence is an important step towards the prediction of its three-dimensional structure. However, the accuracy of ab initio secondary structure prediction from sequence is about 80% currently, which is still far from satisfactory. In this study, we proposed a novel method that uses binomial distribution to optimize tetrapeptide structural words and increment of diversity with quadratic discriminant to perform prediction for protein three-state secondary structure. A benchmark dataset including 2,640 proteins with sequence identity of less than 25% was used to train and test the proposed method. The results indicate that overall accuracy of 87.8% was achieved in secondary structure prediction by using ten-fold cross-validation. Moreover, the accuracy of predicted secondary structures ranges from 84 to 89% at the level of residue. These results suggest that the feature selection technique can detect the optimized tetrapeptide structural words which affect the accuracy of predicted secondary structures.
|
62
|
Yousif RH, Khairudin NBA. Homology Modeling of Human Sweet Taste Receptors: T1R2-T1R3. ACTA ACUST UNITED AC 2014. [DOI: 10.12720/jomb.3.2.84-86] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/03/2022]
|
63
|
Khoury GA, Thompson JP, Smadbeck J, Kieslich CA, Floudas CA. Forcefield_PTM: Ab Initio Charge and AMBER Forcefield Parameters for Frequently Occurring Post-Translational Modifications. J Chem Theory Comput 2013; 9:5653-5674. [PMID: 24489522 PMCID: PMC3904396 DOI: 10.1021/ct400556v] [Citation(s) in RCA: 79] [Impact Index Per Article: 7.2] [Indexed: 12/16/2022]
Abstract
In this work, we introduce Forcefield_PTM, a set of AMBER forcefield parameters consistent with ff03 for 32 common post-translational modifications. Partial charges were calculated through ab initio calculations and a two-stage RESP-fitting procedure in an ether-like implicit solvent environment. The charges were found to be generally consistent with others previously reported for phosphorylated amino acids, and trimethyllysine, using different parameterization methods. Pairs of modified and their corresponding unmodified structures were curated from the PDB for both single and multiple modifications. Background structural similarity was assessed in the context of secondary and tertiary structures from the global dataset. Next, the charges derived for Forcefield_PTM were tested on a macroscopic scale using unrestrained all-atom Langevin molecular dynamics simulations in AMBER for 34 (17 pairs of modified/unmodified) systems in implicit solvent. Assessment was performed in the context of secondary structure preservation, stability in energies, and correlations between the modified and unmodified structure trajectories on the aggregate. As an illustration of their utility, the parameters were used to compare the structural stability of the phosphorylated and dephosphorylated forms of OdhI. Microscopic comparisons between quantum and AMBER single point energies along key χ torsions on several PTMs were performed and corrections to improve their agreement in terms of mean squared errors and squared correlation coefficients were parameterized. This forcefield for post-translational modifications in condensed-phase simulations can be applied to a number of biologically relevant and timely applications including protein structure prediction, protein and peptide design, docking, and to study the effect of PTMs on folding and dynamics. 
We make the derived parameters and an associated interactive webtool capable of performing post-translational modifications on proteins using Forcefield_PTM available at http://selene.princeton.edu/FFPTM.
Affiliation(s)
- George A. Khoury
  - Department of Chemical and Biological Engineering, Princeton, NJ, USA
- Jeff P. Thompson
  - Department of Chemical and Biological Engineering, Princeton, NJ, USA
- James Smadbeck
  - Department of Chemical and Biological Engineering, Princeton, NJ, USA
- Chris A. Kieslich
  - Department of Chemical and Biological Engineering, Princeton, NJ, USA
|
64
|
Khoury GA, Tamamis P, Pinnaduwage N, Smadbeck J, Kieslich CA, Floudas CA. Princeton_TIGRESS: protein geometry refinement using simulations and support vector machines. Proteins 2013; 82:794-814. [PMID: 24174311 DOI: 10.1002/prot.24459] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.9] [Received: 07/17/2013] [Revised: 10/18/2013] [Accepted: 10/22/2013] [Indexed: 12/30/2022]
Abstract
Protein structure refinement aims to perform a set of operations given a predicted structure to improve model quality and accuracy with respect to the native in a blind fashion. Despite the numerous computational approaches to the protein refinement problem reported in the previous three CASPs, an overwhelming majority of methods degrade models rather than improve them. We initially developed a method tested using blind predictions during CASP10 which was officially ranked in 5th place among all methods in the refinement category. Here, we present Princeton_TIGRESS, which, when benchmarked on all CASP 7, 8, 9, and 10 refinement targets, simultaneously increased GDT_TS 76% of the time with an average improvement of 0.83 GDT_TS points per structure. The method was additionally benchmarked on models produced by top-performing three-dimensional structure prediction servers during CASP10. The robustness of the Princeton_TIGRESS protocol was also tested for different random seeds. We make the Princeton_TIGRESS refinement protocol freely available as a web server at http://atlas.princeton.edu/refinement. Using this protocol, one can consistently refine a prediction to help bridge the gap between a predicted structure and the actual native structure.
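The GDT_TS metric mentioned in this abstract is conventionally the average, over distance cutoffs of 1, 2, 4, and 8 Å, of the percentage of residues whose Cα atom lies within that cutoff of its native position. The sketch below is a simplification: it assumes the per-residue distances have already been computed from a single superposition, whereas the full metric searches for the best superposition at each cutoff. The function name and the sample distances are illustrative assumptions, not data from the paper.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS on a 0-100 scale.

    distances : per-residue model-to-native Calpha distances in angstroms,
                assumed to come from one fixed superposition
    cutoffs   : distance thresholds in angstroms
    """
    if not distances:
        raise ValueError("at least one distance is required")
    n = len(distances)
    # Fraction of residues within each cutoff, then the mean over cutoffs.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


if __name__ == "__main__":
    sample = [0.5, 1.5, 3.0, 9.0]  # hypothetical distances for four residues
    print(gdt_ts(sample))
```

On this scale, an average gain of 0.83 points corresponds to a small shift in how many residues fall inside the cutoffs, which illustrates how modest typical refinement improvements are.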
Affiliation(s)
- George A Khoury
  - Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey, 08540
|
65
|
|
66
|
Mitra P, Shultis D, Brender JR, Czajka J, Marsh D, Gray F, Cierpicki T, Zhang Y. An evolution-based approach to De Novo protein design and case study on Mycobacterium tuberculosis. PLoS Comput Biol 2013; 9:e1003298. [PMID: 24204234 PMCID: PMC3812052 DOI: 10.1371/journal.pcbi.1003298] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.5] [Received: 12/21/2012] [Accepted: 09/09/2013] [Indexed: 01/31/2023] Open
Abstract
Computational protein design is a reverse procedure of protein folding and structure prediction, where constructing structures from evolutionarily related proteins has been demonstrated to be the most reliable method for protein 3-dimensional structure prediction. Following this spirit, we developed a novel method to design new protein sequences based on evolutionarily related protein families. For a given target structure, a set of proteins having similar fold are identified from the PDB library by structural alignments. A structural profile is then constructed from the protein templates and used to guide the conformational search of amino acid sequence space, where physicochemical packing is accommodated by single-sequence based solvation, torsion angle, and secondary structure predictions. The method was tested on a computational folding experiment based on a large set of 87 protein structures covering different fold classes, which showed that the evolution-based design significantly enhances the foldability and biological functionality of the designed sequences compared to the traditional physics-based force field methods. Without using homologous proteins, the designed sequences can be folded with an average root-mean-square-deviation of 2.1 Å to the target. As a case study, the method is extended to redesign all 243 structurally resolved proteins in the pathogenic bacteria Mycobacterium tuberculosis, which is the second leading cause of death from infectious disease. On a smaller scale, five sequences were randomly selected from the design pool and subjected to experimental validation. The results showed that all the designed proteins are soluble with distinct secondary structure and three have well ordered tertiary structure, as demonstrated by circular dichroism and NMR spectroscopy. 
Together, these results demonstrate a new avenue in computational protein design that uses knowledge of evolutionary conservation from protein structural families to engineer new protein molecules of improved fold stability and biological functionality.
The goal of computational protein design is to create new protein sequences of desirable structure and biological function. Most protein design methods are developed to search for sequences with the lowest free energy based on physics-based force fields, following Anfinsen's thermodynamic hypothesis. A major obstacle of such approaches is the inaccuracy of the force-field design, which cannot accurately describe atomic interactions or correctly recognize protein folds. We propose a novel method which uses evolutionary information, in the form of sequence profiles from structure families, to guide the sequence design. Since sequence profiles are generally more accurate than physics-based potentials in protein fold recognition, a unique advantage of the method is that it targets the design procedure to a family of protein sequence profiles, which enhances the robustness of the designed sequences. The method was tested on 87 proteins, and the designed sequences can be folded by I-TASSER to models with an average RMSD of 2.1 Å. As a case study of large-scale application, the method is extended to redesign all structurally resolved proteins in the human pathogenic bacterium Mycobacterium tuberculosis. Five sequences varying in fold and size were characterized by circular dichroism and NMR spectroscopy experiments, and three were shown to have ordered tertiary structure.
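The 2.1 Å figure above is a root-mean-square deviation between model and target coordinates. As a minimal sketch of the quantity itself, the function below computes RMSD for two equally sized sets of 3D points that are assumed to be already superposed; RMSD values are conventionally reported after an optimal superposition step, which is omitted here. The function name and the toy coordinates are illustrative assumptions.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate sets.

    coords_a, coords_b : sequences of (x, y, z) tuples in angstroms,
                         in the same atom order and already superposed
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same length")
    if not coords_a:
        raise ValueError("at least one atom is required")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared Euclidean distance between the matched atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]   # hypothetical model atoms
    target = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]  # hypothetical target atoms
    print(round(rmsd(model, target), 3))
```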
Affiliation(s)
- Pralay Mitra
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- David Shultis
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Jeffrey R. Brender
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Jeff Czajka
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- David Marsh
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
- Felicia Gray
  - Department of Pathology, University of Michigan, Ann Arbor, Michigan, United States of America
- Tomasz Cierpicki
  - Department of Pathology, University of Michigan, Ann Arbor, Michigan, United States of America
- Yang Zhang
  - Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America
  - Department of Biological Chemistry, University of Michigan, Ann Arbor, Michigan, United States of America
|
67
|
In silico determination and validation of baumannii acinetobactin utilization A structure and ligand binding site. BioMed Research International 2013; 2013:172784. [PMID: 24106696 PMCID: PMC3780550 DOI: 10.1155/2013/172784] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 04/27/2013] [Revised: 07/21/2013] [Accepted: 07/31/2013] [Indexed: 01/21/2023]
Abstract
Acinetobacter baumannii is a deadly nosocomial pathogen. Iron is an essential element for the pathogen. Under iron-restricted conditions, the bacterium expresses iron-regulated outer membrane proteins (IROMPs). Baumannii acinetobactin utilization A (BauA) is the most important member of the IROMPs in A. baumannii. Determination of its tertiary structure could help in deducing its functions and its interactions with ligands. The present study unveils the BauA 3D structure via in silico approaches. Apart from ab initio modeling, other rational methods such as homology modeling and threading were invoked to achieve this purpose. For homology modeling, BLAST was run on the sequence in order to find the best template. The template was then used to model the 3D structure. All the models built were evaluated qualitatively. The best model predicted by LOMETS was selected for analyses. Refinement of the 3D structure, as well as determination of its clefts and ligand binding sites, was carried out on the structure. In contrast to the typical trimeric arrangement found in porins, BauA is monomeric. The barrel is formed by 22 antiparallel transmembrane β-strands. There are short periplasmic turns and longer surface-located loops. An N-terminal domain, referred to as the cork, the plug, or the hatch domain, occludes the β-barrel.
|
68
|
MOIRAE: A computational strategy to extract and represent structural information from experimental protein templates. Soft Comput 2013. [DOI: 10.1007/s00500-013-1087-6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 12/30/2022]
|
69
|
Bhattacharya D, Cheng J. i3Drefine software for protein 3D structure refinement and its assessment in CASP10. PLoS One 2013; 8:e69648. [PMID: 23894517 PMCID: PMC3716612 DOI: 10.1371/journal.pone.0069648] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.7] [Received: 04/25/2013] [Accepted: 06/13/2013] [Indexed: 12/25/2022] Open
Abstract
Protein structure refinement refers to the process of improving the quality of protein structures during structure modeling to bring them closer to their native states. Structure refinement has been drawing increasing attention in the community-wide Critical Assessment of techniques for Protein Structure prediction (CASP) experiments since its addition in the 8th CASP experiment. During the 9th and recently concluded 10th CASP experiments, a consistent growth in the number of refinement targets and participating groups has been witnessed. Yet, protein structure refinement remains a largely unsolved problem, with the majority of participating groups in the CASP refinement category failing to consistently improve the quality of structures issued for refinement. To address this need, we developed a completely automated and computationally efficient protein 3D structure refinement method, i3Drefine, based on an iterative and highly convergent energy minimization algorithm with powerful all-atom composite physics and knowledge-based force fields and a hydrogen bonding (HB) network optimization technique. In the recent community-wide blind experiment, CASP10, i3Drefine (as ‘MULTICOM-CONSTRUCT’) was ranked as the best method in the server section as per the official assessment of the CASP10 experiment. Here we provide the community with free access to the i3Drefine software, systematically analyse the performance of i3Drefine in strict blind mode on the refinement targets issued in the CASP10 refinement category, and compare it with other state-of-the-art refinement methods participating in CASP10. Our analysis demonstrates that i3Drefine is the only fully automated server participating in CASP10 that exhibits consistent improvement over the initial structures in both global and local structural quality metrics. An executable version of i3Drefine is freely available at http://protein.rnet.missouri.edu/i3drefine/.
Affiliation(s)
- Debswapna Bhattacharya
- Department of Computer Science, University of Missouri, Columbia, Missouri, United States of America
- Jianlin Cheng
- Department of Computer Science, Informatics Institute, Bond Life Science Center, University of Missouri, Columbia, Missouri, United States of America
|
70
|
Bagaria A, Jaravine V, Güntert P. Estimating structure quality trends in the Protein Data Bank by equivalent resolution. Comput Biol Chem 2013; 46:8-15. [PMID: 23751279 DOI: 10.1016/j.compbiolchem.2013.04.004] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/27/2013] [Accepted: 04/29/2013] [Indexed: 01/01/2023]
Abstract
The quality of protein structures obtained by different experimental and ab-initio calculation methods varies considerably. The methods have been evolving over time by improving both experimental designs and computational techniques, and since the primary aim of these developments is the procurement of reliable and high-quality data, better techniques resulted on average in an evolution toward higher quality structures in the Protein Data Bank (PDB). Each method leaves a specific quantitative and qualitative "trace" in the PDB entry. Certain information relevant to one method (e.g. dynamics for NMR) may be lacking for another method. Furthermore, some standard measures of quality for one method cannot be calculated for other experimental methods, e.g. crystal resolution or NMR bundle RMSD. Consequently, structures are classified in the PDB by the method used. Here we introduce a method to estimate a measure of equivalent X-ray resolution (e-resolution), expressed in units of Å, to assess the quality of any type of monomeric, single-chain protein structure, irrespective of the experimental structure determination method. We showed and compared the trends in the quality of structures in the Protein Data Bank over the last two decades for five different experimental techniques, excluding theoretical structure predictions. We observed that as new methods are introduced, they undergo a rapid method development evolution: within several years the e-resolution score becomes similar for structures obtained from the five methods and they improve from initially poor performance to acceptable quality, comparable with previously established methods, the performance of which is essentially stable.
Affiliation(s)
- Anurag Bagaria
- Institute of Biophysical Chemistry, Center for Biomolecular Magnetic Resonance, Goethe University Frankfurt am Main, 60438 Frankfurt am Main, Germany.
|
71
|
Affiliation(s)
- Rachel Kolodny
- Department of Computer Science, University of Haifa, Haifa 31905, Israel
- Leonid Pereyaslavets
- Department of Structural Biology, Stanford University, Stanford, California 94305
- Michael Levitt
- Department of Structural Biology, Stanford University, Stanford, California 94305
|
72
|
Genheden S. Are homology models sufficiently good for free-energy simulations? J Chem Inf Model 2012; 52:3013-21. [PMID: 23113602 DOI: 10.1021/ci300349s] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/07/2023]
Abstract
In this paper, I evaluate the usefulness of protein homology models in rigorous free-energy simulations to determine ligand affinities. Two templates were used to create models of the factor Xa protein and one template was used for dihydrofolate reductase from Plasmodium falciparum. Then, the relative free energies for several pairs of ligands were estimated using thermodynamic integration with the homology models as the starting point of the simulation. These binding affinities were compared to affinities obtained when using published crystal structures as the starting point of the simulations. Encouragingly, the differences between the affinities obtained when starting from either homology models or crystal structures were not statistically significant for a majority of the considered pairs of ligands. Differences between 1 and 2 kJ/mol were observed for the dihydrofolate reductase ligands and differences between 0 and 8 kJ/mol were observed for the factor Xa ligands. The largest difference for factor Xa was caused by an erroneous modeling of a loop region close to two of the ligands, and it was only observed when using one of the templates. Therefore, it is advisable to always use more than one template when creating homology models if they are to be used in free-energy simulations.
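As a rough illustration of the thermodynamic integration mentioned in this abstract: the free-energy change of an alchemical transformation is commonly estimated by integrating the ensemble average of dU/dλ over the coupling parameter λ, and a relative binding free energy is the difference between the transformation in the bound state and in solution. The sketch below assumes a simple trapezoidal rule; the function names and the numbers in the usage lines are hypothetical and are not taken from the paper.

```python
def thermodynamic_integration(lambdas, dudl_means):
    """Trapezoidal estimate of the integral of <dU/dlambda> over lambda.

    lambdas    -- increasing coupling-parameter values (hypothetical windows)
    dudl_means -- ensemble average of dU/dlambda at each window
    """
    if len(lambdas) != len(dudl_means) or len(lambdas) < 2:
        raise ValueError("need at least two windows of equal length")
    total = 0.0
    for i in range(1, len(lambdas)):
        width = lambdas[i] - lambdas[i - 1]
        total += 0.5 * width * (dudl_means[i] + dudl_means[i - 1])
    return total


def relative_binding_free_energy(dg_bound, dg_free):
    """Difference between the ligand transformation in the protein and in solvent."""
    return dg_bound - dg_free


# Hypothetical window averages (kJ/mol), for illustration only.
dg_bound = thermodynamic_integration([0.0, 0.5, 1.0], [2.0, 4.0, 6.0])
dg_free = thermodynamic_integration([0.0, 0.5, 1.0], [1.0, 3.0, 5.0])
ddg = relative_binding_free_energy(dg_bound, dg_free)
```

Comparing such a value computed from a homology model with the one computed from a crystal structure is the kind of difference the abstract reports.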
Affiliation(s)
- Samuel Genheden
- Division of Theoretical Chemistry, Department of Chemistry, Lund University, P.O. Box 124, SE-221 00 Lund, Sweden.
|
73
|
Matthies MC, Bienert S, Torda AE. Dynamics in Sequence Space for RNA Secondary Structure Design. J Chem Theory Comput 2012; 8:3663-70. [PMID: 26593011 DOI: 10.1021/ct300267j] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/25/2022]
Abstract
We have implemented a method for the design of RNA sequences that should fold to arbitrary secondary structures. A popular energy model allows one to take the derivative with respect to composition, which can then be interpreted as a force and used for Newtonian dynamics in sequence space. Combined with a negative design term, one can rapidly sample sequences which are compatible with a desired secondary structure via simulated annealing. Results for 360 structures were compared with those from another nucleic acid design program using measures such as the probability of the target structure and an ensemble-weighted distance to the target structure.
Affiliation(s)
- Marco C Matthies
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany
- Stefan Bienert
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany; Biozentrum, University of Basel, Klingelbergstr. 50/70, 4056 Basel, Switzerland
- Andrew E Torda
- Centre for Bioinformatics, University of Hamburg, Bundesstr. 43, 20146 Hamburg, Germany
|
74
|
Kulp DW, Subramaniam S, Donald JE, Hannigan BT, Mueller BK, Grigoryan G, Senes A. Structural informatics, modeling, and design with an open-source Molecular Software Library (MSL). J Comput Chem 2012; 33:1645-61. [PMID: 22565567 PMCID: PMC3432414 DOI: 10.1002/jcc.22968] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.9] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/19/2011] [Revised: 02/16/2012] [Accepted: 03/02/2012] [Indexed: 01/22/2023]
Abstract
We present the Molecular Software Library (MSL), a C++ library for molecular modeling. MSL is a set of tools that supports a large variety of algorithms for the design, modeling, and analysis of macromolecules. Among the main features supported by the library are methods for applying geometric transformations and alignments, the implementation of a rich set of energy functions, side chain optimization, backbone manipulation, calculation of solvent accessible surface area, and other tools. MSL has a number of unique features, such as the ability of storing alternative atomic coordinates (for modeling) and multiple amino acid identities at the same backbone position (for design). It has a straightforward mechanism for extending its energy functions and can work with any type of molecules. Although the code base is large, MSL was created with ease of developing in mind. It allows the rapid implementation of simple tasks while fully supporting the creation of complex applications. Some of the potentialities of the software are demonstrated here with examples that show how to program complex and essential modeling tasks with few lines of code. MSL is an ongoing and evolving project, with new features and improvements being introduced regularly, but it is mature and suitable for production and has been used in numerous protein modeling and design projects. MSL is open-source software, freely downloadable at http://msl-libraries.org. We propose it as a common platform for the development of new molecular algorithms and to promote the distribution, sharing, and reutilization of computational methods.
Affiliation(s)
- Brett T. Hannigan
- U. of Pennsylvania, Genomics and Computational Biology Graduate Group
|
75
|
Deng H, Jia Y, Wei Y, Zhang Y. What is the best reference state for designing statistical atomic potentials in protein structure prediction? Proteins 2012; 80:2311-22. [PMID: 22623012 DOI: 10.1002/prot.24121] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2012] [Revised: 04/30/2012] [Accepted: 05/21/2012] [Indexed: 01/01/2023]
Abstract
Many statistical potentials have been developed in the last two decades for protein folding and protein structure recognition. The major difference among these potentials lies in the selection of reference states to offset sampling bias. However, since these potentials used different databases and parameter cutoffs, it is difficult to judge what the best reference states are by examining the original programs. In this study, we aim to address this issue and evaluate the reference states using a unified database and programming environment. We constructed distance-specific atomic potentials using six widely used reference states based on 1022 high-resolution protein structures, which were applied to model ranking in six sets of structure decoys. The random-walk chain reference state outperforms the others in three decoy sets, while those using ideal-gas, quasi-chemical approximation and averaging sample each stand out in one set. Nevertheless, the performance of the potentials depends on how the decoys were generated, and no reference state clearly outperforms the others in all decoy sets. Further analysis reveals that the statistical potentials face a trade-off between universality and pertinence, and optimal reference states should be extracted based on specific application environments and decoy spaces.
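To make the role of a reference state concrete, the following is a minimal sketch of a distance-dependent pair potential of the usual inverse-Boltzmann form, u = -kT ln(P_obs / P_ref). It assumes an averaging-style reference in which P_ref for a distance bin is the fraction of all atom pairs, regardless of type, that fall in that bin. The function name, the pseudocount and the toy counts are assumptions for illustration and do not reproduce the parameters of the study.

```python
import math


def averaging_reference_potential(counts, kT=1.0, pseudocount=1e-6):
    """Pair potential per distance bin with an averaging-style reference state.

    counts -- dict mapping (atom_type_a, atom_type_b) to a list of observed
              pair counts, one entry per distance bin (all lists same length)
    """
    n_bins = len(next(iter(counts.values())))
    # Reference distribution: all pair types pooled together, per bin.
    total_per_bin = [sum(c[b] for c in counts.values()) for b in range(n_bins)]
    grand_total = sum(total_per_bin)
    potential = {}
    for pair, c in counts.items():
        pair_total = sum(c)
        energies = []
        for b in range(n_bins):
            p_obs = (c[b] + pseudocount) / (pair_total + pseudocount * n_bins)
            p_ref = (total_per_bin[b] + pseudocount) / (grand_total + pseudocount * n_bins)
            energies.append(-kT * math.log(p_obs / p_ref))
        potential[pair] = energies
    return potential


# Toy counts for two pair types over two distance bins (hypothetical).
toy_counts = {("C", "C"): [10, 10], ("C", "N"): [30, 10]}
toy_potential = averaging_reference_potential(toy_counts)
```

Swapping in a different definition of `p_ref` (for example an ideal-gas or random-walk form) changes the resulting energies, which is the effect the abstract evaluates.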
Affiliation(s)
- Haiyou Deng
- Department of Physics and Institute of Biophysics, Central China Normal University, Wuhan 430079, China
|
76
|
Abstract
The prediction of loop structures is considered one of the main challenges in the protein folding problem. Regardless of the dependence of the overall algorithm on the protein data bank, the flexibility of loop regions dictates the need for special attention to their structures. In this article, we present algorithms for loop structure prediction with fixed stem and flexible stem geometry. In the flexible stem geometry problem, only the secondary structure of three stem residues on either side of the loop is known. In the fixed stem geometry problem, the structure of the three stem residues on either side of the loop is also known. Initial loop structures are generated using a probability database for the flexible stem geometry problem, and using torsion angle dynamics for the fixed stem geometry problem. Three rotamer optimization algorithms are introduced to alleviate steric clashes between the generated backbone structures and the side chain rotamers. The structures are optimized by energy minimization using an all-atom force field. The optimized structures are clustered using a traveling salesman problem-based clustering algorithm. The structures in the densest clusters are then utilized to refine dihedral angle bounds on all amino acids in the loop. The entire procedure is carried out for a number of iterations, leading to improved structure prediction and refined dihedral angle bounds. The algorithms presented in this article have been tested on 3190 loops from the PDBSelect25 data set and on targets from the recently concluded CASP9 community-wide experiment.
Affiliation(s)
- A. Subramani
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- C. A. Floudas
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
|
77
|
Nahar N, Rahman A, Moś M, Warzecha T, Algerin M, Ghosh S, Johnson-Brousseau S, Mandal A. In silico and in vivo studies of an Arabidopsis thaliana gene, ACR2, putatively involved in arsenic accumulation in plants. J Mol Model 2012; 18:4249-62. [PMID: 22562211 DOI: 10.1007/s00894-012-1419-y] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2012] [Accepted: 03/26/2012] [Indexed: 12/27/2022]
Abstract
Previously, our in silico analyses identified four candidate genes that might be involved in uptake and/or accumulation of arsenic in plants: arsenate reductase 2 (ACR2), phytochelatin synthase 1 (PCS1) and two multi-drug resistant proteins (MRP1 and MRP2) [Lund et al. (2010) J Biol Syst 18:223-224]. We also postulated that one of these four genes, ACR2, seems to play a central role in this process. To investigate further, we have constructed a 3D structure of the Arabidopsis thaliana ACR2 protein using the iterative implementation of the threading assembly refinement (I-TASSER) server. These analyses revealed that, for catalytic metabolism of arsenate, the arsenate binding-loop (AB-loop) and residues Phe-53, Phe-54, Cys-134, Cys-136, Cys-141, Cys-145, and Lys-135 are essential for reducing arsenate to arsenic intermediates (arsenylated enzyme-substrate intermediates) and arsenite in plants. Thus, functional predictions suggest that the ACR2 protein is involved in the conversion of arsenate to arsenite in plant cells. To validate the in silico results, we exposed a transfer-DNA (T-DNA)-tagged mutant of A. thaliana (mutation in the ACR2 gene) to various amounts of arsenic. Reverse transcriptase PCR revealed that the mutant exhibits significantly reduced expression of the ACR2 gene. Spectrophotometric analyses revealed that the amount of accumulated arsenic compounds in this mutant was approximately six times higher than that observed in control plants. The results obtained from in silico analyses are in complete agreement with those obtained in laboratory experiments.
Affiliation(s)
- Noor Nahar
- School of Life Sciences, University of Skövde, PO Box 408, 541 28, Skövde, Sweden
|
78
|
Subramani A, Wei Y, Floudas CA. ASTRO-FOLD 2.0: an Enhanced Framework for Protein Structure Prediction. AIChE J 2012; 58:1619-1637. [PMID: 23049093 DOI: 10.1002/aic.12669] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/30/2022]
Abstract
The three-dimensional (3-D) structure prediction of proteins, given their amino acid sequence, is addressed using the first principles-based approach ASTRO-FOLD 2.0. The key features presented are: (1) Secondary structure prediction using a novel optimization-based consensus approach, (2) β-sheet topology prediction using mixed-integer linear optimization (MILP), (3) Residue-to-residue contact prediction using a high-resolution distance-dependent force field and MILP formulation, (4) Tight dihedral angle and distance bound generation for loop residues using dihedral angle clustering and non-linear optimization (NLP), (5) 3-D structure prediction using deterministic global optimization, stochastic conformational space annealing, and the full-atomistic ECEPP/3 potential, (6) Near-native structure selection using a traveling salesman problem-based clustering approach, ICON, and (7) Improved bound generation using chemical shifts of subsets of heavy atoms, generated by SPARTA and CS23D. Computational results of ASTRO-FOLD 2.0 on 47 blind targets of the recently concluded CASP9 experiment are presented.
Affiliation(s)
- A Subramani
- Dept. of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544
|
79
|
Armano G, Ledda F. Exploiting intrastructure information for secondary structure prediction with multifaceted pipelines. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:799-808. [PMID: 22201070 DOI: 10.1109/tcbb.2011.159] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
Predicting the secondary structure of proteins is still a typical step in several bioinformatic tasks, in particular, for tertiary structure prediction. Notwithstanding the impressive results obtained so far, mostly due to the advent of sequence encoding schemes based on multiple alignment, in our view the problem should be studied from a novel perspective, in which understanding how available information sources are dealt with plays a central role. After revisiting a well-known secondary structure predictor viewed from this perspective (with the goal of identifying which sources of information have been considered and which have not), we propose a generic software architecture designed to account for all relevant information sources. To demonstrate the validity of the approach, a predictor compliant with the proposed generic architecture has been implemented and compared with several state-of-the-art secondary structure predictors. Experiments have been carried out on standard data sets, and the corresponding results confirm the validity of the approach. The predictor is available at http://iasc.diee.unica.it/ssp2/ through the corresponding web application or as downloadable stand-alone portable unpack-and-run bundle.
Affiliation(s)
- Giuliano Armano
- Department of Electrical and Electronic Engineering, University of Cagliari, Piazza d’Armi, Cagliari 09123, Italy.
|
80
|
Bhageerath—Targeting the near impossible: Pushing the frontiers of atomic models for protein tertiary structure prediction#. J CHEM SCI 2012. [DOI: 10.1007/s12039-011-0189-x] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/03/2023]
|
81
|
Subramani A, Floudas CA. β-sheet topology prediction with high precision and recall for β and mixed α/β proteins. PLoS One 2012; 7:e32461. [PMID: 22427840 PMCID: PMC3302896 DOI: 10.1371/journal.pone.0032461] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/16/2011] [Accepted: 01/26/2012] [Indexed: 11/19/2022] Open
Abstract
The prediction of the correct β-sheet topology for pure β and mixed α/β proteins is a critical intermediate step toward three-dimensional protein structure prediction. The predicted β-sheet topology provides distance constraints between sequentially separated residues, which reduces the three-dimensional search space for a protein structure prediction algorithm. Here, we present a novel mixed integer linear optimization based framework for the prediction of β-sheet topology in β and mixed α/β proteins. The objective is to maximize the total strand-to-strand contact potential of the protein. A large number of physical constraints are applied to provide biologically meaningful topology results. The formulation permits the creation of a rank-ordered list of preferred β-sheet arrangements. Finally, the generated topologies are re-ranked using a fully atomistic approach involving torsion angle dynamics and clustering. For a large, non-redundant data set of 2102 β and mixed α/β proteins with at least 3 strands taken from the PDB, the proposed approach provides the top 5 solutions with average precision and recall greater than 78%. Consistent results are obtained in the β-sheet topology prediction for blind targets provided during the CASP8 and CASP9 experiments, as well as for actual and predicted secondary structures. The β-sheet topology prediction algorithm, BeST, is available to the scientific community at http://selene.princeton.edu/BeST/.
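The precision and recall figures quoted in this abstract can be illustrated with a small sketch that scores a predicted set of strand pairings against a reference set. It assumes that a pairing is an unordered pair of strand indices and ignores parallel versus antiparallel orientation; the exact scoring definition used by the authors is not given in the abstract, so this is only one plausible reading.

```python
def precision_recall(predicted_pairs, true_pairs):
    """Precision and recall of predicted strand pairings.

    Each pairing is a pair of strand indices; order within a pair is ignored.
    """
    predicted = {frozenset(p) for p in predicted_pairs}
    actual = {frozenset(p) for p in true_pairs}
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical topology: strands are numbered along the sequence.
precision, recall = precision_recall(
    [(1, 2), (2, 3), (3, 5)],
    [(2, 1), (2, 3), (3, 4), (4, 5)],
)
```

Here two of the three predicted pairings are correct and two of the four true pairings are recovered.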
Affiliation(s)
- Christodoulos A. Floudas
- Department of Chemical and Biological Engineering, Princeton University, Princeton, New Jersey, United States of America
|
82
|
Abstract
Proteins bind to other proteins efficiently and specifically to carry on many cell functions such as signaling, activation, transport, enzymatic reactions, and more. To determine the geometry and strength of binding of a protein pair, an energy function is required. An algorithm to design an optimal energy function, based on empirical data of protein complexes, is proposed and applied. Emphasis is made on negative design in which incorrect geometries are presented to the algorithm that learns to avoid them. For the docking problem the search for plausible geometries can be performed exhaustively. The possible geometries of the complex are generated on a grid with the help of a fast Fourier transform algorithm. A novel formulation of negative design makes it possible to investigate iteratively hundreds of millions of negative examples while monotonically improving the quality of the potential. Experimental structures for 640 protein complexes are used to generate positive and negative examples for learning parameters. The algorithm designed in this work finds the correct binding structure as the lowest energy minimum in 318 cases of the 640 examples. Further benchmarks on independent sets confirm the significant capacity of the scoring function to recognize correct modes of interactions.
Affiliation(s)
- D V S Ravikant
- Department of Computer Science, Cornell University, 4130 Upson Hall, Ithaca, New York 14853, USA
|
83
|
Do LH, Lippard SJ. Evolution of strategies to prepare synthetic mimics of carboxylate-bridged diiron protein active sites. J Inorg Biochem 2011; 105:1774-85. [PMID: 22113107 PMCID: PMC3232320 DOI: 10.1016/j.jinorgbio.2011.08.025] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/24/2011] [Revised: 08/08/2011] [Accepted: 08/11/2011] [Indexed: 10/17/2022]
Abstract
We present a comprehensive review of research conducted in our laboratory in pursuit of the long-term goal of reproducing the structures and reactivity of carboxylate-bridged diiron centers used in biology to activate dioxygen for the conversion of hydrocarbons to alcohols and related products. This article describes the evolution of strategies devised to achieve these goals and illustrates the challenges in getting there. Particular emphasis is placed on controlling the geometry and coordination environment of the diiron core, preventing formation of polynuclear iron clusters, maintaining the structural integrity of model complexes during reactions with dioxygen, and tuning the ligand framework to stabilize desired oxygenated diiron species. Studies of the various model systems have improved our understanding of the electronic and physical characteristics of carboxylate-bridged diiron units and their reactivity toward molecular oxygen and organic moieties. The principles and lessons that have emerged from these investigations will guide future efforts to develop more sophisticated diiron protein model complexes.
Affiliation(s)
- Loi H. Do
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139. U.S.A
- Stephen J. Lippard
- Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139. U.S.A
|
84
|
Wei Y, Thompson J, Floudas CA. CONCORD: a consensus method for protein secondary structure prediction via mixed integer linear optimization. Proc Math Phys Eng Sci 2011. [DOI: 10.1098/rspa.2011.0514] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Most of the protein structure prediction methods use a multi-step process, which often includes secondary structure prediction, contact prediction, fragment generation, clustering, etc. For many years, secondary structure prediction has been the workhorse for numerous methods aimed at predicting protein structure and function. This paper presents a new mixed integer linear optimization (MILP)-based consensus method: a Consensus scheme based On a mixed integer liNear optimization method for seCOndary stRucture preDiction (CONCORD). Based on seven secondary structure prediction methods, SSpro, DSC, PROF, PROFphd, PSIPRED, Predator and GorIV, the MILP-based consensus method combines the strengths of different methods, maximizes the number of correctly predicted amino acids and achieves a better prediction accuracy. The method is shown to perform well compared with the seven individual methods when tested on the PDBselect25 training protein set using sixfold cross validation. It also performs well compared with another set of 10 online secondary structure prediction servers (including several recent ones) when tested on the CASP9 targets (http://predictioncenter.org/casp9/). The average Q3 prediction accuracy is 83.04 per cent for the sixfold cross validation of the PDBselect25 set and 82.3 per cent for the CASP9 targets. We have developed a MILP-based consensus method for protein secondary structure prediction. A web server, CONCORD, is available to the scientific community at http://helios.princeton.edu/CONCORD.
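For readers unfamiliar with the Q3 measure cited in this abstract, the sketch below computes Q3 as the percentage of residues whose three-state label (H, E or C) matches the observed one, and combines several predictors by a plain per-residue majority vote. The majority vote is a deliberately simplified stand-in and is not the MILP formulation that CONCORD uses; the function names and example strings are hypothetical.

```python
from collections import Counter


def majority_consensus(predictions):
    """Per-residue majority vote over equal-length three-state strings.

    Ties go to the label that appears first among the tied predictors,
    because Counter.most_common keeps first-seen order for equal counts.
    """
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must have the same length")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*predictions))


def q3(predicted, observed):
    """Percentage of residues with the correct three-state label."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("sequences must be non-empty and of equal length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Three hypothetical predictors for a four-residue fragment.
consensus = majority_consensus(["HHEC", "HEEC", "CHEC"])
score = q3(consensus, "HHCC")
```

A weighted or optimization-based combination, as in the paper, would replace `majority_consensus` while `q3` stays the same.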
Affiliation(s)
- Y. Wei
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- J. Thompson
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- C. A. Floudas
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
|
85
|
Wei Y, Floudas CA. Enhanced Inter-helical Residue Contact Prediction in Transmembrane Proteins. Chem Eng Sci 2011; 66:4356-4369. [PMID: 21892227 PMCID: PMC3164537 DOI: 10.1016/j.ces.2011.04.033] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
In this paper, based on a recent work by McAllister and Floudas who developed a mathematical optimization model to predict the contacts in transmembrane alpha-helical proteins from a limited protein data set [1], we have enhanced this method by 1) building a more comprehensive data set for transmembrane alpha-helical proteins and this enhanced data set is then used to construct the probability sets, MIN-1N and MIN-2N, for residue contact prediction, 2) enhancing the mathematical model via modifications of several important physical constraints and 3) applying a new blind contact prediction scheme on different protein sets proposed from analyzing the contact prediction on 65 proteins from Fuchs et al. [2]. The blind contact prediction scheme has been tested on two different membrane protein sets. Firstly it is applied to five carefully selected proteins from the training set. The contact prediction of these five proteins uses probability sets built by excluding the target protein from the training set, and an average accuracy of 56% was obtained. Secondly, it is applied to six independent membrane proteins with complicated topologies, and the prediction accuracies are 73% for 2ZY9A, 21% for 3KCUA, 46% for 2W1PA, 64% for 3CN5A, 77% for 3IXZA and 83% for 3K3FA. The average prediction accuracy for the six proteins is 60.7%. The proposed approach is also compared with a support vector machine method (TMhit [3]) and it is shown that it exhibits better prediction accuracy.
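The reported average of 60.7% follows directly from the six per-protein accuracies listed in the abstract, as this short check shows (only the variable names are introduced here):

```python
# Per-protein contact prediction accuracies (%) as listed in the abstract.
accuracies = {
    "2ZY9A": 73,
    "3KCUA": 21,
    "2W1PA": 46,
    "3CN5A": 64,
    "3IXZA": 77,
    "3K3FA": 83,
}

mean_accuracy = sum(accuracies.values()) / len(accuracies)
print(f"{mean_accuracy:.1f}")  # prints 60.7
```

The spread is wide (21% to 83%), so the mean alone says little about how the method behaves on any single protein.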
Affiliation(s)
- Y. Wei
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- C. A. Floudas
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
|
86
|
Xu D, Zhang J, Roy A, Zhang Y. Automated protein structure modeling in CASP9 by I-TASSER pipeline combined with QUARK-based ab initio folding and FG-MD-based structure refinement. Proteins 2011; 79 Suppl 10:147-60. [PMID: 22069036 PMCID: PMC3228277 DOI: 10.1002/prot.23111] [Citation(s) in RCA: 117] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2011] [Revised: 06/07/2011] [Accepted: 06/26/2011] [Indexed: 11/09/2022]
Abstract
I-TASSER is an automated pipeline for protein tertiary structure prediction using multiple threading alignments and iterative structure assembly simulations. In the CASP9 experiments, two new algorithms, QUARK and fragment-guided molecular dynamics (FG-MD), were added to the I-TASSER pipeline to improve structural modeling accuracy. QUARK is a de novo structure prediction algorithm used for structure modeling of proteins that lack detectable template structures. For distantly homologous targets, QUARK models are found useful as a reference structure for selecting good threading alignments and guiding the I-TASSER structure assembly simulations. FG-MD is an atomic-level structural refinement program that uses structural fragments collected from PDB structures to guide molecular dynamics simulation and improve the local structure of the predicted model, including hydrogen-bonding networks, torsion angles, and steric clashes. Despite considerable progress in both template-based and template-free structure modeling, significant improvements in protein target classification, domain parsing, model selection, and ab initio folding of β-proteins are still needed to further improve the I-TASSER pipeline.
Affiliation(s)
- Dong Xu
- Center for Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
|
87
|
Gardner DP, Ren P, Ozer S, Gutell RR. Statistical potentials for hairpin and internal loops improve the accuracy of the predicted RNA structure. J Mol Biol 2011; 413:473-83. [PMID: 21889515 DOI: 10.1016/j.jmb.2011.08.033] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Received: 02/16/2011] [Revised: 08/12/2011] [Accepted: 08/16/2011] [Indexed: 01/19/2023]
Abstract
RNA is directly associated with a growing number of functions within the cell. The accurate prediction of different RNA higher-order structures from their nucleic acid sequences will provide insight into their functions and molecular mechanics. We have been determining statistical potentials for a collection of structural elements that is larger than the set of structural elements with experimentally determined energy values. The experimentally derived free energies and the statistical potentials for canonical base-pair stacks are analogous, demonstrating that statistical potentials derived from comparative data can be used as an alternative energetic parameter. A new computational infrastructure, the RNA Comparative Analysis Database (rCAD), which utilizes a relational database, was developed to manipulate and analyze very large sequence alignments and secondary-structure data sets. Using rCAD, we determined a richer set of energetic parameters for RNA fundamental structural elements, including hairpin and internal loops. A new version of RNAfold was developed to utilize these statistical potentials. Overall, these new statistical potentials for hairpin and internal loops, integrated into the new version of RNAfold, demonstrated significant improvements in the prediction accuracy of RNA secondary structure.
Affiliation(s)
- David P Gardner
- Center for Computational Biology and Bioinformatics, Section of Integrative Biology in the School of Biological Sciences, University of Texas at Austin, Austin, TX 78712, USA
|
88
|
Assary RS, Broadbelt LJ. Computational screening of novel thiamine-catalyzed decarboxylation reactions of 2-keto acids. Bioprocess Biosyst Eng 2011; 34:375-88. [PMID: 21061135 DOI: 10.1007/s00449-010-0481-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 07/07/2010] [Accepted: 10/18/2010] [Indexed: 01/02/2023]
Abstract
A molecular modeling strategy to screen the capacity of known enzymes to catalyze the reactions of non-native substrates is presented. The binding of pyruvic acid and non-native ketoacids in the active site of pyruvate ferredoxin oxidoreductase was examined using docking analysis, and our results suggest that enzyme-non-native ketoacid-bound species are feasible. Quantum mechanics/molecular mechanics methods were then used to study the geometry of the covalent intermediate formed from the enzyme and the various ketoacids. Finally, quantum mechanical methods were used to study the decarboxylation reaction of 2-keto acids at the mechanistic level. This hierarchical screening ranked the substrates from those that cannot be accommodated by the enzyme (phenyl pyruvate) to those whose conversion rate would most closely approach that of the native substrate (2-ketobutanoic acid and 2-ketovaleric acid). Most notably, our investigation suggests that novel pathways generated using generalized enzyme actions may be screened using the hierarchical approach employed here.
Affiliation(s)
- Rajeev S Assary
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, IL 60208, USA
|
89
|
Ottino JM. Chemical engineering in a complex world: Grand challenges, vast opportunities. AIChE J 2011. [DOI: 10.1002/aic.12686] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 01/01/2023]
|
90
|
Polydorides S, Amara N, Aubard C, Plateau P, Simonson T, Archontis G. Computational protein design with a generalized Born solvent model: application to Asparaginyl-tRNA synthetase. Proteins 2011; 79:3448-68. [PMID: 21563215 DOI: 10.1002/prot.23042] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 11/29/2010] [Revised: 02/25/2011] [Accepted: 03/03/2011] [Indexed: 12/13/2022]
Abstract
Computational Protein Design (CPD) is a promising method for high throughput protein and ligand mutagenesis. Recently, we developed a CPD method that used a polar-hydrogen energy function for protein interactions and a Coulomb/Accessible Surface Area (CASA) model for solvent effects. We applied this method to engineer aspartyl-adenylate (AspAMP) specificity into Asparaginyl-tRNA synthetase (AsnRS), whose substrate is asparaginyl-adenylate (AsnAMP). Here, we implement a more accurate function, with an all-atom energy for protein interactions and a residue-pairwise generalized Born model for solvent effects. As a first test, we compute aminoacid affinities for several point mutants of Aspartyl-tRNA synthetase (AspRS) and Tyrosyl-tRNA synthetase and stability changes for three helical peptides and compare with experiment. As a second test, we readdress the problem of AsnRS aminoacid engineering. We compare three design criteria, which optimize the folding free-energy, the absolute AspAMP affinity, and the relative (AspAMP-AsnAMP) affinity. The sequences and conformations are improved with respect to our previous, polar-hydrogen/CASA study: For several designed complexes, the AspAMP carboxylate forms three interactions with a conserved arginine and a designed lysine, as in the active site of the AspRS:AspAMP complex. The conformations and interactions are well maintained in molecular dynamics simulations and the sequences have an inverted specificity, favoring AspAMP over AsnAMP. The method is not fully successful, since experimental measurements with the seven most promising sequences show that they do not catalyze at a detectable level the adenylation of Asp (or Asn) with ATP. This may be due to weak AspAMP binding and/or disruption of transition-state stabilization.
|
91
|
Saven JG. Computational protein design: engineering molecular diversity, nonnatural enzymes, nonbiological cofactor complexes, and membrane proteins. Curr Opin Chem Biol 2011; 15:452-7. [PMID: 21493122 DOI: 10.1016/j.cbpa.2011.03.014] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 03/15/2011] [Revised: 03/18/2011] [Accepted: 03/18/2011] [Indexed: 11/18/2022]
Abstract
Computational and theoretical methods are advancing protein design as a means to create and investigate proteins. Such efforts further our capacity to control, design and understand biomolecular structure, sequence and function. Herein, the focus is on some recent applications that involve using theoretical and computational methods to guide the design of protein sequence ensembles, new enzymes, proteins with novel cofactors, and membrane proteins.
Affiliation(s)
- Jeffery G Saven
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, PA 19104, USA
|
92
|
Bellows ML, Taylor MS, Cole PA, Shen L, Siliciano RF, Fung HK, Floudas CA. Discovery of entry inhibitors for HIV-1 via a new de novo protein design framework. Biophys J 2011; 99:3445-53. [PMID: 21081094 DOI: 10.1016/j.bpj.2010.09.050] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.3] [Received: 02/17/2010] [Revised: 09/23/2010] [Accepted: 09/27/2010] [Indexed: 12/11/2022] Open
Abstract
A new (to our knowledge) de novo design framework with a ranking metric based on approximate binding affinity calculations is introduced and applied to the discovery of what we believe are novel HIV-1 entry inhibitors. The framework consists of two stages: a sequence selection stage and a validation stage. The sequence selection stage produces a rank-ordered list of amino-acid sequences by solving an integer programming sequence selection model. The validation stage consists of fold specificity and approximate binding affinity calculations. The designed peptidic inhibitors are 12-amino-acids-long and target the hydrophobic core of gp41. A number of the best-predicted sequences were synthesized and their inhibition of HIV-1 was tested in cell culture. All peptides examined showed inhibitory activity when compared with no drug present, and the novel peptide sequences outperformed the native template sequence used for the design. The best sequence showed micromolar inhibition, which is a 3-15-fold improvement over the native sequence, depending on the donor. In addition, the best sequence equally inhibited wild-type and Enfuvirtide-resistant virus strains.
Affiliation(s)
- M L Bellows
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ, USA
|
93
|
Pan SJ, Cheung WL, Fung HK, Floudas CA, Link AJ. Computational design of the lasso peptide antibiotic microcin J25. Protein Eng Des Sel 2011; 24:275-82. [PMID: 21106549 PMCID: PMC3038460 DOI: 10.1093/protein/gzq108] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Received: 07/06/2010] [Revised: 10/04/2010] [Accepted: 10/26/2010] [Indexed: 11/12/2022] Open
Abstract
Microcin J25 (MccJ25) is a 21 amino acid (aa) ribosomally synthesized antimicrobial peptide with an unusual structure in which the eight N-terminal residues form a covalently cyclized macrolactam ring through which the remaining 13 aa tail is fed. An open question is the extent of sequence space that can occupy such an extraordinary, highly constrained peptide fold. To begin answering this question, here we have undertaken a computational redesign of the MccJ25 peptide using a two-stage sequence selection procedure based on both energy minimization and fold specificity. Eight of the most highly ranked sequences from the design algorithm, each of which contained two or three amino acid substitutions, were expressed in Escherichia coli and tested for production and antimicrobial activity. Six of the eight variants were successfully produced by E.coli at production levels comparable with that of the wild-type peptide. Of these six variants, three retain detectable antimicrobial activity, although this activity is reduced relative to wild-type MccJ25. The results here build upon previous findings that even rigid, constrained structures like the lasso architecture are amenable to redesign. Furthermore, this work provides evidence that a large amount of amino acid variation is tolerated by the lasso peptide fold.
Affiliation(s)
- Si Jia Pan
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- Ho Ki Fung
- Department of Chemical and Biological Engineering, Princeton University, Princeton, NJ 08544, USA
- A. James Link
- Departments of Chemical and Biological Engineering and Molecular Biology, Princeton University, Princeton, NJ 08544, USA
|
94
|
Dong Q, Zhou S. Novel nonlinear knowledge-based mean force potentials based on machine learning. IEEE/ACM Trans Comput Biol Bioinform 2011; 8:476-486. [PMID: 20820079 DOI: 10.1109/tcbb.2010.86] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 05/29/2023]
Abstract
The prediction of 3D structures of proteins from amino acid sequences is one of the most challenging problems in molecular biology. An essential task for solving this problem with coarse-grained models is to deduce effective interaction potentials. The development and evaluation of new energy functions is critical to accurately modeling the properties of biological macromolecules. Knowledge-based mean force potentials are derived from statistical analysis of proteins of known structures. Current knowledge-based potentials almost all take the form of a weighted linear sum of interaction pairs. In this study, a class of novel nonlinear knowledge-based mean force potentials is presented. The potential parameters are obtained by nonlinear classifiers, instead of relative frequencies of interaction pairs against a reference state or linear classifiers. The support vector machine is used to derive the potential parameters on data sets that contain both native structures and decoy structures. Five knowledge-based mean force Boltzmann-based or linear potentials are introduced and their corresponding nonlinear potentials are implemented. They are the DIH potential (single-body residue-level Boltzmann-based potential), the DFIRE-SCM potential (two-body residue-level Boltzmann-based potential), the FS potential (two-body atom-level Boltzmann-based potential), the HR potential (two-body residue-level linear potential), and the T32S3 potential (two-body atom-level linear potential). Experiments are performed on well-established decoy sets, including the LKF data set, the CASP7 data set, and the Decoys “R”Us data set. The evaluation metrics include the energy Z score and the ability of each potential to discriminate native structures from a set of decoy structures.
Experimental results show that all nonlinear potentials significantly outperform the corresponding Boltzmann-based or linear potentials, and the proposed discriminative framework is effective in developing knowledge-based mean force potentials. The nonlinear potentials can be widely used for ab initio protein structure prediction, model quality assessment, protein docking, and other challenging problems in computational biology.
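The two evaluation metrics named in this abstract can be illustrated with a short sketch. The abstract does not give the exact formulas, so the conventions below are assumptions: the Z score is taken as the native energy minus the mean decoy energy, divided by the population standard deviation of the decoy energies, and "discrimination" is taken to mean that the native structure has the strictly lowest energy. The function names and the toy energies are hypothetical.

```python
from statistics import mean, pstdev


def energy_z_score(native_energy, decoy_energies):
    """Z score of the native energy against the decoy energy distribution.

    Assumed convention: (E_native - mean(E_decoys)) / std(E_decoys), so a
    more negative value means the native structure is better separated.
    """
    mu = mean(decoy_energies)
    sigma = pstdev(decoy_energies)
    if sigma == 0:
        raise ValueError("decoy energies have zero spread")
    return (native_energy - mu) / sigma


def native_is_ranked_first(native_energy, decoy_energies):
    """True when the native structure has a lower energy than every decoy."""
    return all(native_energy < e for e in decoy_energies)


# Toy energies in arbitrary units, invented for illustration only.
decoys = [-10.0, -12.0, -8.0, -10.0]
z = energy_z_score(-16.0, decoys)
ranked_first = native_is_ranked_first(-16.0, decoys)
```

With these toy numbers the native energy lies below every decoy, so the Z score is negative and the native structure is ranked first.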
Affiliation(s)
- Qiwen Dong
- Shanghai Key Lab of Intelligent Information Processing and the School of Computer Science, Fudan University, Old Yifu Building, Room 202-5, 220 Handan Road, Shanghai 200433, China.
|
95
|
Ng KM, Solayappan M, Poh KL. Global energy minimization of alanine dipeptide via barrier function methods. Comput Biol Chem 2011; 35:19-23. [PMID: 21317044 DOI: 10.1016/j.compbiolchem.2010.12.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 02/11/2010] [Revised: 12/26/2010] [Accepted: 12/29/2010] [Indexed: 01/01/2023]
Abstract
This paper presents an interior point method to determine the minimum energy conformation of alanine dipeptide. The CHARMM energy function is minimized over the internal coordinates of the atoms involved. A barrier function algorithm to determine the minimum energy conformation of peptides is proposed. The Lennard-Jones 6-12 potential, which models the van der Waals interactions in the CHARMM energy equation, is used as the barrier function for this algorithm. The results of applying the algorithm to the alanine dipeptide structure as a function of a varying number of dihedral angles are reported, and they are compared with those obtained from a genetic algorithm approach. In addition, the results for polyalanine structures are also reported.
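As a rough illustration of why a Lennard-Jones 6-12 term can act like a barrier, the sketch below evaluates the term in the form commonly written for CHARMM-style force fields, E(r) = epsilon * ((r_min / r)^12 - 2 * (r_min / r)^6). It is not the authors' interior point algorithm; the function name and the parameter values are hypothetical, and epsilon is assumed to be a positive well depth.

```python
def lennard_jones_12_6(r, epsilon, r_min):
    """Lennard-Jones 6-12 energy for two atoms separated by distance r.

    epsilon is the (positive) well depth and r_min the distance at the
    energy minimum; both are illustrative values here, not CHARMM parameters.
    """
    if r <= 0:
        raise ValueError("distance must be positive")
    ratio6 = (r_min / r) ** 6
    return epsilon * (ratio6 * ratio6 - 2.0 * ratio6)


# At r == r_min the energy equals -epsilon (the bottom of the well).
e_min = lennard_jones_12_6(2.0, epsilon=0.1, r_min=2.0)

# As r shrinks below r_min the r^-12 term dominates and the energy rises
# steeply, which is the barrier-like behaviour that penalises atom overlap.
e_close = lennard_jones_12_6(1.0, epsilon=0.1, r_min=2.0)
e_closer = lennard_jones_12_6(0.5, epsilon=0.1, r_min=2.0)
```

The steep growth at short distances keeps a minimizer away from overlapping atoms, which is the property a barrier function needs.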
Affiliation(s)
- Kien Ming Ng
- Department of Industrial and Systems Engineering, National University of Singapore, Singapore.
|
96
|
Domingo-Espín J, Unzueta U, Saccardo P, Rodríguez-Carmona E, Corchero JL, Vázquez E, Ferrer-Miralles N. Engineered biological entities for drug delivery and gene therapy protein nanoparticles. Prog Mol Biol Transl Sci 2011; 104:247-98. [PMID: 22093221 PMCID: PMC7173510 DOI: 10.1016/b978-0-12-416020-0.00006-1] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Indexed: 12/16/2022]
Abstract
The development of genetic engineering techniques has speeded up the growth of the biotechnological industry, resulting in a significant increase in the number of recombinant protein products on the market. The deep knowledge of protein function, structure, biological interactions, and the possibility to design new polypeptides with desired biological activities have been the main factors involved in the increase of intensive research and preclinical and clinical approaches. Consequently, new biological entities with added value for innovative medicines such as increased stability, improved targeting, and reduced toxicity, among others have been obtained. Proteins are complex nanoparticles with sizes ranging from a few nanometers to a few hundred nanometers when complex supramolecular interactions occur, as for example, in viral capsids. However, even though protein production is a delicate process that imposes the use of sophisticated analytical methods and negative secondary effects have been detected in some cases as immune and inflammatory reactions, the great potential of biodegradable and tunable protein nanoparticles indicates that protein-based biotechnological products are expected to increase in the years to come.
Affiliation(s)
- Joan Domingo-Espín
- Institute for Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain; Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain; CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Bellaterra, Barcelona, Spain
- Ugutz Unzueta
- Institute for Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain; Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain; CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Bellaterra, Barcelona, Spain
- Paolo Saccardo
- Institute for Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain; Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain; CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Bellaterra, Barcelona, Spain
- Escarlata Rodríguez-Carmona
- Institute for Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain; Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain; CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Bellaterra, Barcelona, Spain
- José Luís Corchero
- Institute for Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain; Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain; CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Bellaterra, Barcelona, Spain
- Esther Vázquez
- Institute for Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain; Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain; CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Bellaterra, Barcelona, Spain
- Neus Ferrer-Miralles
- Institute for Biotechnology and Biomedicine, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain; Department of Genetics and Microbiology, Universitat Autònoma de Barcelona, Bellaterra, Barcelona, Spain; CIBER de Bioingeniería, Biomateriales y Nanomedicina (CIBER-BBN), Bellaterra, Barcelona, Spain
|
97
|
Day R, Qu X, Swanson R, Bohannan Z, Bliss R, Tsai J. Relative Packing Groups in Template-Based Structure Prediction: Cooperative Effects of True Positive Constraints. J Comput Biol 2011; 18:17-26. [DOI: 10.1089/cmb.2010.0078] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/13/2022] Open
Affiliation(s)
- Ryan Day
- Chemistry Department, University of the Pacific, Stockton, California
- Rosemarie Swanson
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas
- Zach Bohannan
- Department of Molecular and Cell Biology, University of California Berkeley, Berkeley, California
- Robert Bliss
- Department of Biochemistry and Biophysics, Texas A&M University, College Station, Texas
- Jerry Tsai
- Chemistry Department, University of the Pacific, Stockton, California
|
98
|
New compstatin variants through two de novo protein design frameworks. Biophys J 2010; 98:2337-46. [PMID: 20483343 DOI: 10.1016/j.bpj.2010.01.057] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 07/20/2009] [Revised: 01/21/2010] [Accepted: 01/25/2010] [Indexed: 11/22/2022] Open
Abstract
Two de novo protein design frameworks are applied to the discovery of new compstatin variants. One is based on sequence selection and fold specificity, whereas the other approach is based on sequence selection and approximate binding affinity calculations. The proposed frameworks were applied to a complex of C3c with compstatin variant E1 and new variants with improved binding affinities are predicted and experimentally validated. The computational studies elucidated key positions in the sequence of compstatin that greatly affect the binding affinity. Positions 4 and 13 were found to favor Trp, whereas positions 1, 9, and 10 are dominated by Asn, and position 11 consists mainly of Gln. A structural analysis of the C3c-bound peptide analogs is presented.
|
99
|
Dickson RJ, Wahl LM, Fernandes AD, Gloor GB. Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation. PLoS One 2010; 5:e11082. [PMID: 20596526 PMCID: PMC2893159 DOI: 10.1371/journal.pone.0011082] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Received: 03/12/2010] [Accepted: 05/17/2010] [Indexed: 11/23/2022] Open
Abstract
Background: There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results, demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses.
Methodology/Principal Findings: We show that current criteria are insufficient to build alignments for use with covariation analyses, as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences improves contact prediction by covariation analysis. Finally, we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature.
Conclusions/Significance: Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments.
Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors, providing increased confidence in contact prediction when analyzing alignments with erroneous regions, because they emphasize pairwise covariation over group covariation.
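To make the idea of pairwise covariation between alignment columns concrete, the sketch below computes mutual information between two columns of a toy alignment. Mutual information is used here only as a familiar covariation measure; the abstract does not specify the two statistics the authors propose, so this should not be read as their method. The function names and the four-sequence alignment are hypothetical, and gap handling is omitted.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j."""
    n = len(alignment)
    col_i, col_j = column(alignment, i), column(alignment, j)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Toy alignment: columns 0 and 1 change together, column 2 is conserved.
toy = ["AKL", "AKL", "GEL", "GEL"]
covarying = mutual_information(toy, 0, 1)   # columns that vary in step
conserved = mutual_information(toy, 0, 2)   # a conserved column carries no signal
```

In this toy case the covarying pair scores higher than the pair involving the conserved column. A misaligned sequence would shift residues between columns and change these counts, which is the sensitivity the abstract describes.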
Affiliation(s)
- Russell J. Dickson
- Department of Biochemistry, The University of Western Ontario, London, Canada
- Lindi M. Wahl
- Department of Applied Mathematics, The University of Western Ontario, London, Canada
- Andrew D. Fernandes
- Department of Biochemistry, The University of Western Ontario, London, Canada
- Department of Applied Mathematics, The University of Western Ontario, London, Canada
- Gregory B. Gloor
- Department of Biochemistry, The University of Western Ontario, London, Canada
|
100
|
Rajgaria R, Wei Y, Floudas CA. Contact prediction for beta and alpha-beta proteins using integer linear optimization and its impact on the first principles 3D structure prediction method ASTRO-FOLD. Proteins 2010; 78:1825-46. [PMID: 20225257 PMCID: PMC2858251 DOI: 10.1002/prot.22696] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Indexed: 12/24/2022]
Abstract
An integer linear optimization model is presented to predict residue contacts in beta, alpha + beta, and alpha/beta proteins. The total energy of a protein is expressed as the sum of a C(alpha)-C(alpha) distance-dependent contact energy contribution and a hydrophobic contribution. The model selects the contacts that assign the lowest energy to the protein structure while satisfying a set of constraints that enforce certain physically observed topological information. A new method based on hydrophobicity is proposed to find the beta-sheet alignments. These beta-sheet alignments are used as constraints for contacts between residues of beta-sheets. This model was tested on three independent protein test sets and on CASP8 test proteins consisting of beta, alpha + beta, and alpha/beta proteins, and it was found to perform very well. The average accuracy of the predictions (separated by at least six residues) was approximately 61%. The average true positive and false positive distances were also calculated for each of the test sets and they are 7.58 Å and 15.88 Å, respectively. Residue contact prediction can be directly used to facilitate protein tertiary structure prediction. The proposed residue contact prediction model is incorporated into the first principles protein tertiary structure prediction approach, ASTRO-FOLD. The effectiveness of the contact prediction model was further demonstrated by the improvement in the quality of the protein structure ensemble generated using the predicted residue contacts for a test set of 10 proteins.
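The accuracy figure quoted above counts only predicted contacts whose residues are at least six positions apart in sequence. A minimal sketch of such a calculation follows. It does not reproduce the integer linear optimization model itself; the 8.0 Å C-alpha distance cutoff is an assumed value that the abstract does not state, and the function name and toy coordinates are hypothetical.

```python
from math import dist


def contact_accuracy(predicted_pairs, ca_coords, cutoff=8.0, min_separation=6):
    """Fraction of predicted contacts that are true contacts.

    predicted_pairs: iterable of (i, j) residue index pairs.
    ca_coords: list of (x, y, z) C-alpha coordinates in angstroms.
    cutoff: assumed contact distance threshold (not taken from the paper).
    Pairs closer than min_separation in sequence are ignored.
    """
    considered = [(i, j) for i, j in predicted_pairs
                  if abs(i - j) >= min_separation]
    if not considered:
        return None
    correct = sum(1 for i, j in considered
                  if dist(ca_coords[i], ca_coords[j]) <= cutoff)
    return correct / len(considered)


# Toy hairpin: residues 0-4 run along x, residues 5-9 run back 3.8 Å above.
coords = [(3.8 * k, 0.0, 0.0) for k in range(5)]
coords += [(3.8 * (4 - k), 3.8, 0.0) for k in range(5)]

# (2, 4) is skipped because the residues are only two positions apart.
predicted = [(0, 9), (1, 8), (0, 6), (2, 4)]
accuracy = contact_accuracy(predicted, coords)
```

Here two of the three long-range predictions fall within the assumed cutoff, so the computed accuracy is two thirds.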
Affiliation(s)
- R. Rajgaria
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- Y. Wei
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
- C. A. Floudas
- Department of Chemical Engineering, Princeton University, Princeton, NJ 08544-5263, U.S.A
|