1
Liu X, Luo Y, Li P, Song S, Peng J. Deep geometric representations for modeling effects of mutations on protein-protein binding affinity. PLoS Comput Biol 2021; 17:e1009284. [PMID: 34347784 PMCID: PMC8366979 DOI: 10.1371/journal.pcbi.1009284] [Citation(s) in RCA: 33] [Impact Index Per Article: 11.0] [Received: 11/05/2020] [Revised: 08/16/2021] [Accepted: 07/17/2021] [Indexed: 11/19/2022]
Abstract
Modeling the impact of amino acid mutations on protein-protein interactions plays a crucial role in protein engineering and drug design. In this study, we develop GeoPPI, a novel structure-based deep-learning framework to predict the change in binding affinity upon mutations. Based on the three-dimensional structure of a protein, GeoPPI first learns a geometric representation that encodes topology features of the protein structure via a self-supervised learning scheme. These representations are then used as features for training gradient-boosting trees to predict the changes in protein-protein binding affinity upon mutations. We find that GeoPPI is able to learn meaningful features that characterize interactions between atoms in protein structures. In addition, through extensive experiments, we show that GeoPPI achieves new state-of-the-art performance in predicting binding affinity changes upon both single- and multi-point mutations on six benchmark datasets. Moreover, we show that GeoPPI can accurately estimate the differences in binding affinity between a few recently identified SARS-CoV-2 antibodies and the receptor-binding domain (RBD) of the S protein. These results demonstrate the potential of GeoPPI as a powerful and useful computational tool in protein design and engineering. Our code and datasets are available at: https://github.com/Liuxg16/GeoPPI.
Estimating the binding affinities of protein-protein interactions (PPIs) is crucial to understanding protein function and designing new functional proteins. Since experimental measurement in wet labs is labor-intensive and time-consuming, fast and accurate in silico approaches have received much attention. Although considerable efforts have been made in this direction, predicting the effects of mutations on protein-protein binding affinity is still a challenging research problem.
In this work, we introduce GeoPPI, a novel computational approach that uses deep geometric representations of protein complexes to predict the effects of mutations on binding affinity. The geometric representations are first learned via a self-supervised learning scheme and then combined with gradient-boosting trees to make the prediction. We find that the learned representations encode meaningful patterns underlying the interactions between atoms in protein structures. Also, extensive tests on major benchmark datasets show that GeoPPI improves substantially over existing methods in predicting the effects of mutations on binding affinity.
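GeoPPI's second stage fits gradient-boosting trees on the learned representations. The sketch below illustrates only that boosting idea, under stated assumptions: each mutation is described by a hypothetical precomputed feature vector (GeoPPI's real features come from its self-supervised geometric encoder, which is not reproduced here), the loss is squared error, and each tree is a hand-written depth-1 stump rather than a library learner.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class Stump:
    """Depth-1 regression tree: one feature, one threshold, two leaf values."""
    feature: int
    threshold: float
    left_value: float   # returned when x[feature] <= threshold
    right_value: float  # returned when x[feature] > threshold

    def predict(self, x: Sequence[float]) -> float:
        return self.left_value if x[self.feature] <= self.threshold else self.right_value


def fit_stump(X: List[Sequence[float]], r: List[float]) -> Stump:
    """Exhaustively pick the split that minimizes squared error against targets r."""
    best, best_err = None, float("inf")
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            left = [ri for x, ri in zip(X, r) if x[f] <= t]  # never empty: t is an observed value
            right = [ri for x, ri in zip(X, r) if x[f] > t]
            lv = sum(left) / len(left)
            rv = sum(right) / len(right) if right else lv
            err = sum((ri - lv) ** 2 for ri in left) + sum((ri - rv) ** 2 for ri in right)
            if err < best_err:
                best, best_err = Stump(f, t, lv, rv), err
    return best


@dataclass
class BoostedStumps:
    base: float
    learning_rate: float
    stumps: List[Stump] = field(default_factory=list)

    def predict(self, x: Sequence[float]) -> float:
        return self.base + self.learning_rate * sum(s.predict(x) for s in self.stumps)


def fit_boosted_stumps(X: List[Sequence[float]], y: List[float],
                       n_rounds: int = 100, learning_rate: float = 0.1) -> BoostedStumps:
    """Squared-loss gradient boosting: each new stump fits the current residuals,
    which are the negative gradient of 0.5 * (y - prediction) ** 2."""
    model = BoostedStumps(base=sum(y) / len(y), learning_rate=learning_rate)
    preds = [model.base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        model.stumps.append(stump)
        preds = [pi + learning_rate * stump.predict(x) for pi, x in zip(preds, X)]
    return model
```

For instance, `fit_boosted_stumps([[0.0], [1.0], [2.0], [3.0]], [0.0, 0.0, 1.0, 1.0])` places its split at 1.0, and its predictions approach 0 for the first two points and 1 for the last two. A production model would rely on a tested library implementation with deeper trees.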
Affiliation(s)
- Xianggen Liu
- Laboratory for Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Beijing, China
- School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu, China
- Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, China
- Yunan Luo
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
- Pengyong Li
- Laboratory for Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Beijing, China
- Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, China
- Sen Song
- Laboratory for Brain and Intelligence and Department of Biomedical Engineering, Tsinghua University, Beijing, China
- Beijing Innovation Center for Future Chip, Tsinghua University, Beijing, China
- * E-mail: (JP); (SS)
- Jian Peng
- Department of Computer Science, University of Illinois at Urbana-Champaign, Urbana, Illinois, United States of America
2
Gaillard T, Simonson T. Full Protein Sequence Redesign with an MMGBSA Energy Function. J Chem Theory Comput 2017; 13:4932-4943. [DOI: 10.1021/acs.jctc.7b00202] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/28/2022]
Affiliation(s)
- Thomas Gaillard
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, 91128 Palaiseau, France
3
Nagamune T. Biomolecular engineering for nanobio/bionanotechnology. Nano Convergence 2017; 4:9. [PMID: 28491487 PMCID: PMC5401866 DOI: 10.1186/s40580-017-0103-4] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 02/17/2017] [Accepted: 03/29/2017] [Indexed: 05/02/2023]
Abstract
Biomolecular engineering can be used to purposefully manipulate biomolecules, such as peptides, proteins, nucleic acids and lipids, within the framework of the relations among their structures, functions and properties, as well as their applicability to such areas as developing novel biomaterials, biosensing, bioimaging, and clinical diagnostics and therapeutics. Nanotechnology can also be used to design and tune the sizes, shapes, properties and functionality of nanomaterials. As such, there are considerable overlaps between nanotechnology and biomolecular engineering, in that both are concerned with the structure and behavior of materials on the nanometer scale or smaller. Therefore, in combination with nanotechnology, biomolecular engineering is expected to open up new fields of nanobio/bionanotechnology and to contribute to the development of novel nanobiomaterials, nanobiodevices and nanobiosystems. This review highlights recent studies using engineered biological molecules (e.g., oligonucleotides, peptides, proteins, enzymes, polysaccharides, lipids, biological cofactors and ligands) combined with functional nanomaterials in nanobio/bionanotechnology applications, including therapeutics, diagnostics, biosensing, bioanalysis and biocatalysts. Furthermore, this review focuses on five areas of recent advances in biomolecular engineering: (a) nucleic acid engineering, (b) gene engineering, (c) protein engineering, (d) chemical and enzymatic conjugation technologies, and (e) linker engineering. Precisely engineered nanobiomaterials, nanobiodevices and nanobiosystems are anticipated to emerge as next-generation platforms for bioelectronics, biosensors, biocatalysts, molecular imaging modalities, biological actuators, and biomedical applications.
Affiliation(s)
- Teruyuki Nagamune
- Department of Chemistry and Biotechnology, Graduate School of Engineering, The University of Tokyo, Tokyo, Japan
4
Abstract
Computational protein design (CPD) has established itself as a leading field in basic and applied science, with a strong coupling between the two. Proteins are computationally designed from the level of amino acids to the level of a functional protein complex. Design targets range from increased thermal (or other) stability to specific functions such as protein-protein binding, enzymatic reactions, or nanotechnology applications. The design may encompass small regions of the protein or the entire protein. In either case, it may target the side chains alone or the full backbone conformation. Herein, the main framework of the process is outlined, highlighting key elements of the iterative CPD cycle. These include the very definition of CPD, the diverse goals of CPD, the components of the CPD protocol, methods for searching sequence and structure space, scoring functions, and the augmentation of CPD with other optimization tools. Taken together, this chapter aims to introduce the framework of CPD.
Affiliation(s)
- Ilan Samish
- Department of Plants and Environmental Sciences, Weizmann Institute of Science, Rehovot, Israel
- Department of Biotechnology Engineering, Braude Academic College of Engineering, Karmiel, Israel
- Amai Proteins Ltd., Ashdod, Israel
5
Gaillard T, Panel N, Simonson T. Protein side chain conformation predictions with an MMGBSA energy function. Proteins 2016; 84:803-19. [PMID: 26948696 DOI: 10.1002/prot.25030] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 10/02/2015] [Revised: 02/22/2016] [Accepted: 02/27/2016] [Indexed: 12/17/2022]
Abstract
The prediction of protein side chain conformations from backbone coordinates is an important task in structural biology, with applications in structure prediction and protein design. It is a difficult problem because of its combinatorial nature. We study the performance of an "MMGBSA" energy function, implemented in our protein design program Proteus, which combines molecular mechanics terms and a Generalized Born and Surface Area (GBSA) solvent model with approximations that make the model pairwise additive. Because of its cost, Proteus is not a competitor to specialized side chain prediction programs, but it supports protein design applications, where side chain prediction is an important step and MMGBSA is an effective energy model. We predict the side chain conformations for 18 proteins. The side chains are first predicted individually, with the rest of the protein in its crystallographic conformation. Next, all side chains are predicted together. The contributions of individual energy terms are evaluated and various parameterizations are compared. We find that the GB and SA terms, with an appropriate choice of the dielectric constant and surface energy coefficients, are beneficial for single side chain predictions. For the prediction of all side chains, however, errors due to the pairwise additive approximation outweigh the improvement brought by these terms. We also show the crucial contribution of side chain minimization in alleviating the rigid rotamer approximation. Even without GB and SA terms, we obtain accuracies comparable to SCWRL4, a specialized side chain prediction program. In particular, we obtain a better RMSD than SCWRL4 for core residues (at a higher cost), despite our simpler rotamer library.
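The two protocols in this abstract (rebuilding one side chain at a time in its native environment, then rebuilding all of them together) can be contrasted on a toy discrete rotamer model. This is a minimal sketch under hypothetical assumptions: the energy tables hold invented numbers in a pairwise-additive form, and the joint search is brute-force enumeration, feasible only for a handful of positions; it is not the search algorithm used by Proteus.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical pairwise-additive rotamer energies:
#   self_e[i][r]            energy of rotamer r at position i against the fixed backbone
#   pair_e[(i, j)][(r, s)]  interaction of rotamer r at i with rotamer s at j, for i < j
SelfTable = List[Dict[str, float]]
PairTable = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]


def pair_energy(pair_e: PairTable, i: int, j: int, ri: str, rj: str) -> float:
    """Look up a pair term regardless of argument order; missing pairs contribute 0."""
    if i > j:
        i, j, ri, rj = j, i, rj, ri
    return pair_e.get((i, j), {}).get((ri, rj), 0.0)


def predict_individually(self_e: SelfTable, pair_e: PairTable,
                         native: Sequence[str]) -> List[str]:
    """Rebuild each side chain alone while all other positions keep their native rotamers."""
    predicted = []
    for i, rotamers in enumerate(self_e):
        def local_energy(r: str) -> float:
            return rotamers[r] + sum(
                pair_energy(pair_e, i, j, r, native[j]) for j in range(len(native)) if j != i
            )
        predicted.append(min(rotamers, key=local_energy))
    return predicted


def total_energy(self_e: SelfTable, pair_e: PairTable, assignment: Sequence[str]) -> float:
    """Sum of all self terms plus all pair terms for one complete rotamer assignment."""
    n = len(assignment)
    energy = sum(self_e[i][r] for i, r in enumerate(assignment))
    energy += sum(pair_energy(pair_e, i, j, assignment[i], assignment[j])
                  for i in range(n) for j in range(i + 1, n))
    return energy


def predict_jointly(self_e: SelfTable, pair_e: PairTable) -> List[str]:
    """Brute-force search over all rotamer combinations. Their number is the product of
    the per-position rotamer counts, which is why real programs use heuristics instead."""
    best = min(product(*(list(rotamers) for rotamers in self_e)),
               key=lambda combo: total_energy(self_e, pair_e, combo))
    return list(best)
```

In the individual protocol the answer depends on the environment held fixed, which is why it is evaluated with the crystallographic conformation of the rest of the protein; the joint protocol has no such anchor and must search the combinatorial space.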
Affiliation(s)
- Thomas Gaillard
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
- Nicolas Panel
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
- Thomas Simonson
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, Palaiseau, 91128, France
6
Polydorides S, Michael E, Mignon D, Druart K, Archontis G, Simonson T. Proteus and the Design of Ligand Binding Sites. Methods Mol Biol 2016; 1414:77-97. [PMID: 27094287 DOI: 10.1007/978-1-4939-3569-7_6] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/14/2022]
Abstract
This chapter describes the organization and use of Proteus, a multitool computational suite for the optimization of protein and ligand conformations and sequences, and the calculation of pKa shifts and relative binding affinities. The software offers the use of several molecular mechanics force fields and solvent models, including two generalized Born variants, and a large range of scoring functions, which can combine protein stability, ligand affinity, and ligand specificity terms, for positive and negative design. We present in detail the steps for structure preparation, system setup, construction of the interaction energy matrix, protein sequence and structure optimizations, pKa calculations, and ligand titration calculations. We discuss illustrative examples, including the chemical/structural optimization of a complex between the MHC class II protein HLA-DQ8 and the vinculin epitope, and the chemical optimization of the compstatin analog Ac-Val4Trp/His9Ala, which regulates the function of protein C3 of the complement system.
Affiliation(s)
- Savvas Polydorides
- Theoretical and Computational Biophysics Group, Department of Physics, University of Cyprus, 1678, Nicosia, Cyprus
- Eleni Michael
- Theoretical and Computational Biophysics Group, Department of Physics, University of Cyprus, 1678, Nicosia, Cyprus
- David Mignon
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
- Karen Druart
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
- Georgios Archontis
- Theoretical and Computational Biophysics Group, Department of Physics, University of Cyprus, 1678, Nicosia, Cyprus
- Thomas Simonson
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
7
Pottel J, Moitessier N. Single-Point Mutation with a Rotamer Library Toolkit: Toward Protein Engineering. J Chem Inf Model 2015; 55:2657-71. [PMID: 26623941 DOI: 10.1021/acs.jcim.5b00525] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/25/2022]
Abstract
Protein engineers have long worked to harness biocatalysts as a natural source of regio-, stereo-, and chemoselectivity in order to carry out chemistry (reactions and/or substrates) not previously achieved with these enzymes. The extreme labor demands and the exponential number of mutation combinations have driven computational advances in this domain. The first step in our virtual approach is to predict the correct conformations upon mutation of residues (i.e., rebuilding side chains). For this purpose, we opted for a combination of molecular mechanics and statistical data. In this work, we have developed automated computational tools to extract protein structural information and created conformational libraries for each amino acid dependent on a variable number of parameters (e.g., resolution, flexibility, secondary structure). We have also developed the necessary tool to apply the mutation and optimize the conformation accordingly. For side-chain conformation prediction, we obtained overall average root-mean-square deviations (RMSDs) of 0.91 and 1.01 Å for the 18 flexible natural amino acids within two distinct sets of over 3000 and 1500 side-chain residues, respectively. The commonly used dihedral angle differences were also evaluated and performed worse than the state of the art. These two metrics are also compared. Furthermore, we generated a family-specific library for kinases that produced an average 2% lower RMSD upon side-chain reconstruction and a residue-specific library that yielded a 17% improvement. Ultimately, since our protein engineering outlook involves using our docking software, Fitted/Impacts, we applied our mutation protocol to a benchmarked data set for self- and cross-docking. Our side-chain reconstruction does not hinder our docking software: pose prediction accuracy differs by approximately 2% (RMSD cutoff metric) for a set of over 200 protein/ligand structures.
Similarly, when docking to a set of over 100 kinases, side-chain reconstruction (using both general and biased conformation libraries) had minimal detriment to the docking accuracy.
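The side-chain accuracies above are reported as RMSDs in Å. A minimal sketch of that metric for one rebuilt residue, assuming predicted and reference atoms are already matched by name and share a coordinate frame because the backbone is fixed (so no superposition step is applied). It deliberately ignores symmetric side chains (e.g., swapped ring or carboxylate atoms), which many evaluations handle specially; the atom names in the test are illustrative only.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float, float]


def sidechain_rmsd(pred: Dict[str, Coord], ref: Dict[str, Coord]) -> float:
    """RMSD over atoms present in both models, in the input units (e.g., Å).

    No superposition: with a shared fixed backbone, the two models are already aligned.
    """
    common = sorted(set(pred) & set(ref))
    if not common:
        raise ValueError("no atoms in common")
    squared = sum(math.dist(pred[name], ref[name]) ** 2 for name in common)
    return math.sqrt(squared / len(common))
```

An overall figure such as the 0.91 Å average would then be a mean of per-residue values over a benchmark set; how the original work aggregates them is not stated here.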
Affiliation(s)
- Joshua Pottel
- Department of Chemistry, McGill University, 801 Sherbrooke Street West, Montreal, QC, Canada H3A 0B8
- Nicolas Moitessier
- Department of Chemistry, McGill University, 801 Sherbrooke Street West, Montreal, QC, Canada H3A 0B8
8
Li Z, Yang Y, Faraggi E, Zhan J, Zhou Y. Direct prediction of profiles of sequences compatible with a protein structure by neural networks with fragment-based local and energy-based nonlocal profiles. Proteins 2014; 82:2565-73. [PMID: 24898915 DOI: 10.1002/prot.24620] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 03/11/2014] [Revised: 05/28/2014] [Accepted: 05/30/2014] [Indexed: 12/13/2022]
Abstract
Locating sequences compatible with a protein structural fold is the well-known inverse protein-folding problem. While significant progress has been made, the success rate of protein design remains low. As a result, a library of designed sequences or a profile of sequences is currently employed to guide experimental screening or directed evolution. Sequence profiles can be computationally predicted by iterative mutation of a random sequence to produce energy-optimized sequences, or by combining sequences of structurally similar fragments from a template library. The latter approach is computationally more efficient but yields less accurate profiles than the former because it lacks tertiary structural information. Here we present a method called SPIN that predicts Sequence Profiles by an Integrated Neural network based on fragment-derived sequence profiles and structure-derived energy profiles. SPIN improves the sequence identity between predicted and wild-type sequences by 6.7 percentage points over the fragment-derived profile (from 23.6% to 30.3%). The method also reduces the number of residues in low-complexity regions by 15.7% and has a significantly better balance of hydrophilic and hydrophobic residues at the protein surface. The accuracy of the resulting sequence profiles is comparable to that of profiles generated by the protein design program RosettaDesign 3.5. This highly efficient method for predicting sequence profiles from structures will be useful as a single-body scoring term for improving scoring functions used in protein design and fold recognition. It also complements protein design programs in guiding the experimental design of sequence libraries for screening and directed evolution of designed sequences. The SPIN server is available at http://sparks-lab.org.
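Sequence identity against the wild type (often called native sequence recovery) is the fraction of positions where the predicted residue equals the native one. The sketch below assumes the two sequences are already position-aligned (same structure, same length); the function name is illustrative. It also shows why the reported gain is an absolute difference in percentage points rather than a relative improvement.

```python
def sequence_identity(predicted: str, native: str) -> float:
    """Fraction of aligned positions where the predicted residue matches the native one."""
    if len(predicted) != len(native):
        raise ValueError("sequences must be position-aligned and of equal length")
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)


# Reported identities (%) for the fragment-derived profile and for SPIN.
baseline, spin = 23.6, 30.3
absolute_gain = spin - baseline                 # about 6.7 percentage points
relative_gain = 100 * absolute_gain / baseline  # about 28.4% relative to the baseline
```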
Affiliation(s)
- Zhixiu Li
- School of Informatics and Computing, Indiana University-Purdue University, Indianapolis, Indiana, 46202
9
Gaillard T, Simonson T. Pairwise decomposition of an MMGBSA energy function for computational protein design. J Comput Chem 2014; 35:1371-87. [PMID: 24854675 DOI: 10.1002/jcc.23637] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 01/16/2014] [Revised: 04/14/2014] [Accepted: 05/01/2014] [Indexed: 02/02/2023]
Abstract
Computational protein design (CPD) aims at predicting new proteins or modifying existing ones. The computational challenge is huge, as it requires exploring an enormous sequence and conformation space. The difficulty can be reduced by considering a fixed backbone and a discrete set of sidechain conformations. Another common strategy consists of precalculating a pairwise energy matrix, from which the energy of any sequence/conformation can be quickly obtained. In this work, we examine the pairwise decomposition of protein MMGBSA energy functions from a general theoretical perspective, as well as an implementation proposed earlier for CPD. This implementation includes a Generalized Born term, whose many-body character is overcome using an effective dielectric environment, and a Surface Area term, for which we present an improved pairwise decomposition. We perform a detailed evaluation of the error introduced by the decomposition in the different energy components. We show that the error remains reasonable compared with other sources of uncertainty.
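A precomputed pairwise energy matrix makes scoring fast because, once the energy is decomposed as E(s) = Σ_i E_i(s_i) + Σ_{i<j} E_ij(s_i, s_j), evaluating any sequence/conformation s reduces to table lookups, and a single-position change only touches the terms involving that position. The sketch below assumes a hypothetical table layout filled with invented numbers; it does not reproduce the GB or SA decomposition analyzed in the paper.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Hypothetical table layout (not the Proteus file format):
#   self_e[i][s]               E_i(s): energy of state s at position i with the fixed backbone
#   pair_e[(i, j)][(s_i, s_j)] E_ij(s_i, s_j) for i < j; absent pairs are non-interacting
SelfTable = List[Dict[str, float]]
PairTable = Dict[Tuple[int, int], Dict[Tuple[str, str], float]]


def total_energy(assignment: Sequence[str], self_e: SelfTable, pair_e: PairTable) -> float:
    """E(s) = sum_i E_i(s_i) + sum_{i<j} E_ij(s_i, s_j), using only table lookups."""
    energy = sum(self_e[i][s] for i, s in enumerate(assignment))
    for i, j in combinations(range(len(assignment)), 2):
        table = pair_e.get((i, j))
        if table is not None:
            energy += table[(assignment[i], assignment[j])]
    return energy


def mutation_delta(assignment: Sequence[str], pos: int, new_state: str,
                   self_e: SelfTable, pair_e: PairTable) -> float:
    """Energy change when position `pos` switches state: only the self term at `pos`
    and the pair terms involving `pos` are recomputed."""
    old_state = assignment[pos]
    delta = self_e[pos][new_state] - self_e[pos][old_state]
    for j, s_j in enumerate(assignment):
        if j == pos:
            continue
        table = pair_e.get((min(pos, j), max(pos, j)))
        if table is None:
            continue
        if pos < j:
            delta += table[(new_state, s_j)] - table[(old_state, s_j)]
        else:
            delta += table[(s_j, new_state)] - table[(s_j, old_state)]
    return delta
```

The second function is what search algorithms exploit: its cost grows with the number of positions, not with the size of the full sequence/conformation space.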
Affiliation(s)
- Thomas Gaillard
- Department of Biology, Laboratoire de Biochimie (CNRS UMR7654), Ecole Polytechnique, 91128, Palaiseau, France
10
Alderson RG, De Ferrari L, Mavridis L, McDonagh JL, Mitchell JBO, Nath N. Enzyme informatics. Curr Top Med Chem 2012; 12:1911-23. [PMID: 23116471 DOI: 10.2174/156802612804547353] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 08/15/2012] [Revised: 09/12/2012] [Accepted: 09/15/2012] [Indexed: 12/18/2022]
Abstract
Over the last 50 years, sequencing, structural biology and bioinformatics have completely revolutionised biomolecular science, with millions of sequences and tens of thousands of three-dimensional structures becoming available. The bioinformatics of enzymes is well served by mostly free online databases. BRENDA describes the chemistry, substrate specificity, kinetics, preparation and biological sources of enzymes, while KEGG is valuable for understanding enzymes and metabolic pathways. EzCatDB, SFLD and MACiE are key repositories for data on the chemical mechanisms by which enzymes operate. At the current rate of genome sequencing and manual annotation, human curation will never finish the functional annotation of the ever-expanding list of known enzymes. Hence there is an increasing need for automated annotation, though it is not yet widespread for enzyme data. In contrast, functional ontologies such as the Gene Ontology already profit from automation. Despite our growing understanding of enzyme structure and dynamics, we are only beginning to be able to design novel enzymes. One can now begin to trace the functional evolution of enzymes using phylogenetics. The ability of enzymes to perform secondary functions, albeit relatively inefficiently, gives clues as to how enzyme function evolves. Substrate promiscuity in enzymes is one example of imperfect specificity in protein-ligand interactions. Similarly, most drugs bind to more than one protein target. This may sometimes result in helpful polypharmacology, as a drug modulates multiple targets, but it also often leads to adverse side effects. Many chemoinformatics approaches can be used to model the interactions between druglike molecules and proteins in silico. We can even use quantum chemical techniques such as DFT and QM/MM to compute the structural and energetic course of enzyme-catalysed chemical reaction mechanisms, including a full description of bond making and breaking.
Affiliation(s)
- Rosanna G Alderson
- Biomedical Sciences Research Complex and EaStCHEM School of Chemistry, Purdie Building, University of St Andrews, North Haugh, St Andrews, Scotland, UK
11
Tulpan D, Smith DH, Montemanni R. Thermodynamic Post-Processing versus GC-Content Pre-Processing for DNA Codes Satisfying the Hamming Distance and Reverse-Complement Constraints. IEEE/ACM Trans Comput Biol Bioinform 2014; 11:441-452. [PMID: 26355790 DOI: 10.1109/tcbb.2014.2299815] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 06/05/2023]
Abstract
Stochastic, meta-heuristic and linear construction algorithms for the design of DNA strands satisfying Hamming distance and reverse-complement constraints often use a GC-content constraint to pre-process the DNA strands. Since GC-content is a poor predictor of DNA strand hybridization strength, the strands can be filtered by post-processing using thermodynamic calculations. An alternative approach is considered here, in which the algorithms are modified to remove consideration of GC-content and rely on post-processing alone to obtain large sets of DNA strands with satisfactory melting temperatures. The two approaches (pre-processing GC-content and post-processing melting temperatures) are compared and are shown to be complementary when large DNA sets are desired. In particular, the second approach can give significant improvements when linear constructions are used.
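The two combinatorial constraints and the GC-content pre-filter can be stated precisely in a few lines. This is a minimal sketch: the function names are illustrative, the code checks a candidate set rather than constructing one, and it uses the common formulation in which the Hamming distance H(x, y) ≥ d must hold for every pair of distinct strands and H(x, rc(y)) ≥ d for every pair including x = y. The thermodynamic melting-temperature post-processing the paper favors would need a nearest-neighbor model, which is not implemented here.

```python
from itertools import combinations
from typing import List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    return s.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("strands must have equal length")
    return sum(x != y for x, y in zip(a, b))


def gc_content(s: str) -> float:
    return (s.count("G") + s.count("C")) / len(s)


def gc_filter(strands: List[str], lo: float, hi: float) -> List[str]:
    """GC-content pre-processing: keep strands whose GC fraction lies in [lo, hi]."""
    return [s for s in strands if lo <= gc_content(s) <= hi]


def satisfies_constraints(code: List[str], d: int) -> bool:
    """Hamming distance (HD) and reverse-complement (RC) constraints with minimum distance d.

    Because H(x, rc(y)) == H(rc(x), y), checking one direction per pair is sufficient.
    """
    for x, y in combinations(code, 2):
        if hamming(x, y) < d or hamming(x, reverse_complement(y)) < d:
            return False
    # RC constraint for x == y: a strand must also be far from its own reverse complement.
    return all(hamming(x, reverse_complement(x)) >= d for x in code)
```

Note that a reverse-palindromic strand such as `ACGT` equals its own reverse complement, so it violates the RC constraint for any d ≥ 1.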
12
Simonson T, Gaillard T, Mignon D, Schmidt am Busch M, Lopes A, Amara N, Polydorides S, Sedano A, Druart K, Archontis G. Computational protein design: the Proteus software and selected applications. J Comput Chem 2013; 34:2472-84. [PMID: 24037756 DOI: 10.1002/jcc.23418] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 05/13/2013] [Revised: 07/08/2013] [Accepted: 07/28/2013] [Indexed: 12/13/2022]
Abstract
We describe an automated procedure for protein design, implemented in a flexible software package called Proteus. System setup and calculation of an energy matrix are done with the XPLOR modeling program and its sophisticated command language, supporting several force fields and solvent models. A second program provides algorithms to search sequence space. It allows a decomposition of the system into groups, which can be combined in different ways in the energy function, for both positive and negative design. The whole procedure can be controlled by editing 2-4 scripts. Two applications consider the tyrosyl-tRNA synthetase enzyme and its successful redesign to bind both O-methyl-tyrosine and D-tyrosine. For the latter, we present Monte Carlo simulations where the D-tyrosine concentration is gradually increased, displacing L-tyrosine from the binding pocket and yielding the binding free energy difference, in good agreement with experiment. Complete redesign of the Crk SH3 domain is presented. The top 10000 sequences are all assigned to the correct fold by the SUPERFAMILY library of Hidden Markov Models. Finally, we report the acid/base behavior of the SNase protein. Sidechain protonation is treated as a form of mutation; it is then straightforward to perform constant-pH Monte Carlo simulations, which yield good agreement with experiment. Overall, the software can be used for a wide range of applications, producing not only native-like sequences but also thermodynamic properties with errors that appear comparable to other current software packages.
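Monte Carlo search in sequence space, as mentioned above, typically proposes single-position mutations and accepts them with the Metropolis criterion. The sketch below is a generic Metropolis loop under stated assumptions, not Proteus's implementation: the energy function is any hypothetical callable (in practice it would be read from the precomputed energy matrix), moves change residue identity only (no rotamer moves, no protonation states), and the temperature kT is in the same units as the energy.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def metropolis_accept(delta_e: float, kT: float, rng: random.Random) -> bool:
    """Always accept downhill moves; accept uphill moves with probability exp(-dE/kT)."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def monte_carlo(start: List[str], alphabet: Sequence[str],
                energy: Callable[[Sequence[str]], float],
                n_steps: int, kT: float, seed: int = 0) -> Tuple[List[str], float]:
    """Single-point mutation Monte Carlo; returns the lowest-energy sequence visited."""
    rng = random.Random(seed)
    current = list(start)
    e_current = energy(current)
    best, e_best = list(current), e_current
    for _ in range(n_steps):
        pos = rng.randrange(len(current))      # position to mutate
        trial = list(current)
        trial[pos] = rng.choice(alphabet)      # proposed residue type
        e_trial = energy(trial)
        if metropolis_accept(e_trial - e_current, kT, rng):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best
```

Constant-pH simulations follow the same pattern once protonation states are treated as additional "residue types", as the abstract describes; the pH-dependent energy term is not modeled here.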
Affiliation(s)
- Thomas Simonson
- Laboratoire de Biochimie (CNRS UMR7654), Department of Biology, Ecole Polytechnique, Palaiseau, 91128, France
13
Suárez-Diez M, Pujol AM, Matzapetakis M, Jaramillo A, Iranzo O. Computational protein design with electrostatic focusing: experimental characterization of a conditionally folded helical domain with a reduced amino acid alphabet. Biotechnol J 2013; 8:855-64. [PMID: 23788466 DOI: 10.1002/biot.201200380] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 11/29/2012] [Revised: 04/22/2013] [Accepted: 06/03/2013] [Indexed: 11/12/2022]
Abstract
Automated methodologies to design synthetic proteins from first principles use energy computations to estimate the ability of sequences to adopt a targeted structure. This approach is still far from systematically producing native-like sequences, most likely because of inaccuracies in modeling the interactions between the protein and its aqueous environment. This is particularly challenging when engineering small protein domains, which have fewer polar pair interactions within the protein than with the solvent. We have redesigned a three-helix bundle, domain B, using a fixed backbone and a four-amino-acid alphabet. We have enlarged the rotamer library with conformers that increase the weight of electrostatic interactions within the design process without altering the energy function used to compute the folding free energy. Our synthetic sequences show less than 15% similarity to any Swiss-Prot sequence. We have characterized our sequences in different solvents using circular dichroism and nuclear magnetic resonance. Whether the targeted structure is achieved depends on the solvent used. This method can be readily extended to larger domains. Our method will be useful for engineering proteins that become active only in a given solvent and for designing proteins in the context of hydrophobic solvents, which represent an important fraction of the situations in the cell.
Affiliation(s)
- Maria Suárez-Diez
- Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, The Netherlands
14
Li Z, Yang Y, Zhan J, Dai L, Zhou Y. Energy functions in de novo protein design: current challenges and future prospects. Annu Rev Biophys 2013; 42:315-35. [PMID: 23451890 DOI: 10.1146/annurev-biophys-083012-130315] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Indexed: 12/26/2022]
Abstract
In the past decade, a concerted effort to capture specific tertiary packing interactions has produced specific three-dimensional structures for many de novo designed proteins, validated by nuclear magnetic resonance and/or X-ray crystallography. However, the success rate of computational design remains low. In this review, we provide an overview of experimentally validated, de novo designed proteins and compare four available programs, RosettaDesign, EGAD, Liang-Grishin, and RosettaDesign-SR, by assessing designed sequences computationally. Computational assessment includes the recovery of native sequences, the calculation of sizes of hydrophobic patches and total solvent-accessible surface area, and the prediction of structural properties such as intrinsic disorder, secondary structures, and three-dimensional structures. This computational assessment, together with a recent community-wide experiment in assessing scoring functions for interface design, suggests that the next-generation protein-design scoring function will come from the right balance of complementary interaction terms. Such a balance may be found when more negative experimental data become available as part of a training set.
Affiliation(s)
- Zhixiu Li
- School of Informatics, Indiana University-Purdue University, Indianapolis, Indiana 46202, USA
15
Verma R, Schwaneberg U, Roccatano D. Computer-Aided Protein Directed Evolution: a Review of Web Servers, Databases and other Computational Tools for Protein Engineering. Comput Struct Biotechnol J 2012; 2:e201209008. [PMID: 24688649 PMCID: PMC3962222 DOI: 10.5936/csbj.201209008] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.0] [Received: 07/31/2012] [Revised: 10/07/2012] [Accepted: 10/12/2012] [Indexed: 12/01/2022]
Abstract
The combination of computational and directed evolution methods has proven to be a winning strategy for protein engineering. We refer to this approach as computer-aided protein directed evolution (CAPDE), and this review summarizes recent developments in this rapidly growing field. We restrict ourselves to an overview of the availability, usability and limitations of web servers, databases and other computational tools proposed in the last five years. The goal of this review is to provide concise information about currently available computational resources to assist the design of directed-evolution-based protein engineering experiments.
Affiliation(s)
- Rajni Verma
- School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, 28759 Bremen, Germany ; Department of Biotechnology, RWTH Aachen University, Worringer Weg 1, 52074 Aachen, Germany
- Ulrich Schwaneberg
- Department of Biotechnology, RWTH Aachen University, Worringer Weg 1, 52074 Aachen, Germany
- Danilo Roccatano
- School of Engineering and Science, Jacobs University Bremen, Campus Ring 1, 28759 Bremen, Germany
16
Steiner K, Schwab H. Recent advances in rational approaches for enzyme engineering. Comput Struct Biotechnol J 2012; 2:e201209010. [PMID: 24688651 PMCID: PMC3962183 DOI: 10.5936/csbj.201209010] [Citation(s) in RCA: 100] [Impact Index Per Article: 8.3] [Received: 08/01/2012] [Revised: 10/16/2012] [Accepted: 10/18/2012] [Indexed: 11/29/2022]
Abstract
Enzymes are an attractive alternative in the asymmetric syntheses of chiral building blocks. To meet the requirements of industrial biotechnology and to introduce new functionalities, the enzymes need to be optimized by protein engineering. This article specifically reviews rational approaches for enzyme engineering and de novo enzyme design involving structure-based approaches developed in recent years for improvement of the enzymes’ performance, broadened substrate range, and creation of novel functionalities to obtain products with high added value for industrial applications.
Affiliation(s)
- Kerstin Steiner
- ACIB GmbH, (Austrian Centre of Industrial Biotechnology), c/o TU Graz, 8010 Graz, Austria
- Helmut Schwab
- ACIB GmbH, (Austrian Centre of Industrial Biotechnology), c/o TU Graz, 8010 Graz, Austria ; Institute of Molecular Biotechnology, TU Graz, 8010 Graz, Austria
17
Abstract
The field of bioinformatics and computational biology has gone through a number of transformations during the past 15 years, establishing itself as a key component of new biology. This spectacular growth has been challenged by a number of disruptive changes in science and technology. Despite the apparent fatigue of the linguistic use of the term itself, bioinformatics has grown perhaps to a point beyond recognition. We explore both historical aspects and future trends and argue that as the field expands, key questions remain unanswered and acquire new meaning while at the same time the range of applications is widening to cover an ever increasing number of biological disciplines. These trends appear to be pointing to a redefinition of certain objectives, milestones, and possibly the field itself.
Affiliation(s)
- Christos A Ouzounis
- Institute of Agrobiotechnology, Centre for Research & Technology Hellas-CERTH, Thessaloniki, Greece.
18
Rodrigo G, Carrera J, Landrain TE, Jaramillo A. Perspectives on the automatic design of regulatory systems for synthetic biology. FEBS Lett 2012; 586:2037-42. [PMID: 22710180 DOI: 10.1016/j.febslet.2012.02.031] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.4] [Received: 01/17/2012] [Revised: 02/17/2012] [Accepted: 02/20/2012] [Indexed: 11/26/2022]
Abstract
Automatic design relies on computational modeling and optimization methods to provide prototype designs for targeted problems in an unsupervised manner. For biological circuits, we need to produce quantitative predictions of cell behavior for a given genotype as a consequence of the different molecular interactions. Automatic design techniques aim at solving the inverse problem of finding the nucleotide sequences that best fit a targeted behavior. In the post-genomic era, our molecular knowledge and modeling capabilities have allowed us to start using such methodologies successfully. Herein, we describe how the emergence of this new type of tool could enable novel synthetic biology applications. We highlight the essential elements for developing automatic design procedures for synthetic biology, pointing out their advantages and bottlenecks. We discuss in detail the experimental difficulties to overcome in the in vivo implementation of designed networks. The use of automatic design to engineer biological networks is emerging as a new technique for synthetic biology, and one that should not be neglected in the future.
Affiliation(s)
- Guillermo Rodrigo
- Institute of Systems and Synthetic Biology, Université d'Évry Val d'Essonne - CNRS UPS3201 - Genopole, 91034 Évry, France
19
Ma L, Hong Z, Sharma B, Asher S. UV resonance Raman studies of the NaClO4 dependence of poly-L-lysine conformation and hydrogen exchange kinetics. J Phys Chem B 2012; 116:1134-42. [PMID: 22117822 PMCID: PMC3266997 DOI: 10.1021/jp208918n] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 02/05/2023]
Abstract
We used 204 nm excitation UV resonance Raman (UVRR) spectroscopy to examine the effects of NaClO₄ on the conformation of poly-L-lysine (PLL). The presence of NaClO₄ induces the formation of α-helix, π-helix/bulge, and turn conformations. The dependence of the AmIII₃ frequency on the peptide Ψ Ramachandran angle allows us to experimentally determine the conformational population distributions and the energy landscape of PLL along the Ramachandran Ψ angle. We also used UVRR to measure the NaClO₄ concentration dependence of PLL amide hydrogen exchange kinetics. Exchange rates were determined by fitting the time evolution of the UVRR AmII' band of PLL exchanging in D₂O. Hydrogen exchange is slowed at high NaClO₄ concentrations. The PLL AmII' band exchange kinetics at 0.0, 0.2, and 0.35 M NaClO₄ can be fit by single exponentials, but the AmII' band kinetics of PLL at 0.8 M NaClO₄ require a double-exponential fit. The exchange rates for the extended conformations were monitored by measuring the Cα-H band kinetics. These kinetics are identical to those of the AmII' band up to 0.8 M NaClO₄, at which concentration the extended conformation exchanges clearly faster than the α-helix-like conformations. Our results indicate that ClO₄⁻ binds to the PLL backbone to protect it from OH⁻ exchange catalysis. In addition, ClO₄⁻ binding also slows the conformational exchange between the extended and α-helix-like conformations, probably by increasing the activation barriers for conformational interchanges.
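The distinction between single- and double-exponential exchange kinetics can be illustrated with synthetic data. The stdlib-only sketch below fits a single exponential by linear regression on ln I(t). It shows that a biexponential decay leaves a much larger residual, which is the kind of signal that motivates a double-exponential fit. The function names and data are hypothetical, not the measured AmII' intensities. It also assumes baseline-subtracted, strictly positive intensities. A real analysis would fit the double exponential directly by nonlinear least squares (for example with SciPy's `curve_fit`), which this sketch omits.

```python
import math


def single_exp(t, amp, k):
    """Single-exponential decay: one population exchanging with rate k."""
    return amp * math.exp(-k * t)


def double_exp(t, a1, k1, a2, k2):
    """Double-exponential decay: two populations with distinct rates k1, k2."""
    return a1 * math.exp(-k1 * t) + a2 * math.exp(-k2 * t)


def fit_single_exp(times, intensities):
    """Least-squares fit of ln(I) = ln(amp) - k*t; returns (amp, k).

    Assumes strictly positive, baseline-subtracted intensities.
    """
    ys = [math.log(i) for i in intensities]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -slope


def log_rms_residual(times, intensities, amp, k):
    """RMS residual of the single-exponential fit, measured in log space."""
    res = [math.log(i) - math.log(single_exp(t, amp, k))
           for t, i in zip(times, intensities)]
    return math.sqrt(sum(r * r for r in res) / len(res))


times = [float(t) for t in range(0, 200, 10)]
# Synthetic, illustrative data only (arbitrary units and rates).
mono = [single_exp(t, 1.0, 0.02) for t in times]
bi = [double_exp(t, 0.5, 0.1, 0.5, 0.005) for t in times]

for label, data in (("single", mono), ("double", bi)):
    amp, k = fit_single_exp(times, data)
    print(label, round(k, 4), round(log_rms_residual(times, data, amp, k), 4))
```

A near-zero residual supports the single-exponential model. A large, systematic residual suggests at least two exchanging populations.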
Affiliation(s)
- Lu Ma
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, Tel: (412)-624-8570, Fax: (412)-624-0588
- Zhenmin Hong
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, Tel: (412)-624-8570, Fax: (412)-624-0588
- Bhavya Sharma
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, Tel: (412)-624-8570, Fax: (412)-624-0588
- Sanford Asher
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, Tel: (412)-624-8570, Fax: (412)-624-0588
20
Computational Design of a DNA- and Fc-Binding Fusion Protein. Adv Bioinformatics 2011; 2011:457578. [PMID: 21941539 PMCID: PMC3173724 DOI: 10.1155/2011/457578] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/29/2011] [Revised: 06/16/2011] [Accepted: 06/22/2011] [Indexed: 12/23/2022]
Abstract
Computational design of novel proteins with well-defined functions is an ongoing topic in computational biology. In this work, we generated and optimized a new synthetic fusion protein using an evolutionary approach. The optimization was guided by directed evolution based on hydrophobicity scores, molecular weight, and secondary structure predictions. Several methods were used to refine the models built from the resulting sequences. We have successfully combined two unrelated, naturally occurring binding sites, the immunoglobulin Fc-binding site of the Z domain and the DNA-binding motif of MyoD bHLH, into a novel stable protein.
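An evolutionary optimization guided by sequence-derived scores can be sketched as a simple (1+1) evolutionary loop: mutate, score, and keep the mutant if it is at least as fit. In this illustration the fitness is a hypothetical stand-in that uses only the Kyte-Doolittle GRAVY hydropathy score. The authors' actual objective also included molecular weight and secondary-structure predictions, which are not modelled here. The target value, starting sequence and function names are made up for illustration.

```python
import random

# Kyte-Doolittle hydropathy scale (standard published values).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
ALPHABET = sorted(KD)


def gravy(seq):
    """Grand average of hydropathy (mean Kyte-Doolittle value)."""
    return sum(KD[a] for a in seq) / len(seq)


def fitness(seq, target_gravy):
    """Hypothetical fitness: closeness of GRAVY to a target (higher is better)."""
    return -abs(gravy(seq) - target_gravy)


def mutate(seq, rng):
    """Point mutation at a random position to a different residue."""
    i = rng.randrange(len(seq))
    new = rng.choice([a for a in ALPHABET if a != seq[i]])
    return seq[:i] + new + seq[i + 1:]


def evolve(seq, target_gravy, generations=2000, seed=0):
    """(1+1) evolutionary loop: accept a mutant only if it is at least as fit."""
    rng = random.Random(seed)
    best, best_fit = seq, fitness(seq, target_gravy)
    for _ in range(generations):
        cand = mutate(best, rng)
        cand_fit = fitness(cand, target_gravy)
        if cand_fit >= best_fit:
            best, best_fit = cand, cand_fit
    return best


start = "KDEEKRKDSNQE"  # hypothetical, very hydrophilic starting sequence
designed = evolve(start, target_gravy=-0.5)
print(designed, round(gravy(designed), 3))
```

Because only non-worsening mutants are accepted, fitness never decreases. A real design run would replace `fitness` with a multi-term score.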
21
22
Manzin A, Bottauscio O, Ansalone DP. Application of the thin-shell formulation to the numerical modeling of Stern layer in biomolecular electrostatics. J Comput Chem 2011; 32:3105-13. [DOI: 10.1002/jcc.21896] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 03/16/2011] [Revised: 05/05/2011] [Accepted: 06/28/2011] [Indexed: 11/10/2022]
23
Camsund D, Lindblad P, Jaramillo A. Genetically engineered light sensors for control of bacterial gene expression. Biotechnol J 2011; 6:826-36. [PMID: 21648094 DOI: 10.1002/biot.201100091] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 03/06/2011] [Revised: 04/11/2011] [Accepted: 04/18/2011] [Indexed: 12/28/2022]
Abstract
Light of different wavelengths can serve as a transient, noninvasive means of regulating gene expression for biotechnological purposes. Implementation of advanced gene regulatory circuits will require orthogonal transcriptional systems that can be simultaneously controlled and that can produce several different control states. Fully genetically encoded light sensors take advantage of the favorable characteristics of light, do not need the supplementation of any chemical inducers or co-factors, and have been demonstrated to control gene expression in Escherichia coli. Herein, we review engineered light-sensor systems with potential for in vivo regulation of gene expression in bacteria, and highlight different means of extending the range of available light input and transcriptional output signals. Furthermore, we discuss advances in multiplexing different light sensors for achieving multichromatic control of gene expression and indicate developments that could facilitate the construction of efficient systems for light-regulated, multistate control of gene expression.
Affiliation(s)
- Daniel Camsund
- Department of Photochemistry and Molecular Science, Uppsala University, Ångström Laboratories, Uppsala, Sweden
24
Polydorides S, Amara N, Aubard C, Plateau P, Simonson T, Archontis G. Computational protein design with a generalized Born solvent model: application to Asparaginyl-tRNA synthetase. Proteins 2011; 79:3448-68. [PMID: 21563215 DOI: 10.1002/prot.23042] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Received: 11/29/2010] [Revised: 02/25/2011] [Accepted: 03/03/2011] [Indexed: 12/13/2022]
Abstract
Computational Protein Design (CPD) is a promising method for high-throughput protein and ligand mutagenesis. Recently, we developed a CPD method that used a polar-hydrogen energy function for protein interactions and a Coulomb/Accessible Surface Area (CASA) model for solvent effects. We applied this method to engineer aspartyl-adenylate (AspAMP) specificity into Asparaginyl-tRNA synthetase (AsnRS), whose substrate is asparaginyl-adenylate (AsnAMP). Here, we implement a more accurate function, with an all-atom energy for protein interactions and a residue-pairwise generalized Born model for solvent effects. As a first test, we compute amino acid affinities for several point mutants of Aspartyl-tRNA synthetase (AspRS) and Tyrosyl-tRNA synthetase, and stability changes for three helical peptides, and compare them with experiment. As a second test, we readdress the problem of AsnRS amino acid engineering. We compare three design criteria, which optimize the folding free energy, the absolute AspAMP affinity, and the relative (AspAMP-AsnAMP) affinity. The sequences and conformations are improved with respect to our previous polar-hydrogen/CASA study: for several designed complexes, the AspAMP carboxylate forms three interactions with a conserved arginine and a designed lysine, as in the active site of the AspRS:AspAMP complex. The conformations and interactions are well maintained in molecular dynamics simulations, and the sequences have an inverted specificity, favoring AspAMP over AsnAMP. The method is not fully successful, since experimental measurements with the seven most promising sequences show that they do not catalyze the adenylation of Asp (or Asn) with ATP at a detectable level. This may be due to weak AspAMP binding and/or disruption of transition-state stabilization.
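For orientation, the widely used Still-type generalized Born (GB) expression has the following form. It is ΔG_pol = -(332.06/2)(1/ε_in - 1/ε_out) Σ_i Σ_j q_i q_j / f_GB(r_ij), with f_GB = sqrt(r² + R_i R_j exp(-r²/(4 R_i R_j))). A minimal sketch of it follows. This is the generic atom-based GB formula, not the residue-pairwise variant the authors implemented. Born radii are taken as given inputs rather than computed, and the dielectric constants (1 and 80) are illustrative defaults.

```python
import math

COULOMB = 332.06  # Coulomb constant, kcal*Angstrom/(mol*e^2)


def f_gb(r, ri, rj):
    """Still-type GB function interpolating between r and the Born radii."""
    return math.sqrt(r * r + ri * rj * math.exp(-r * r / (4.0 * ri * rj)))


def gb_energy(charges, radii, coords, eps_in=1.0, eps_out=80.0):
    """Generalized Born polarization energy in kcal/mol.

    charges: partial charges (e); radii: effective Born radii (Angstrom),
    assumed precomputed; coords: (x, y, z) tuples in Angstrom.
    The double sum runs over all i, j, including the i == j self terms.
    """
    pref = -0.5 * COULOMB * (1.0 / eps_in - 1.0 / eps_out)
    total = 0.0
    n = len(charges)
    for i in range(n):
        for j in range(n):
            r = math.dist(coords[i], coords[j])
            total += charges[i] * charges[j] / f_gb(r, radii[i], radii[j])
    return pref * total


# Isolated monovalent ion with a 2 Angstrom Born radius (illustrative values).
single = gb_energy([1.0], [2.0], [(0.0, 0.0, 0.0)])
print(round(single, 2))  # -81.98
```

For a single ion, f_GB reduces to the Born radius, so the expression recovers the classical Born solvation energy.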
25
Ma L, Ahmed Z, Asher SA. Ultraviolet resonance Raman study of side chain electrostatic control of poly-L-lysine conformation. J Phys Chem B 2011; 115:4251-8. [PMID: 21413713 PMCID: PMC3072461 DOI: 10.1021/jp2005343] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Indexed: 01/29/2023]
Abstract
We used 204 nm excitation UV resonance Raman (UVRR) spectroscopy to examine the role of side chain electrostatic interactions in determining the conformation of poly-L-lysine (PLL). We examined the pH and ionic strength dependence of the UVRR spectra. The pH dependence of PLL UVRR spectra between pH 7.1 and 11.7 cannot be described by a two-state model but requires at least one additional state. Fitting the AmIII₃ region with pH 7.1 and 11.7 basis spectra reveals a small pH-induced decrease in the relative fraction of the 2.5₁-helix conformation compared to the PPII conformation. We performed a generalized 2D correlation analysis on the pH-dependent PLL UVRR spectra. The asynchronous spectrum shows enhanced spectral resolution. The 2D asynchronous spectrum reveals multiple components in the Cα-H b band and the AmII band whose origins are unclear. The cross peaks in the 2D asynchronous spectrum between the AmIII band and the other bands reveal that increasing pH induces three new structures: π-helix, α-helix, and some turn structure. We find that 2.5 M NaCl does not change the equilibrium between the PPII and 2.5₁-helix conformations by screening side chain electrostatic repulsion. This result indicates that NaCl does not penetrate the region between the side chain and the peptide backbone. We also compared the PLL conformations induced by high pH to those induced by 0.8 M ClO₄⁻. Both conditions induce α-helix-like conformations. ClO₄⁻ (0.8 M) induces 6% more α-helix-like conformations than pH 12.4 does. Higher pH gives rise to longer α-helices and fewer turn structures.
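The 2D correlation analysis mentioned in this abstract is usually computed following Noda's generalized 2D correlation. In that scheme, spectra recorded along a perturbation (here, pH) are mean-centred into "dynamic" spectra. The synchronous map is their covariance. The asynchronous map uses the Hilbert-Noda matrix (N_jk = 1/(π(k - j)), zero on the diagonal) to pick out out-of-phase changes. Below is a stdlib-only sketch following that standard definition. The function names and toy data are hypothetical, not the PLL spectra.

```python
import math


def dynamic_spectra(spectra):
    """Subtract the mean spectrum (over perturbation steps) from each spectrum."""
    m, n = len(spectra), len(spectra[0])
    mean = [sum(s[v] for s in spectra) / m for v in range(n)]
    return [[s[v] - mean[v] for v in range(n)] for s in spectra]


def hilbert_noda(m):
    """Hilbert-Noda matrix: N[j][k] = 1/(pi*(k-j)) for j != k, 0 on the diagonal."""
    return [[0.0 if j == k else 1.0 / (math.pi * (k - j)) for k in range(m)]
            for j in range(m)]


def two_d_correlation(spectra):
    """Return (synchronous, asynchronous) n x n maps from m x n input spectra."""
    y = dynamic_spectra(spectra)
    m, n = len(y), len(y[0])
    N = hilbert_noda(m)
    # z[j][v] = sum_k N[j][k] * y[k][v]: Hilbert-Noda transformed dynamic spectra.
    z = [[sum(N[j][k] * y[k][v] for k in range(m)) for v in range(n)]
         for j in range(m)]
    sync = [[sum(y[j][a] * y[j][b] for j in range(m)) / (m - 1)
             for b in range(n)] for a in range(n)]
    asyn = [[sum(y[j][a] * z[j][b] for j in range(m)) / (m - 1)
             for b in range(n)] for a in range(n)]
    return sync, asyn


# Toy data: 5 perturbation steps x 3 spectral points (hypothetical values).
# Points 0 and 1 change in proportion; point 2 changes later in the series.
spectra = [[0.0, 0.0, 0.0], [1.0, 2.0, 0.0], [2.0, 4.0, 0.1],
           [3.0, 6.0, 1.0], [4.0, 8.0, 4.0]]
sync, asyn = two_d_correlation(spectra)
print(round(asyn[0][1], 6), round(asyn[0][2], 3))
```

In-phase changes (points 0 and 1) give no asynchronous cross peak. The delayed change at point 2 does, which is the property used to resolve overlapping components.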
Affiliation(s)
- Lu Ma
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, Tel: (412)-624-8570 Fax: (412)-624-0588
- Zeeshan Ahmed
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, Tel: (412)-624-8570 Fax: (412)-624-0588
- Sanford A. Asher
- Department of Chemistry, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, Tel: (412)-624-8570 Fax: (412)-624-0588
26
Glykys DJ, Szilvay GR, Tortosa P, Suárez Diez M, Jaramillo A, Banta S. Pushing the limits of automatic computational protein design: design, expression, and characterization of a large synthetic protein based on a fungal laccase scaffold. Syst Synth Biol 2011; 5:45-58. [PMID: 22654993 DOI: 10.1007/s11693-011-9080-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Received: 05/18/2010] [Revised: 11/12/2010] [Accepted: 02/19/2011] [Indexed: 01/29/2023]
Abstract
The de novo engineering of new proteins will allow the design of complex systems in synthetic biology. But the design of large proteins is very challenging due to the large combinatorial sequence space to be explored and the lack of a suitable selection system to guide evolution and optimization. One way to approach this challenge is to use computational design methods based on current crystallographic data and on molecular mechanics. We have used a laccase protein fold as a scaffold to design a new protein sequence that would adopt a 3D conformation in solution similar to that of a wild-type protein, the Trametes versicolor fungal laccase (TvL). Laccases are multi-copper oxidases that find utility in a variety of industrial applications. The laccases with the highest activity and redox potential are generally secreted fungal glycoproteins. Prokaryotic laccases have been identified with some desirable features, but they often exhibit low redox potentials. The designed sequence (DLac) shares 50% sequence identity with the original TvL protein. The new DLac gene was overexpressed in E. coli and the majority of the protein was found in inclusion bodies. Both soluble protein and refolded insoluble protein were purified, and their identity was verified by mass spectrometry. Neither protein exhibited the characteristic T1 copper absorbance, bound copper as measured by atomic absorption, or showed activity with a variety of laccase substrates over a range of pH values. Circular dichroism spectroscopy studies suggest that the DLac protein adopts a molten globule structure that is similar to the denatured and refolded native fungal TvL protein, which is significantly different from the natively secreted fungal protein. Taken together, these results indicate that the computationally designed DLac expressed in E. coli is unable to utilize the same folding pathway that is used in the expression of the parent TvL protein or the prokaryotic laccases.
This sequence can be used going forward to help elucidate the sequence requirements for prokaryotic multi-copper oxidase expression. Electronic supplementary material: the online version of this article (doi:10.1007/s11693-011-9080-9) contains supplementary material, which is available to authorized users.
27
Dai L, Yang Y, Kim HR, Zhou Y. Improving computational protein design by using structure-derived sequence profile. Proteins 2010; 78:2338-48. [PMID: 20544969 DOI: 10.1002/prot.22746] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 12/13/2022]
Abstract
Designing a protein sequence that will fold into a predefined structure is of both practical and fundamental interest. Many successful computational designs in the last decade resulted from an improved understanding of hydrophobic and polar interactions between side chains of amino acid residues in stabilizing protein tertiary structures. However, the coupling between main-chain backbone structure and local sequence has yet to be fully addressed. Here, we attempt to account for such coupling by using a sequence profile derived from the sequences of five-residue fragments in a fragment library that are structurally matched to the five-residue segments contained in a target structure. We further introduced a term to reduce low-complexity regions of designed sequences. These two terms, together with optimized reference states for amino acid residues, were implemented in the RosettaDesign program. The new method, called RosettaDesign-SR, increases by 12 percentage points (from 34% to 46%) the fraction of proteins whose designed sequences are more than 35% identical to wild-type sequences. Meanwhile, it reduces by 8 percentage points (from 22% to 14%) the proportion of designed sequences that are not homologous to any known protein sequence according to PSI-BLAST. More importantly, the sequences designed by RosettaDesign-SR have 2-3% more polar residues at the surface and core regions of proteins, and these surface and core polar residues have about 4% higher sequence identity to wild-type sequences than those designed by RosettaDesign. Thus, the proteins designed by RosettaDesign-SR should be less likely to aggregate and more likely to have unique structures due to more specific polar interactions.
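The core idea of a structure-derived sequence profile can be sketched in three steps. First, residues from fragments whose five-residue backbone matches each window of the target are pooled per target position. Those counts are then converted to frequencies, and finally turned into an energy-like term. This is an illustrative sketch under assumptions: a pseudocount of 1, a -ln f scoring form, and hypothetical function names and toy fragments. It is not the RosettaDesign-SR implementation, which also adds a low-complexity term and optimized reference states.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 5  # fragment length used in the abstract


def build_profile(target_length, matched_fragments, pseudocount=1.0):
    """Position-specific residue frequencies from structurally matched 5-mers.

    matched_fragments maps a window start position p to a list of 5-residue
    fragment sequences whose backbone matches target positions p..p+4.
    """
    counts = [Counter() for _ in range(target_length)]
    for start, fragments in matched_fragments.items():
        if start + WINDOW > target_length:
            raise ValueError(f"window at {start} runs past the target")
        for frag in fragments:
            if len(frag) != WINDOW:
                raise ValueError(f"fragment {frag!r} is not {WINDOW} residues")
            for offset, aa in enumerate(frag):
                counts[start + offset][aa] += 1
    profile = []
    for c in counts:
        total = sum(c.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({aa: (c[aa] + pseudocount) / total for aa in AMINO_ACIDS})
    return profile


def profile_energy(sequence, profile):
    """Sum of -ln f(position, residue); lower means more profile-compatible."""
    return sum(-math.log(profile[i][aa]) for i, aa in enumerate(sequence))


# Hypothetical 6-residue target with two overlapping matched windows.
matched = {0: ["AEELL", "AEELK"], 1: ["EELLK", "EEALK"]}
profile = build_profile(6, matched)
print(profile_energy("AEELLK", profile) < profile_energy("GPGPGP", profile))  # True
```

Overlapping windows let every target position collect evidence from several fragments. This is how local backbone geometry is coupled to the preferred local sequence.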
Affiliation(s)
- Liang Dai
- School of Informatics, Indiana University Purdue University, Indianapolis, Indiana 46202, USA
28
Lassila JK. Conformational diversity and computational enzyme design. Curr Opin Chem Biol 2010; 14:676-82. [PMID: 20829099 PMCID: PMC2953567 DOI: 10.1016/j.cbpa.2010.08.010] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.0] [Received: 06/04/2010] [Revised: 08/06/2010] [Accepted: 08/06/2010] [Indexed: 11/22/2022]
Abstract
The application of computational protein design methods to the design of enzyme active sites offers potential routes to new catalysts and new reaction specificities. Computational design methods have typically treated the protein backbone as a rigid structure for the sake of computational tractability. However, this fixed-backbone approximation introduces its own special challenges for enzyme design and it contrasts with an emerging picture of natural enzymes as dynamic ensembles with multiple conformations and motions throughout a reaction cycle. This review considers the impact of conformational variation and dynamics on computational enzyme design and it highlights new approaches to addressing protein conformational diversity in enzyme design including recent advances in multi-state design, backbone flexibility, and computational library design.
Affiliation(s)
- Jonathan K Lassila
- Department of Biochemistry, Stanford University, Stanford, CA 94305, USA.
29
Chen Z, Wilmanns M, Zeng AP. Structural synthetic biotechnology: from molecular structure to predictable design for industrial strain development. Trends Biotechnol 2010; 28:534-42. [PMID: 20727604 DOI: 10.1016/j.tibtech.2010.07.004] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.6] [Received: 04/27/2010] [Revised: 07/14/2010] [Accepted: 07/15/2010] [Indexed: 10/19/2022]
Abstract
The future of industrial biotechnology requires efficient development of highly productive and robust strains of microorganisms. Present practice in strain development cannot adequately fulfill this requirement, primarily owing to the inability to control reactions precisely at a molecular level or to reliably predict the behavior of cells upon perturbation. Recent developments in two areas of biology are changing the situation rapidly: structural biology has revealed details about enzymes and associated bioreactions at an atomic level, and synthetic biology has provided tools to design and assemble precisely controllable modules for re-programming cellular metabolic circuitry. However, because of their different emphases, these two areas have so far developed separately. A linkage between them is desirable to harness their concerted potential. We therefore propose structural synthetic biotechnology as a new field in biotechnology, specifically for application to the development of industrial microbial strains.
Affiliation(s)
- Zhen Chen
- Institute of Bioprocess and Biosystems Engineering, Hamburg University of Technology, Denickestrasse 15, D-21073 Hamburg, Germany
30
Origins of catalysis by computationally designed retroaldolase enzymes. Proc Natl Acad Sci U S A 2010; 107:4937-42. [PMID: 20194782 DOI: 10.1073/pnas.0913638107] [Citation(s) in RCA: 82] [Impact Index Per Article: 5.9] [Indexed: 02/04/2023]
Abstract
We have investigated recently reported computationally designed retroaldolase enzymes with the goal of understanding the extent and the origins of their catalytic power. Direct comparison of the designed enzymes to primary amine catalysts in solution revealed a rate acceleration of 10⁵-fold for the most active of the designed retroaldolases. Through pH-rate studies of the designed retroaldolases and evaluation of a Brønsted correlation for a series of amine catalysts, we found that lysine pKa values are shifted by 3-4 units in the enzymes but that the catalytic contributions from the shifted pKa values are estimated to be modest, about 10-fold. For the most active of the reported enzymes, we evaluated the catalytic contribution of two other design components: a motif intended to stabilize a bound water molecule and hydrophobic substrate binding interactions. Mutational analysis suggested that the bound water motif does not contribute to the rate acceleration. Comparison of the rate acceleration of the designed substrate relative to a minimal substrate suggested that hydrophobic substrate binding interactions contribute around 10³-fold to the enzymatic rate acceleration. Altogether, these results suggest that substrate binding interactions and shifting the pKa of the catalytic lysine can account for much of the enzyme's rate acceleration. Additional observations suggest that these interactions are limited in the specificity of placement of substrate and active site catalytic groups. Thus, future design efforts may benefit from a focus on achieving precision in binding interactions and placement of catalytic groups.
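Because rate accelerations combine multiplicatively, their contributions add on a logarithmic (free-energy) scale. Under the simplifying assumption that the two estimated factors are independent, the roughly 10-fold pKa contribution and the roughly 10³-fold binding contribution together give 10⁴. That is four of the five orders of magnitude in the 10⁵-fold acceleration, which makes "much of" concrete. The snippet converts these factors to free-energy equivalents via RT ln(fold) at an assumed 298.15 K.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)
T = 298.15            # assumed temperature, K


def fold_to_kcal(fold):
    """Free-energy equivalent of a rate acceleration: RT * ln(fold)."""
    return R_KCAL * T * math.log(fold)


total = fold_to_kcal(1e5)      # overall acceleration vs. amine catalyst
pka_shift = fold_to_kcal(10)   # estimated pKa-shift contribution
binding = fold_to_kcal(1e3)    # estimated hydrophobic-binding contribution
# Assumes the two contributions are independent (multiplicative in rate).
explained = (pka_shift + binding) / total
print(f"{total:.2f} {pka_shift:.2f} {binding:.2f} {explained:.0%}")
```

The fraction explained does not depend on the assumed temperature, because RT cancels in the ratio.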
31
Thomas A, Joris B, Brasseur R. Standardized evaluation of protein stability. Biochim Biophys Acta 2010; 1804:1265-71. [PMID: 20176144 DOI: 10.1016/j.bbapap.2010.02.008] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 12/07/2009] [Revised: 01/24/2010] [Accepted: 02/10/2010] [Indexed: 11/25/2022]
Abstract
We compare mean force potential (MFP) values of a large series of PDB models of proteins and peptides and find that, either as monomers or polymers, proteins longer than 200-250 residues have equivalent MFP values that average to -65 ± 3 kcal/aa. This value is named the standard or stability value. The standard value is reached irrespective of sequences and 3D folds. Peptides are too short to follow the rule and frequently exist as populations of conformers; one exception is peptides in amyloid fibrils. Fibrils surpass the standard value, in accordance with their exceptional stability. In parallel, we calculate median MFP values of amino acids in stably folded PDB models of proteins: median values vary from -25 kcal/aa for Gly to -115 kcal/aa for Trp. These median values are used to score primary sequences of proteins: all sequences converge to a mean value of -63.5 ± 2.5 kcal/aa, i.e., only 1.5 kcal less than the folded model standard. Sequences from unfolded proteins have lower values. This supports the conclusion that sequences carry an important message and, more specifically, that diversity of amino acids in sequences is mandatory for stability. We also use the median amino acid MFP to score residue stability in 3D folds. This demonstrates that 3D folds are compromises between fragments of high and fragments of low scores, and that functional residues are often, but not always, found at the extreme score values. The approach opens the possibility of evaluating any 3D model and of detecting functional residues, and should help in conducting mutation assays.
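Scoring a primary sequence with per-residue median MFP values can be read as averaging the median value of each residue over the chain. A minimal sketch of that reading follows. It assumes a simple arithmetic mean, because the exact averaging used by the authors is not stated in the abstract. The function name is hypothetical, and only the Gly (-25) and Trp (-115) medians come from the abstract. A real table would have to be filled in from the published values for all 20 amino acids.

```python
def score_sequence(sequence, median_mfp):
    """Mean of per-residue median MFP values (kcal/aa) over a sequence.

    Assumption: the sequence score is the arithmetic mean of residue medians.
    median_mfp maps one-letter codes to median MFP values.
    """
    missing = set(sequence) - set(median_mfp)
    if missing:
        raise KeyError(f"no median MFP value for: {sorted(missing)}")
    return sum(median_mfp[aa] for aa in sequence) / len(sequence)


# Only the two extreme medians quoted in the abstract; other residues omitted.
MEDIAN_MFP = {"G": -25.0, "W": -115.0}
print(score_sequence("GWGW", MEDIAN_MFP))  # -70.0
```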
Affiliation(s)
- Annick Thomas
- CBMN, Gembloux AgroBiotech, ULg, 5030 Gembloux, Belgium.
32
Alterovitz G, Muso T, Ramoni MF. The challenges of informatics in synthetic biology: from biomolecular networks to artificial organisms. Brief Bioinform 2009; 11:80-95. [PMID: 19906839 DOI: 10.1093/bib/bbp054] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Indexed: 11/13/2022]
Abstract
The field of synthetic biology holds an inspiring vision for the future; it integrates computational analysis, biological data and the systems engineering paradigm in the design of new biological machines and systems. These biological machines are built from basic biomolecular components analogous to electrical devices, and the information flow among these components requires the augmentation of biological insight with the power of a formal approach to information management. Here we review the informatics challenges in synthetic biology along three dimensions: in silico, in vitro and in vivo. First, we describe the state of the art of in silico support for synthetic biology, from specific data exchange formats to the most popular software platforms and algorithms. Next, we cast in vitro synthetic biology in terms of information flow, and discuss genetic fidelity in DNA manipulation, development strategies of biological parts and the regulation of biomolecular networks. Finally, we explore how the engineering chassis can manipulate biological circuitries in vivo to give rise to future artificial organisms.
Affiliation(s)
- Gil Alterovitz
- Children's Hospital Informatics Program, Harvard/MIT Division of Health Sciences and Technology, USA
33
Diez MS, Lam CM, Leprince A, Martins dos Santos VAP. (Re-)construction, characterization and modeling of systems for synthetic biology. Biotechnol J 2009; 4:1382-91. [DOI: 10.1002/biot.200900173] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/12/2022]
34
Marchisio MA, Stelling J. Computational design tools for synthetic biology. Curr Opin Biotechnol 2009; 20:479-85. [PMID: 19758796 DOI: 10.1016/j.copbio.2009.08.007] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.1] [Received: 05/04/2009] [Revised: 08/19/2009] [Accepted: 08/23/2009] [Indexed: 10/20/2022]
Abstract
Computer-aided design, pervasive in other engineering disciplines, is currently developing in synthetic biology. Concepts for standardization and hierarchies of parts, devices and systems provide a basis for efficient engineering in biology. Recently developed computational tools, for instance, enable rational (and graphical) composition of genetic circuits from standard parts, and subsequent simulation for testing the predicted functions in silico. The computational design of DNA and proteins with predetermined quantitative functions has made similar advances. The biggest challenge, however, is the integration of tools and methods into powerful and intuitively usable workflows, and the field is only starting to address it.
Affiliation(s)
- Mario A Marchisio
- Department of Biosystems Science and Engineering, ETH Zurich, Basel 4058, Switzerland.
35
Rizvi SB, Shukla AK, Dubey VK. A simple method based on multiple alignment and phylogeny to derive a correlation between the protein fold and sequence via motif search. Interdiscip Sci 2009; 1:235-243. [PMID: 20640843 DOI: 10.1007/s12539-009-0041-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/02/2009] [Revised: 04/18/2009] [Accepted: 05/11/2009] [Indexed: 05/29/2023]
Abstract
Predicting a protein's structure from its sequence remains an uphill task. Although the two are intimately linked, a direct correlation between them has so far been difficult to obtain. In our present approach, we use a simple method based on multiple alignment and phylogeny to derive a correlation between protein structure and sequence via motif search. The protein families we considered are SH2-like, homeodomain, leucine-rich repeat, alpha/beta knot (trefoil knot) and ferritin-like helix bundle. We were able to predict the protein families with an average accuracy of 81% on our test data set, the highest being 89% and the lowest 73%.
Affiliation(s)
- Syed Baquer Rizvi
- Department of Biotechnology, Indian Institute of Technology Guwahati, Assam, 781039, India
37
Carrera J, Rodrigo G, Jaramillo A. Towards the automated engineering of a synthetic genome. Mol Biosyst 2009; 5:733-43. [PMID: 19562112 DOI: 10.1039/b904400k] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 01/09/2023]
Abstract
The development of technology to synthesize new genomes and to introduce them into hosts whose wild-type chromosome has been inactivated opens new horizons in synthetic biology. It is therefore of utmost importance to harness computational design to predict and optimize a synthetic genome before attempting its synthesis. The methodology for computationally designing a genome is based on an optimization that mimics genome evolution in silico. The biggest bottleneck lies in choosing an appropriate fitness function. This fitness function, usually cell growth, relies on the ability to quantitatively model the biochemical networks of the cell at the genome scale, using parameters inferred from high-throughput data. Computational methods that integrate such models into a common multilayer design platform can be used to automatically engineer synthetic genomes that meet physiological specifications. We describe the current state of the art in automated methods for engineering or re-engineering synthetic genomes, restricting ourselves to global models of metabolism, transcription and DNA structure. Although de novo computational genome design remains a distant goal, it is important to collect all relevant work towards it. Finally, we discuss future perspectives on the practicability of an automated methodology for the computational design of synthetic genomes.
Affiliation(s)
- Javier Carrera
- Instituto de Biología Molecular y Celular de Plantas, Consejo Superior de Investigaciones Científicas-UPV, 46022 València, Spain