1
Smith N, Horswill AR, Wilson MA. X-ray-driven chemistry and conformational heterogeneity in atomic resolution crystal structures of bacterial dihydrofolate reductases. bioRxiv 2023:2023.11.07.566054. [PMID: 37986818 PMCID: PMC10659368 DOI: 10.1101/2023.11.07.566054] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2023]
Abstract
Dihydrofolate reductase (DHFR) catalyzes the NADPH-dependent reduction of dihydrofolate to tetrahydrofolate. Bacterial DHFRs are targets of several important antibiotics as well as model enzymes for the role of protein conformational dynamics in enzyme catalysis. We collected 0.93 Å resolution X-ray diffraction data from both Bacillus subtilis (Bs) and E. coli (Ec) DHFRs bound to folate and NADP+. These oxidized ternary complexes should not be able to perform chemistry; however, electron density maps suggest that hydride transfer is occurring in both enzymes. Comparison of low- and high-dose EcDHFR datasets shows that X-rays drive partial production of tetrahydrofolate. Hydride transfer causes the nicotinamide moiety of NADP+ to move towards the folate and causes correlated shifts in nearby residues. Higher radiation dose also changes the conformational heterogeneity of Met20 in EcDHFR, supporting a solvent gating role during catalysis. BsDHFR has a different pattern of conformational heterogeneity and an unexpected disulfide bond, illustrating important differences between bacterial DHFRs. This work demonstrates that X-rays can drive hydride transfer similar to the native DHFR reaction and that X-ray photoreduction can be used to interrogate catalytically relevant enzyme dynamics in favorable cases.
Affiliation(s)
- Nathan Smith
- Department of Biochemistry and Redox Biology Center, University of Nebraska-Lincoln, Lincoln, NE, 68588
- Alexander R. Horswill
- Department of Immunology & Microbiology, University of Colorado Anschutz School of Medicine, Aurora, CO 80045
- Mark A. Wilson
- Department of Biochemistry and Redox Biology Center, University of Nebraska-Lincoln, Lincoln, NE, 68588
2
Spirov AV, Myasnikova EM. Problem of Domain/Building Block Preservation in the Evolution of Biological Macromolecules and Evolutionary Computation. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1345-1362. [PMID: 35594219 DOI: 10.1109/tcbb.2022.3175908] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Structurally and functionally isolated domains in biological macromolecular evolution, both natural and artificial, are largely similar to "schemata", building blocks (BBs), in evolutionary computation (EC). The problem of preserving in subsequent evolutionary searches the already found domains / BBs is well known and quite relevant in biology as well as in EC. Both biology and EC are seeing parallel and independent development of several approaches to identifying and preserving previously identified domains / BBs. First, we notice the similarity of DNA shuffling methods in synthetic biology and multi-parent recombination algorithms in EC. Furthermore, approaches to computer identification of domains in proteins that are being developed in biology can be aligned with BB identification methods in EC. Finally, approaches to chimeric protein libraries optimization in biology can be compared to evolutionary search methods based on probabilistic models in EC. We propose to validate the prospects of mutual exchange of ideas and transfer of algorithms and approaches between evolutionary systems biology and EC in these three principal directions. A crucial aim of this transfer is the design of new advanced experimental techniques capable of solving more complex problems of in vitro evolution.
3
Seo JH, Min WK, Lee SG, Yun H, Kim BG. To the Final Goal: Can We Predict and Suggest Mutations for Protein to Develop Desired Phenotype? Biotechnol Bioproc E 2018. [DOI: 10.1007/s12257-018-0064-4] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/10/2023]
4
Learning epistatic interactions from sequence-activity data to predict enantioselectivity. J Comput Aided Mol Des 2017; 31:1085-1096. [DOI: 10.1007/s10822-017-0090-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 08/10/2017] [Accepted: 12/04/2017] [Indexed: 10/18/2022]
5
Kumar RP, Kulkarni N. A receptor dependent-4D QSAR approach to predict the activity of mutated enzymes. Sci Rep 2017; 7:6273. [PMID: 28740233 PMCID: PMC5524700 DOI: 10.1038/s41598-017-06625-x] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 01/07/2016] [Accepted: 06/15/2017] [Indexed: 11/29/2022] Open
Abstract
Screening and selection tools to obtain focused libraries play a key role in successfully engineering enzymes of desired qualities. The quality of screening depends on efficient assays; however, a focused library generated with a priori information plays a major role in effectively identifying the right enzyme. As a proof of concept, for the first time, receptor dependent - 4D Quantitative Structure Activity Relationship (RD-4D-QSAR) has been implemented to predict kinetic properties of an enzyme. The novelty of this study is that the mutated enzymes also form a part of the training data set. The mutations were modeled in a serine protease and molecular dynamics simulations were conducted to derive enzyme-substrate (E-S) conformations. The E-S conformations were enclosed in a high resolution grid consisting of 156,250 grid points that stores interaction energies to generate QSAR models to predict the enzyme activity. The QSAR predictions showed similar results as reported in the kinetic studies with >80% specificity and >50% sensitivity revealing that the top ranked models unambiguously differentiated enzymes with high and low activity. The interaction energy descriptors of the best QSAR model were used to identify residues responsible for enzymatic activity and substrate specificity.
Affiliation(s)
- R Pravin Kumar
- Polyclone Bioservices, #437, 40th Cross, Jayanagar 5th Block, Bangalore, 560041, India.
- Naveen Kulkarni
- Polyclone Bioservices, #437, 40th Cross, Jayanagar 5th Block, Bangalore, 560041, India
6
Abstract
Faced with a protein engineering challenge, a contemporary researcher can choose from myriad design strategies. Library-scale computational protein design (LCPD) is a hybrid method suitable for the engineering of improved protein variants with diverse sequences. This chapter discusses the background and merits of several practical LCPD techniques. First, LCPD methods suitable for delocalized protein design are presented in the context of example design calculations for cellobiohydrolase II. Second, localized design methods are discussed in the context of an example design calculation intended to shift the substrate specificity of a ketol-acid reductoisomerase Rossmann domain from NADPH to NADH.
7
Jacobs TM, Yumerefendi H, Kuhlman B, Leaver-Fay A. SwiftLib: rapid degenerate-codon-library optimization through dynamic programming. Nucleic Acids Res 2014; 43:e34. [PMID: 25539925 PMCID: PMC4357694 DOI: 10.1093/nar/gku1323] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Indexed: 11/13/2022] Open
Abstract
Degenerate codon (DC) libraries efficiently address the experimental library-size limitations of directed evolution by focusing diversity toward the positions and toward the amino acids (AAs) that are most likely to generate hits; however, manually constructing DC libraries is challenging, error prone and time consuming. This paper provides a dynamic programming solution to the task of finding the best DCs while keeping the size of the library beneath some given limit, improving on the existing integer-linear programming formulation. It then extends the algorithm to consider multiple DCs at each position, a heretofore unsolved problem, while adhering to a constraint on the number of primers needed to synthesize the library. In the two library-design problems examined here, the use of multiple DCs produces libraries that very nearly cover the set of desired AAs while still staying within the experimental size limits. Surprisingly, the algorithm is able to find near-perfect libraries where the ratio of amino-acid sequences to nucleic-acid sequences approaches 1; it effectively side-steps the degeneracy of the genetic code. Our algorithm is freely available through our web server and solves most design problems in about a second.
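The bookkeeping behind a degenerate-codon library can be sketched in a few lines. This is a minimal illustration, not SwiftLib's dynamic programming algorithm: it only expands IUPAC degenerate codons, lists the amino acids they encode under the standard genetic code, and multiplies per-position codon counts to get the DNA-level library size that such an optimizer would have to keep under a limit. The function names (`expand`, `encoded_amino_acids`, `library_size`) are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand(dc):
    """Return every concrete codon matched by a degenerate codon such as 'NNK'."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in dc.upper()))]


def encoded_amino_acids(dc):
    """Return the set of amino acids ('*' marks a stop) encoded by a degenerate codon."""
    return {CODON_TABLE[codon] for codon in expand(dc)}


def library_size(dcs):
    """DNA-level library size when one degenerate codon is used per position."""
    size = 1
    for dc in dcs:
        size *= len(expand(dc))
    return size


nnk = encoded_amino_acids("NNK")   # all 20 amino acids plus one stop codon (TAG)
ndt = encoded_amino_acids("NDT")   # 12 codons encoding 12 distinct amino acids
two_site = library_size(["NNK", "NNK"])
```

Comparing `len(expand(dc))` with `len(encoded_amino_acids(dc))` gives the ratio of nucleic-acid to amino-acid sequences at one position, the quantity the abstract describes as approaching 1 for near-perfect libraries.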
Affiliation(s)
- Timothy M Jacobs
- Department of Biochemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Hayretin Yumerefendi
- Department of Biochemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Brian Kuhlman
- Department of Biochemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA; Lineberger Comprehensive Cancer Center, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Andrew Leaver-Fay
- Department of Biochemistry, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
8
MDC-Analyzer: a novel degenerate primer design tool for the construction of intelligent mutagenesis libraries with contiguous sites. Biotechniques 2014; 56:301-2, 304, 306-8, passim. [PMID: 24924390 DOI: 10.2144/000114177] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.4] [Received: 10/31/2013] [Accepted: 04/03/2014] [Indexed: 11/23/2022] Open
Abstract
Recent computational and bioinformatics advances have enabled the efficient creation of novel biocatalysts by reducing amino acid variability at hot spot regions. To further expand the utility of this strategy, we present here a tool called Multi-site Degenerate Codon Analyzer (MDC-Analyzer) for the automated design of intelligent mutagenesis libraries that can completely cover user-defined randomized sequences, especially when multiple contiguous and/or adjacent sites are targeted. By initially defining an objective function, the possible optimal degenerate PCR primer profiles could be automatically explored using the heuristic approach of Greedy Best-First-Search. Compared to the previously developed DC-Analyzer, MDC-Analyzer allows for the existence of a small amount of undesired sequences as a tradeoff between the number of degenerate primers and the encoded library size while still providing all the benefits of DC-Analyzer with the ability to randomize multiple contiguous sites. MDC-Analyzer was validated using a series of randomly generated mutation schemes and experimental case studies on the evolution of halohydrin dehalogenase, which proved that the MDC methodology is more efficient than other methods and is particularly well-suited to exploring the sequence space of proteins using data-driven protein engineering strategies.
9
Woo J, Robertson DL, Lovell SC. Constraints from protein structure and intra-molecular coevolution influence the fitness of HIV-1 recombinants. Virology 2014; 454-455:34-9. [PMID: 24725929 DOI: 10.1016/j.virol.2014.01.029] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Received: 12/17/2013] [Revised: 01/10/2014] [Accepted: 01/29/2014] [Indexed: 11/18/2022]
Abstract
A major challenge for developing effective treatments for HIV-1 is the virus's ability to generate new variants. Inter-strain recombination is a major contributor to this high evolutionary rate, since at least 20% of viruses are observed to be recombinant. However, the patterns of recombination vary across the viral genome. A number of factors influence recombination, including sequence identity and secondary RNA structure. In addition, the recombinant genome must code for a functional virus, and expressed proteins must fold to stable and functional structures. Any intragenic recombination that disrupts internal residue contacts may therefore produce an unfolded protein. Here we find that contact maps based on protein structures predict recombination breakpoints observed in the HIV-1 pandemic. Moreover, many pairs of contacting residues that are unlikely to be disrupted by recombination are coevolving. We conclude that purifying selection arising from protein structure and intramolecular coevolutionary changes shapes the observed patterns of recombination in HIV-1.
Affiliation(s)
- Jeongmin Woo
- Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester M13 9PT, UK
- David L Robertson
- Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester M13 9PT, UK
- Simon C Lovell
- Computational and Evolutionary Biology, Faculty of Life Sciences, University of Manchester, Oxford Road, Manchester M13 9PT, UK
10
Zaugg J, Gumulya Y, Gillam EMJ, Bodén M. Computational tools for directed evolution: a comparison of prospective and retrospective strategies. Methods Mol Biol 2014; 1179:315-333. [PMID: 25055787 DOI: 10.1007/978-1-4939-1053-3_21] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 06/03/2023]
Abstract
Directed evolution methods have proved to be highly effective in the design of novel proteins and in the generation of large libraries of diverse sequences. However, searching through the vast number of mutants produced during such experiments in order to find the best represents a daunting and difficult task. In recent years, a number of computational tools have been developed to provide guidance during this exploratory process. It can, however, be unclear which tool or tools best complement the chosen library design strategy. In this review, we describe and critically evaluate some of the more notable tools in this area, discussing the rationale behind each, the requirements for their implementation, and potential issues faced when using them. Some examples of their application in an experimental setting are also provided. The tools have been classified based on contrasting strategies as to how they function: prospective tools SCHEMA and OPTCOMB use extant sequence and structural data to predict optimal locations for crossover sites, whereas retrospective tools ProSAR and ASRA use property data from the mutant library to predict beneficial mutations and features. From our evaluation, we suggest that each tool can play a role in the design process; however, this is largely dictated by the data available and the desired experimental strategy for the project.
Affiliation(s)
- Julian Zaugg
- School of Chemistry and Molecular Biosciences, The University of Queensland, Brisbane, Australia
11
Trudeau DL, Smith MA, Arnold FH. Innovation by homologous recombination. Curr Opin Chem Biol 2013; 17:902-9. [DOI: 10.1016/j.cbpa.2013.10.007] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Received: 06/02/2013] [Accepted: 10/03/2013] [Indexed: 12/11/2022]
12
Kumar A, Singh S. Directed evolution: tailoring biocatalysts for industrial applications. Crit Rev Biotechnol 2012; 33:365-78. [DOI: 10.3109/07388551.2012.716810] [Citation(s) in RCA: 73] [Impact Index Per Article: 6.1] [Indexed: 01/27/2023]
13
Feng X, Sanchis J, Reetz MT, Rabitz H. Enhancing the efficiency of directed evolution in focused enzyme libraries by the adaptive substituent reordering algorithm. Chemistry 2012; 18:5646-54. [PMID: 22434591 DOI: 10.1002/chem.201103811] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.5] [Received: 12/05/2011] [Indexed: 11/11/2022]
Abstract
Directed evolution is a broadly successful strategy for protein engineering in the quest to enhance the stereoselectivity, activity, and thermostability of enzymes. To increase the efficiency of directed evolution based on iterative saturation mutagenesis, the adaptive substituent reordering algorithm (ASRA) is introduced here as an alternative to traditional quantitative structure-activity relationship (QSAR) methods for identifying potential protein mutants with desired properties from minimal sampling of focused libraries. The operation of ASRA depends on identifying the underlying regularity of the protein property landscape, allowing it to make predictions without explicit knowledge of the structure-property relationships. In a proof-of-principle study, ASRA identified all or most of the best enantioselective mutants among the synthesized epoxide hydrolase from Aspergillus niger, in the absence of peptide seeds with high E-values. ASRA even revealed a laboratory error from irregularities of the reordered E-value landscape alone.
Affiliation(s)
- Xiaojiang Feng
- Department of Chemistry, Princeton University, New Jersey 08544, USA
14
He L, Friedman AM, Bailey-Kellogg C. A divide-and-conquer approach to determine the Pareto frontier for optimization of protein engineering experiments. Proteins 2012; 80:790-806. [PMID: 22180081 PMCID: PMC4939273 DOI: 10.1002/prot.23237] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Received: 07/07/2011] [Revised: 10/06/2011] [Accepted: 10/21/2011] [Indexed: 01/07/2023]
Abstract
In developing improved protein variants by site-directed mutagenesis or recombination, there are often competing objectives that must be considered in designing an experiment (selecting mutations or breakpoints): stability versus novelty, affinity versus specificity, activity versus immunogenicity, and so forth. Pareto optimal experimental designs make the best trade-offs between competing objectives. Such designs are not "dominated"; that is, no other design is better than a Pareto optimal design for one objective without being worse for another objective. Our goal is to produce all the Pareto optimal designs (the Pareto frontier), to characterize the trade-offs and suggest designs most worth considering, but to avoid explicitly considering the large number of dominated designs. To do so, we develop a divide-and-conquer algorithm, Protein Engineering Pareto FRontier (PEPFR), that hierarchically subdivides the objective space, using appropriate dynamic programming or integer programming methods to optimize designs in different regions. This divide-and-conquer approach is efficient in that the number of divisions (and thus calls to the optimizer) is directly proportional to the number of Pareto optimal designs. We demonstrate PEPFR with three protein engineering case studies: site-directed recombination for stability and diversity via dynamic programming, site-directed mutagenesis of interacting proteins for affinity and specificity via integer programming, and site-directed mutagenesis of a therapeutic protein for activity and immunogenicity via integer programming. We show that PEPFR is able to effectively produce all the Pareto optimal designs, discovering many more designs than previous methods. The characterization of the Pareto frontier provides additional insights into the local stability of design choices as well as global trends leading to trade-offs between competing criteria.
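The dominance relation defined in this abstract can be made concrete with a brute-force filter. This is a sketch under stated assumptions, not the PEPFR divide-and-conquer algorithm: it explicitly compares every pair of designs, which is exactly the enumeration PEPFR is designed to avoid. Each design is assumed to be a tuple of objective scores where higher is better, and the names `dominates` and `pareto_frontier` are hypothetical.

```python
def dominates(a, b):
    """True if design a is at least as good as b on every objective
    and strictly better on at least one (higher scores are better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_frontier(designs):
    """Return the designs that no other design dominates (brute force, O(n^2))."""
    return [d for d in designs if not any(dominates(other, d) for other in designs)]


# Hypothetical (stability, diversity) scores for five candidate designs.
designs = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
frontier = pareto_frontier(designs)   # [(1, 5), (2, 4), (3, 3)]
```

The quadratic filter is only practical when the candidate designs can be listed explicitly; for combinatorial design spaces the abstract instead subdivides the objective space and calls an optimizer once per region.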
Affiliation(s)
- Lu He
- Department of Computer Science, Dartmouth College, Hanover NH 03755
- Alan M. Friedman
- Department of Biological Sciences, Markey Center for Structural Biology, Purdue Cancer Center, and Bindley Bioscience Center, Purdue University
15
Hot spots for allosteric regulation on protein surfaces. Cell 2012; 147:1564-75. [PMID: 22196731 DOI: 10.1016/j.cell.2011.10.049] [Citation(s) in RCA: 263] [Impact Index Per Article: 21.9] [Received: 04/28/2011] [Revised: 08/10/2011] [Accepted: 10/19/2011] [Indexed: 11/22/2022]
Abstract
Recent work indicates a general architecture for proteins in which sparse networks of physically contiguous and coevolving amino acids underlie basic aspects of structure and function. These networks, termed sectors, are spatially organized such that active sites are linked to many surface sites distributed throughout the structure. Using the metabolic enzyme dihydrofolate reductase as a model system, we show that: (1) the sector is strongly correlated to a network of residues undergoing millisecond conformational fluctuations associated with enzyme catalysis, and (2) sector-connected surface sites are statistically preferred locations for the emergence of allosteric control in vivo. Thus, sectors represent an evolutionarily conserved "wiring" mechanism that can enable perturbations at specific surface positions to rapidly initiate conformational control over protein function. These findings suggest that sectors enable the evolution of intermolecular communication and regulation.
16
Parker AS, Griswold KE, Bailey-Kellogg C. Optimization of combinatorial mutagenesis. J Comput Biol 2011; 18:1743-56. [PMID: 21923411 DOI: 10.1089/cmb.2011.0152] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.5] [Indexed: 11/13/2022] Open
Abstract
Protein engineering by combinatorial site-directed mutagenesis evaluates a portion of the sequence space near a target protein, seeking variants with improved properties (e.g., stability, activity, immunogenicity). In order to improve the hit-rate of beneficial variants in such mutagenesis libraries, we develop methods to select optimal positions and corresponding sets of the mutations that will be used, in all combinations, in constructing a library for experimental evaluation. Our approach, OCoM (Optimization of Combinatorial Mutagenesis), encompasses both degenerate oligonucleotides and specified point mutations, and can be directed accordingly by requirements of experimental cost and library size. It evaluates the quality of the resulting library by one- and two-body sequence potentials, averaged over the variants. To ensure that it is not simply recapitulating extant sequences, it balances the quality of a library with an explicit evaluation of the novelty of its members. We show that, despite dealing with a combinatorial set of variants, in our approach the resulting library optimization problem is actually isomorphic to single-variant optimization. By the same token, this means that the two-body sequence potential results in an NP-hard optimization problem. We present an efficient dynamic programming algorithm for the one-body case and a practically-efficient integer programming approach for the general two-body case. We demonstrate the effectiveness of our approach in designing libraries for three different case study proteins targeted by previous combinatorial libraries--a green fluorescent protein, a cytochrome P450, and a beta lactamase. We found that OCoM worked quite efficiently in practice, requiring only 1 hour even for the massive design problem of selecting 18 mutations to generate 10⁷ variants of a 443-residue P450. We demonstrate the general ability of OCoM in enabling the protein engineer to explore and evaluate trade-offs between quality and novelty as well as library construction technique, and identify optimal libraries for experimental evaluation.
Affiliation(s)
- Andrew S Parker
- Department of Computer Science, Dartmouth College, Hanover, New Hampshire, USA
17
18
Abstract
The best approach for creating libraries of functional proteins with large numbers of nondisruptive amino acid substitutions is protein recombination, in which structurally related polypeptides are swapped among homologous proteins. Unfortunately, as more distantly related proteins are recombined, the fraction of variants having a disrupted structure increases. One way to enrich the fraction of folded and potentially interesting chimeras in these libraries is to use computational algorithms to anticipate which structural elements can be swapped without disturbing the integrity of a protein's structure. Herein, we describe how the algorithm Schema uses the sequences and structures of the parent proteins recombined to predict the structural disruption of chimeras, and we outline how dynamic programming can be used to find libraries with a range of amino acid substitution levels that are enriched in variants with low Schema disruption.
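A Schema-style disruption score can be sketched as a count of broken residue contacts. The sketch below assumes the commonly cited definition, which the abstract itself does not spell out: a contact between positions i and j counts as disrupted when the chimera's residue pair at (i, j) does not occur at those positions in any parent. The inputs are assumptions too: pre-aligned parent sequences of equal length and a contact list derived from a parent structure. The function name `schema_disruption` and the toy sequences are hypothetical.

```python
def schema_disruption(chimera, parents, contacts):
    """Count contacts (i, j) whose residue pair in the chimera is not
    seen at the same two positions in any parent sequence."""
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if not any((p[i], p[j]) == pair for p in parents):
            broken += 1
    return broken


# Toy aligned parents and a toy contact map (0-based positions).
parents = ["ACDE", "AGHK"]
contacts = [(0, 3), (1, 2), (1, 3)]

# Chimera taking positions 0-1 from the first parent and 2-3 from the second.
chimera = parents[0][:2] + parents[1][2:]              # "ACHK"
score = schema_disruption(chimera, parents, contacts)  # 2 broken contacts
```

Contact (0, 3) is not counted here because both parents carry the same residue at position 0, so the pair still matches the second parent. A library-level search would average this score over all chimeras produced by a given set of crossover sites, which is where the dynamic programming mentioned in the abstract comes in.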
19
Farrow MF, Arnold FH. Combinatorial recombination of gene fragments to construct a library of chimeras. Curr Protoc Protein Sci 2010; Chapter 26:Unit 26.2. [PMID: 20814931 DOI: 10.1002/0471140864.ps2602s61] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/09/2022]
Abstract
Recombination of distantly related and nonrelated genes is difficult using traditional PCR-based techniques, and truncation-based methods result in a large proportion of nonviable sequences due to frame shifts, deletions, and insertions. This unit describes a method for creating libraries of chimeras through combinatorial assembly of gene fragments. It allows the experimenter to recombine genes of any identity and to select the sites where recombination takes place. Combinatorial recombination is achieved by generating gene fragments with specific overhangs, or sticky ends. The overhangs permit the fragments to be ligated in the correct order while allowing independent assortment of blocks with identical overhangs. Genes of any identity can be recombined so long as they share 3 to 5 base pairs of identity at the desired recombination sites. Simple adaptations of the method allow incorporation of specific gene fragments.
20
Exploiting models of molecular evolution to efficiently direct protein engineering. J Mol Evol 2010; 72:193-203. [PMID: 21132281 DOI: 10.1007/s00239-010-9415-2] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.2] [Received: 07/13/2010] [Accepted: 11/19/2010] [Indexed: 10/18/2022]
Abstract
Directed evolution and protein engineering approaches used to generate novel or enhanced biomolecular function often use the evolutionary sequence diversity of protein homologs to rationally guide library design. To fully capture this sequence diversity, however, libraries containing millions of variants are often necessary. Screening libraries of this size is often undesirable due to inaccuracies of high-throughput assays, costs, and time constraints. The ability to effectively cull sequence diversity while still generating the functional diversity within a library thus holds considerable value. This is particularly relevant when high-throughput assays are not amenable to select/screen for certain biomolecular properties. Here, we summarize our recent attempts to develop an evolution-guided approach, Reconstructing Evolutionary Adaptive Paths (REAP), for directed evolution and protein engineering that exploits phylogenetic and sequence analyses to identify amino acid substitutions that are likely to alter or enhance function of a protein. To demonstrate the utility of this technique, we highlight our previous work with DNA polymerases in which a REAP-designed small library was used to identify a DNA polymerase capable of accepting non-standard nucleosides. We anticipate that the REAP approach will be used in the future to facilitate the engineering of biopolymers with expanded functions and will thus have a significant impact on the developing field of 'evolutionary synthetic biology'.
21
Pantazes RJ, Maranas CD. OptCDR: a general computational method for the design of antibody complementarity determining regions for targeted epitope binding. Protein Eng Des Sel 2010; 23:849-58. [PMID: 20847101 DOI: 10.1093/protein/gzq061] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Indexed: 11/13/2022] Open
Abstract
Antibodies are an important class of proteins with many biomedical and biotechnical applications. Although there are a plethora of experimental techniques geared toward their efficient production, there is a paucity of computational methods for their de novo design. OptCDR is a general computational method to design the binding portions of antibodies to have high specificity and affinity against any targeted epitope of an antigen. First, combinations of canonical structures for the antibody complementarity determining regions (CDRs) that are most likely to be able to favorably bind the antigen are selected. This is followed by the simultaneous refinement of the CDR structures' backbones and optimal amino acid selection for each position. OptCDR is applied to three computational test cases: a peptide from the capsid of hepatitis C, the hapten fluorescein and the protein vascular endothelial growth factor. The results demonstrate that OptCDR can efficiently generate diverse antibody libraries of a pre-specified size with promising antigen affinity potential as exemplified by computationally derived binding metrics.
Affiliation(s)
- R J Pantazes
- Department of Chemical Engineering, The Pennsylvania State University, University Park, PA 16802, USA
22
Reetz MT. Gerichtete Evolution stereoselektiver Enzyme: Eine ergiebige Katalysator‐Quelle für asymmetrische Reaktionen [Directed evolution of stereoselective enzymes: a prolific source of catalysts for asymmetric reactions]. Angew Chem Int Ed Engl 2010. [DOI: 10.1002/ange.201000826] [Citation(s) in RCA: 128] [Impact Index Per Article: 9.1] [Indexed: 12/16/2022]
Affiliation(s)
- Manfred T. Reetz
- Max‐Planck‐Institut für Kohlenforschung, Kaiser‐Wilhelm‐Platz 1, 45470 Mülheim an der Ruhr (Germany), Fax: (+49) 208‐306‐2985 http://www.mpi‐muelheim.mpg.de/mpikofo_home.html
23
Reetz MT. Laboratory Evolution of Stereoselective Enzymes: A Prolific Source of Catalysts for Asymmetric Reactions. Angew Chem Int Ed Engl 2010; 50:138-74. [DOI: 10.1002/anie.201000826] [Citation(s) in RCA: 441] [Impact Index Per Article: 31.5] [Indexed: 02/04/2023]
Affiliation(s)
- Manfred T. Reetz
- Max‐Planck‐Institut für Kohlenforschung, Kaiser‐Wilhelm‐Platz 1, 45470 Mülheim an der Ruhr (Germany), Fax: (+49) 208‐306‐2985 http://www.mpi‐muelheim.mpg.de/mpikofo_home.html
|
24
|
Zheng W, Griswold KE, Bailey-Kellogg C. Protein fragment swapping: a method for asymmetric, selective site-directed recombination. J Comput Biol 2010; 17:459-75. [PMID: 20377457 DOI: 10.1089/cmb.2009.0189] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/13/2022]
Abstract
This article presents a new approach to site-directed recombination, swapping combinations of selected discontiguous fragments from a source protein in place of corresponding fragments of a target protein. By being both asymmetric (differentiating source and target) and selective (swapping discontiguous fragments), our method focuses experimental effort on a more restricted portion of sequence space, constructing hybrids that are more likely to have the properties that are the objective of the experiment. Furthermore, since the source and target need to be structurally homologous only locally (rather than overall), our method supports swapping fragments from functionally important regions of a source into a target "scaffold" (for example, to humanize an exogenous therapeutic protein). A protein fragment swapping plan is defined by the residue position boundaries of the fragments to be swapped; it is assessed by an average potential score over the resulting hybrid library, with singleton and pairwise terms evaluating the importance and fit of the swapped residues. While we prove that it is NP-hard to choose an optimal set of fragments under such a potential score, we develop an integer programming approach, which we call Swagmer, that works very well in practice. We demonstrate the effectiveness of our method in three swapping problems: selective recombination between beta-lactamases, activity swapping between glutathione transferases, and activity swapping between carboxylases and mutases in the purE family. We show that the selective recombination approach generates better plans (in terms of resulting potential score) than traditional site-directed recombination approaches. We also show that in all cases the optimized experiments are significantly better than ones that would result from stochastic methods.
Affiliation(s)
- Wei Zheng
- Department of Computer Science, Dartmouth College, Hanover, New Hampshire 03755, USA
|
25
|
Villiers BRM, Stein V, Hollfelder F. USER friendly DNA recombination (USERec): a simple and flexible near homology-independent method for gene library construction. Protein Eng Des Sel 2010; 23:1-8. [PMID: 19897542 DOI: 10.1093/protein/gzp063] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.5] [Indexed: 11/12/2022]
Abstract
USER friendly DNA recombination (USERec) is introduced as a near homology-independent method that allows the simultaneous recombination of an unprecedented number of 10 DNA fragments (approximately 40-400 bp) within a day. The large number of fragments and their ease of preparation enable the creation of libraries of much larger genetic diversity (potentially approximately 10^10-10^11 sequences) than current alternative methods based on DNA truncation (ITCHY, SCRATCHY and SHIPREC) or type IIb restriction enzymes (SISDC). At the same time, the frequency of frameshifts in the recombined library is low (90% of the recombined sequences are in frame). Compared to overlap extension PCR, USERec also requires much reduced crossover sequence constraints (only a 5'-AN(4-8)T-3' motif) and fewer experimental steps. Based on its simplicity and flexibility, and the accessibility of large and high quality recombined DNA libraries, USERec is established as a convenient alternative for the combinatorial assembly of gene fragments (e.g. exon or domain shuffling) and for a number of applications in gene library construction, such as loop grafting and multi-site-directed or random mutagenesis.
Affiliation(s)
- B R M Villiers
- Department of Biochemistry, University of Cambridge, 80 Tennis Court Road, Cambridge CB2 1GA, UK
|
26
|
Buske FA, Thier R, Gillam EMJ, Bodén M. In silico characterization of protein chimeras: Relating sequence and function within the same fold. Proteins 2009; 77:111-20. [DOI: 10.1002/prot.22422] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
|
27
|
Martin CH, Nielsen DR, Solomon KV, Prather KLJ. Synthetic metabolism: engineering biology at the protein and pathway scales. Chem Biol 2009; 16:277-86. [PMID: 19318209 DOI: 10.1016/j.chembiol.2009.01.010] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Received: 11/24/2008] [Revised: 01/21/2009] [Accepted: 01/22/2009] [Indexed: 11/25/2022]
Abstract
Biocatalysis has become a powerful tool for the synthesis of high-value compounds, particularly so in the case of highly functionalized and/or stereoactive products. Nature has supplied thousands of enzymes and assembled them into numerous metabolic pathways. Although these native pathways can be used to produce natural bioproducts, there are many valuable and useful compounds that have no known natural biochemical route. Consequently, there is a need for both unnatural metabolic pathways and novel enzymatic activities upon which these pathways can be built. Here, we review the theoretical and experimental strategies for engineering synthetic metabolic pathways at the protein and pathway scales, and highlight the challenges that this subfield of synthetic biology currently faces.
Affiliation(s)
- Collin H Martin
- Department of Chemical Engineering, Synthetic Biology Engineering Research Center, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
|
28
|
Lee J, Natarajan M, Nashine VC, Socolich M, Vo T, Russ WP, Benkovic SJ, Ranganathan R. Surface sites for engineering allosteric control in proteins. Science 2008; 322:438-42. [PMID: 18927392 PMCID: PMC3071530 DOI: 10.1126/science.1159052] [Citation(s) in RCA: 274] [Impact Index Per Article: 17.1] [Indexed: 01/25/2023]
Abstract
Statistical analyses of protein families reveal networks of coevolving amino acids that functionally link distantly positioned functional surfaces. Such linkages suggest a concept for engineering allosteric control into proteins: The intramolecular networks of two proteins could be joined across their surface sites such that the activity of one protein might control the activity of the other. We tested this idea by creating PAS-DHFR, a designed chimeric protein that connects a light-sensing signaling domain from a plant member of the Per/Arnt/Sim (PAS) family of proteins with Escherichia coli dihydrofolate reductase (DHFR). With no optimization, PAS-DHFR exhibited light-dependent catalytic activity that depended on the site of connection and on known signaling mechanisms in both proteins. PAS-DHFR serves as a proof of concept for engineering regulatory activities into proteins through interface design at conserved allosteric sites.
Affiliation(s)
- Jeeyeon Lee
- Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA
- Madhusudan Natarajan
- Green Center for Systems Biology and Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Vishal C. Nashine
- Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA
- Michael Socolich
- Green Center for Systems Biology and Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Tina Vo
- Green Center for Systems Biology and Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- William P. Russ
- Green Center for Systems Biology and Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Stephen J. Benkovic
- Department of Chemistry, Pennsylvania State University, University Park, PA 16802, USA
- Rama Ranganathan
- Green Center for Systems Biology and Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
|
29
|
Meyer MM, Hiraga K, Arnold FH. Combinatorial recombination of gene fragments to construct a library of chimeras. Curr Protoc Protein Sci 2008; Chapter 26:26.2.1-26.2.17. [PMID: 18429308 DOI: 10.1002/0471140864.ps2602s44] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Indexed: 11/10/2022]
Abstract
Recombination of distantly related and nonrelated genes is difficult using traditional PCR-based techniques, and truncation-based methods result in a large proportion of nonviable sequences due to frame shifts, deletions, and insertions. This unit describes a method for creating libraries of chimeras through combinatorial assembly of gene fragments. It allows the experimenter to recombine genes of any identity and to select the sites where recombination takes place. Combinatorial recombination is achieved by generating gene fragments with specific overhangs, or sticky ends. The overhangs permit the fragments to be ligated in the correct order while allowing independent assortment of blocks with identical overhangs. Genes of any identity can be recombined so long as they share 3 to 5 base pairs of identity at the desired recombination sites. Simple adaptations of the method allow incorporation of specific gene fragments.
|
30
|
Armstrong KA, Tidor B. Computationally mapping sequence space to understand evolutionary protein engineering. Biotechnol Prog 2007; 24:62-73. [PMID: 18020358 DOI: 10.1021/bp070134h] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 12/29/2022]
Abstract
Evolutionary protein engineering has been dramatically successful, producing a wide variety of new proteins with altered stability, binding affinity, and enzymatic activity. However, the success of such procedures is often unreliable, and the impact of the choice of protein, engineering goal, and evolutionary procedure is not well understood. We have created a framework for understanding aspects of the protein engineering process by computationally mapping regions of feasible sequence space for three small proteins using structure-based design protocols. We then tested the ability of different evolutionary search strategies to explore these sequence spaces. The results point to a non-intuitive relationship between the error-prone PCR mutation rate and the number of rounds of replication. The evolutionary relationships among feasible sequences reveal hub-like sequences that serve as particularly fruitful starting sequences for evolutionary search. Moreover, genetic recombination procedures were examined, and tradeoffs relating sequence diversity and search efficiency were identified. This framework allows us to consider the impact of protein structure on the allowed sequence space and therefore on the challenges that each protein presents to error-prone PCR and genetic recombination procedures.
Affiliation(s)
- Kathryn A Armstrong
- Computer Science and Artificial Intelligence Laboratory, Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139-4307, USA
|
31
|
Wong TS, Roccatano D, Schwaneberg U. Steering directed protein evolution: strategies to manage combinatorial complexity of mutant libraries. Environ Microbiol 2007; 9:2645-59. [DOI: 10.1111/j.1462-2920.2007.01411.x] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.1] [Indexed: 11/30/2022]
|
32
|
Chaparro-Riggers JF, Polizzi KM, Bommarius AS. Better library design: data-driven protein engineering. Biotechnol J 2007; 2:180-91. [PMID: 17183506 DOI: 10.1002/biot.200600170] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.3] [Indexed: 11/11/2022]
Abstract
Data-driven protein engineering is increasingly used as an alternative to rational design and combinatorial engineering because it uses available knowledge to limit library size, while still allowing for the identification of unpredictable substitutions that lead to large effects. Recent advances in computational modeling and bioinformatics, as well as an increasing databank of experiments on functional variants, have led to new strategies to choose particular amino acid residues to vary in order to increase the chances of obtaining a variant protein with the desired property. Strategies for limiting diversity at each position, design of small sub-libraries, and the performance of scouting experiments, have also been developed or even automated, further reducing the library size.
Affiliation(s)
- Javier F Chaparro-Riggers
- School of Chemical and Biomolecular Engineering, Parker H. Petit Institute of Bioengineering and Bioscience, Atlanta, GA, USA
|
33
|
Meyer MM, Hochrein L, Arnold FH. Structure-guided SCHEMA recombination of distantly related β-lactamases. Protein Eng Des Sel 2006; 19:563-70. [PMID: 17090554 DOI: 10.1093/protein/gzl045] [Citation(s) in RCA: 71] [Impact Index Per Article: 3.9] [Indexed: 11/14/2022]
Abstract
We constructed a library of beta-lactamases by recombining three naturally occurring homologs (TEM-1, PSE-4, SED-1) that share 34-42% sequence identity. Most chimeras created by recombining such distantly related proteins are unfolded due to unfavorable side-chain interactions that destabilize the folded structure. To enhance the fraction of properly folded chimeras, we designed the library using SCHEMA, a structure-guided approach to choosing the least disruptive crossover locations. Recombination at seven selected crossover positions generated 6561 chimeric sequences that differ from their closest parent at an average of 66 positions. Of 553 unique characterized chimeras, 111 (20%) retained beta-lactamase activity; the library contains hundreds more novel beta-lactamases. The functional chimeras share as little as 70% sequence identity with any known sequence and are characterized by low SCHEMA disruption (E) compared to the average nonfunctional chimera. Furthermore, many nonfunctional chimeras with low E are readily rescued by low error-rate random mutagenesis or by the introduction of a known stabilizing mutation (TEM-1 M182T). These results show that structure-guided recombination effectively generates a family of diverse, folded proteins even when the parents exhibit only 34% sequence identity. Furthermore, the fraction of sequences that encode folded and functional proteins can be enhanced by utilizing previously stabilized parental sequences.
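The library size quoted in this abstract follows from simple counting: seven crossover positions split each sequence into eight blocks, and each block can come from any of the three parents. The short Python sketch below reproduces that arithmetic and the reported functional fraction; the variable names are illustrative and do not come from the paper.

```python
# Illustrative arithmetic only; variable names are hypothetical.
n_parents = 3            # TEM-1, PSE-4, SED-1
n_crossovers = 7         # crossover positions reported in the abstract
n_blocks = n_crossovers + 1

# Each block is inherited independently from one of the parents.
library_size = n_parents ** n_blocks
print(library_size)      # 6561

# Fraction of characterized chimeras that retained activity (111 of 553).
functional_fraction = 111 / 553
print(round(functional_fraction * 100))  # 20
```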
Affiliation(s)
- Michelle M Meyer
- Biochemistry and Molecular Biophysics, California Institute of Technology Mail Code 210-21, California Institute of Technology Mail Code 210-41, Pasadena, CA 91125, USA
|
34
|
Bauer DC, Bodén M, Thier R, Gillam EM. STAR: predicting recombination sites from amino acid sequence. BMC Bioinformatics 2006; 7:437. [PMID: 17026775 PMCID: PMC1624854 DOI: 10.1186/1471-2105-7-437] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Received: 04/29/2006] [Accepted: 10/08/2006] [Indexed: 11/12/2022]
Abstract
Background: Designing novel proteins with site-directed recombination has enormous prospects. By locating effective recombination sites for swapping sequence parts, the probability that hybrid sequences have the desired properties is increased dramatically. The prohibitive requirements for applying current tools led us to investigate machine learning to assist in finding useful recombination sites from amino acid sequence alone. Results: We present STAR (Site Targeted Amino acid Recombination predictor), which produces a score indicating the structural disruption caused by recombination for each position in an amino acid sequence. Example predictions, contrasted with those of alternative tools, illustrate STAR's utility in determining useful recombination sites. Overall, the correlation coefficient between the output of the experimentally validated protein design algorithm SCHEMA and the prediction of STAR is very high (0.89). Conclusion: STAR allows the user to explore useful recombination sites in amino acid sequences with unknown structure and unknown evolutionary origin. The predictor service is available from .
Affiliation(s)
- Denis C Bauer
- Institute for Molecular Bioscience, The University of Queensland, QLD 4072, Australia
- Mikael Bodén
- School of Information Technology and Electrical Engineering, The University of Queensland, QLD 4072, Australia
- Ricarda Thier
- School of Biomedical Sciences, The University of Queensland, QLD 4072, Australia
- Elizabeth M Gillam
- School of Biomedical Sciences, The University of Queensland, QLD 4072, Australia
|
35
|
Dubey A, Realff MJ, Lee JH, Bommarius AS. Identifying the interacting positions of a protein using Boolean learning and support vector machines. Comput Biol Chem 2006; 30:268-79. [PMID: 16861039 DOI: 10.1016/j.compbiolchem.2006.04.001] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 08/16/2005] [Revised: 04/05/2006] [Accepted: 04/06/2006] [Indexed: 11/30/2022]
Abstract
It is known that in the three-dimensional structure of a protein, certain amino acids can interact with each other in order to provide structural integrity or aid in its catalytic function. If these positions are mutated, the loss of this interaction usually leads to a non-functional protein. Directed evolution experiments, which probe the sequence space of a protein through mutations in search of an improved variant, frequently result in such inactive sequences. In this work, we address the use of machine learning algorithms, Boolean learning and support vector machines (SVMs), to find such pairs of amino acid positions. The recombination method of imparting mutations was simulated to create in silico sequences that were used as training data for the algorithms. The two algorithms were combined to develop an approach that weighs the structural risk as well as the empirical risk to solve the problem. This strategy was adapted to a multi-round framework of experiments where the data generated in the present round is used to design experiments for the next round to improve the generated library, as well as the estimation of the interacting positions. It is observed that this strategy can greatly improve the number of functional variants that are generated as well as the average number of mutations that can be made in the library.
Affiliation(s)
- Anshul Dubey
- School of Chemical and Biomolecular Engineering, 311 Ferst Drive, Atlanta, GA 30332, United States
|
36
|
Otey CR, Landwehr M, Endelman JB, Hiraga K, Bloom JD, Arnold FH. Structure-guided recombination creates an artificial family of cytochromes P450. PLoS Biol 2006; 4:e112. [PMID: 16594730 PMCID: PMC1431580 DOI: 10.1371/journal.pbio.0040112] [Citation(s) in RCA: 103] [Impact Index Per Article: 5.7] [Received: 12/28/2005] [Accepted: 02/09/2006] [Indexed: 11/19/2022]
Abstract
Creating artificial protein families affords new opportunities to explore the determinants of structure and biological function free from many of the constraints of natural selection. We have created an artificial family comprising 3,000 P450 heme proteins that correctly fold and incorporate a heme cofactor by recombining three cytochromes P450 at seven crossover locations chosen to minimize structural disruption. Members of this protein family differ from any known sequence at an average of 72 and by as many as 109 amino acids. Most (>73%) of the properly folded chimeric P450 heme proteins are catalytically active peroxygenases; some are more thermostable than the parent proteins. A multiple sequence alignment of 955 chimeras, including both folded and not, is a valuable resource for sequence-structure-function studies. Logistic regression analysis of the multiple sequence alignment identifies key structural contributions to cytochrome P450 heme incorporation and peroxygenase activity and suggests possible structural differences between parents CYP102A1 and CYP102A2.
Affiliation(s)
- Christopher R Otey
- Biochemistry and Molecular Biophysics, California Institute of Technology, Pasadena, California, United States of America
- Marco Landwehr
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States of America
- Jeffrey B Endelman
- Bioengineering, California Institute of Technology, Pasadena, California, United States of America
- Kaori Hiraga
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States of America
- Jesse D Bloom
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States of America
- Frances H Arnold
- Biochemistry and Molecular Biophysics, California Institute of Technology, Pasadena, California, United States of America
- Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, California, United States of America
- Bioengineering, California Institute of Technology, Pasadena, California, United States of America
|
37
|
Saraf MC, Moore GL, Goodey NM, Cao VY, Benkovic SJ, Maranas CD. IPRO: an iterative computational protein library redesign and optimization procedure. Biophys J 2006; 90:4167-80. [PMID: 16513775 PMCID: PMC1459523 DOI: 10.1529/biophysj.105.079277] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Indexed: 11/18/2022]
Abstract
A number of computational approaches have been developed to reengineer promising chimeric proteins one at a time through targeted point mutations. In this article, we introduce the computational procedure IPRO (iterative protein redesign and optimization procedure) for the redesign of an entire combinatorial protein library in one step using energy-based scoring functions. IPRO relies on identifying mutations in the parental sequences, which when propagated downstream in the combinatorial library, improve the average quality of the library (e.g., stability, binding affinity, specific activity, etc.). Residue and rotamer design choices are driven by a globally convergent mixed-integer linear programming formulation. Unlike many of the available computational approaches, the procedure allows for backbone movement as well as redocking of the associated ligands after a prespecified number of design iterations. IPRO can also be used, as a limiting case, for the redesign of a single or handful of individual sequences. The application of IPRO is highlighted through the redesign of a 16-member library of Escherichia coli/Bacillus subtilis dihydrofolate reductase hybrids, both individually and through upstream parental sequence redesign, for improving the average binding energy. Computational results demonstrate that it is indeed feasible to improve the overall library quality as exemplified by binding energy scores through targeted mutations in the parental sequences.
Affiliation(s)
- Manish C Saraf
- Department of Chemical Engineering, The Pennsylvania State University, University Park, PA 16802, USA
|
38
|
Patrick WM, Firth AE. Strategies and computational tools for improving randomized protein libraries. Biomol Eng 2005; 22:105-12. [PMID: 16095966 DOI: 10.1016/j.bioeng.2005.06.001] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.1] [Received: 05/02/2005] [Revised: 06/20/2005] [Accepted: 06/21/2005] [Indexed: 11/15/2022]
Abstract
In the last decade, directed evolution has become a routine approach for engineering proteins with novel or altered properties. Concurrently, a trend away from purely 'blind' randomization strategies and towards more 'semi-rational' approaches has also become apparent. In this review, we discuss ways in which structural information and predictive computational tools are playing an increasingly important role in guiding the design of randomized libraries: web servers such as ConSurf-HSSP and SCHEMA allow the prediction of sites to target for producing functional variants, while algorithms such as GLUE, PEDEL and DRIVeR are useful for estimating library completeness and diversity. In addition, we review recent methodological developments that facilitate the construction of unbiased libraries, which are inherently more diverse than biased libraries and therefore more likely to yield improved variants.
Affiliation(s)
- Wayne M Patrick
- Center for Fundamental and Applied Molecular Evolution, Emory University, 1510 Clifton Road, Atlanta GA 30322, USA.
|
39
|
Lubec G, Afjehi-Sadat L, Yang JW, John JPP. Searching for hypothetical proteins: theory and practice based upon original data and literature. Prog Neurobiol 2005; 77:90-127. [PMID: 16271823 DOI: 10.1016/j.pneurobio.2005.10.001] [Citation(s) in RCA: 133] [Impact Index Per Article: 7.0] [Received: 05/23/2005] [Revised: 09/18/2005] [Accepted: 10/02/2005] [Indexed: 12/29/2022]
Abstract
A large part of mammalian proteomes is represented by hypothetical proteins (HP), i.e. proteins predicted from nucleic acid sequences only and protein sequences with unknown function. Databases are far from being complete and errors are expected. The legion of HP awaits experiments to show their existence at the protein level, and subsequent bioinformatic handling to assign the proteins a tentative function is mandatory. Two-dimensional gel electrophoresis with subsequent mass spectrometric identification of protein spots is an appropriate tool to search for HP in the high-throughput mode. Spots are identified by MS or by MS/MS measurements (MALDI-TOF, MALDI-TOF-TOF) and subsequent software such as Mascot or ProFound. In many cases proteins can thus be unambiguously identified and characterised; if this is not the case, de novo sequencing or Q-TOF analysis is warranted. If the protein is not identified, the sequence is sent to databases for BLAST searches to determine identities/similarities or homologies to known proteins. If no significant identity to known structures is observed, the protein sequence is examined for the presence of functional domains (databases PROSITE, PRINTS, InterPro, ProDom, Pfam and SMART), subjected to searches for motifs (ELM), and finally protein-protein interaction databases (InterWeaver, STRING) are consulted or predictions from conformations are performed. We here provide information about hypothetical proteins in terms of protein chemical analysis, independent of antibody availability and specificity, and bioinformatic handling to contribute to the extension/completion of protein databases, and include original work on HP in the brain to illustrate the processes of HP identification and functional assignment.
Affiliation(s)
- Gert Lubec
- Department of Pediatrics, Division of Basic Sciences, Medical University of Vienna, Waehringer Guertel 18-20, A-1090, Vienna, Austria.
|
40
|
Hernández G, LeMaster DM. Hybrid native partitioning of interactions among nonconserved residues in chimeric proteins. Proteins 2005; 60:723-31. [PMID: 16021631 DOI: 10.1002/prot.20534] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.6] [Indexed: 11/06/2022]
Abstract
Given any operational criterion for pairwise interatomic interactions, for a pair of structurally homologous proteins there exists for both proteins a unique equivalent partitioning of the nonconserved residue positions into mutually non-interacting clusters. In the formation of a chimeric protein derived from these two parental sequences, if nonnative-like interactions are to be avoided in its tertiary structure, then all of the nonconserved residues of each cluster must necessarily be either maintained or interchanged simultaneously. This hybrid native partitioning criterion is applied to known gene shuffling results. When the degree of estimated disruption is modest, the HybNat algorithm provides an efficient predictor of structural integrity. This supports the expectation that a substantial fraction of sequences that conform to the hybrid native partitioning criterion will yield tertiary structures that largely preserve the native-like interactions of the parental proteins.
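One way to read the partitioning described in this abstract is as finding connected components in a graph whose nodes are nonconserved positions and whose edges are pairwise interactions. The Python sketch below illustrates that reading only; it is not the HybNat algorithm itself, and the function name, the contact list, and the residue numbers are all hypothetical.

```python
from collections import defaultdict

def noninteracting_clusters(nonconserved, contacts):
    """Group nonconserved positions into connected components of the contact graph.

    Assumption: `contacts` lists position pairs judged to interact by some
    operational criterion; pairs involving conserved positions are ignored.
    """
    nonconserved = set(nonconserved)
    neighbours = defaultdict(set)
    for i, j in contacts:
        if i in nonconserved and j in nonconserved:
            neighbours[i].add(j)
            neighbours[j].add(i)

    clusters, seen = [], set()
    for start in sorted(nonconserved):
        if start in seen:
            continue
        # Depth-first traversal collects everything reachable from `start`.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        clusters.append(sorted(component))
    return clusters

# Made-up example: position 99 is conserved, so its contact is ignored.
positions = [5, 12, 30, 47, 88]
contacts = [(5, 12), (12, 30), (47, 88), (5, 99)]
clusters = noninteracting_clusters(positions, contacts)
print(clusters)  # [[5, 12, 30], [47, 88]]
```

Under this reading, a chimera avoids nonnative-like contacts when every cluster is inherited as a whole from a single parent.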
Affiliation(s)
- Griselda Hernández
- Wadsworth Center, New York State Department of Health and Department of Biomedical Sciences, University at Albany-SUNY, Empire State Plaza, Albany, New York 12201-0509, USA
|
41
|
Abstract
In this article we introduce a computational procedure, OPTCOMB (Optimal Pattern of Tiling for COMBinatorial library design), for designing protein hybrid libraries that optimally balance library size with quality. The proposed procedure is directly applicable to oligonucleotide ligation-based protocols such as GeneReassembly, DHR, SISDC, and many more. Given a set of parental sequences and the size ranges of the parental sequence fragments, OPTCOMB determines the optimal junction points (i.e., crossover positions) and the fragment contributing parental sequences at each one of the junction points. By rationally selecting the junction points and the contributing parental sequences, the number of clashes (i.e., unfavorable interactions) in the library is systematically minimized with the aim of improving the overall library quality. Using OPTCOMB, hybrid libraries containing fragments from three different dihydrofolate reductase sequences (Escherichia coli, Bacillus subtilis, and Lactobacillus casei) are computationally designed. Notably, we find that there exists an optimal library size when both the number of clashes between the fragments composing the library and the average number of clashes per hybrid in the library are minimized. Results reveal that the best library designs typically involve complex tiling patterns of parental segments of unequal size hard to infer without relying on computational means.
Affiliation(s)
- Manish C Saraf
- Department of Chemical Engineering, The Pennsylvania State University, University Park, Pennsylvania 16082, USA
|
42
|
Dubey A, Realff MJ, Lee JH, Bommarius AS. Support vector machines for learning to identify the critical positions of a protein. J Theor Biol 2005; 234:351-61. [PMID: 15784270 DOI: 10.1016/j.jtbi.2004.11.037] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Accepted: 11/23/2004] [Indexed: 11/22/2022]
Abstract
A method that uses support vector machines (SVMs) to identify the positions in an amino acid sequence that are critical for the catalytic activity of a protein is introduced and analysed. SVMs are supported by an efficient learning algorithm and can utilize some prior knowledge about the structure of the problem. To predict the critical positions of a protein, the method requires as input the amino acid sequences of variants of that protein, created by inducing mutations, along with their fitness. To investigate the performance of this algorithm, variants of the beta-lactamase enzyme were created in silico using simulations of both mutagenesis and recombination protocols. Results from the literature on beta-lactamase were used to test the accuracy of this method. It was also compared with the results from a simple search algorithm. The algorithm was also shown to be able to predict critical positions that can tolerate two different amino acids and retain function.
Affiliation(s)
- Anshul Dubey
- School of Chemical & Biomolecular Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0100, USA
|
43
|
Drummond DA, Silberg JJ, Meyer MM, Wilke CO, Arnold FH. On the conservative nature of intragenic recombination. Proc Natl Acad Sci U S A 2005; 102:5380-5. [PMID: 15809422 PMCID: PMC556249 DOI: 10.1073/pnas.0500729102] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.0] [Received: 01/27/2005] [Indexed: 11/18/2022] Open
Abstract
Intragenic recombination rapidly creates protein sequence diversity compared with random mutation, but little is known about the relative effects of recombination and mutation on protein function. Here, we compare recombination of the distantly related beta-lactamases PSE-4 and TEM-1 to mutation of PSE-4. We show that, among beta-lactamase variants containing the same number of amino acid substitutions, variants created by recombination retain function with a significantly higher probability than those generated by random mutagenesis. We present a simple model that accurately captures the differing effects of mutation and recombination in real and simulated proteins with only four parameters: (i) the amino acid sequence distance between parents, (ii) the number of substitutions, (iii) the average probability that random substitutions will preserve function, and (iv) the average probability that substitutions generated by recombination will preserve function. Our results expose a fundamental functional enrichment in regions of protein sequence space accessible by recombination and provide a framework for evaluating whether the relative rates of mutation and recombination observed in nature reflect the underlying imbalance in their effects on protein function.
Affiliation(s)
- D Allan Drummond
- Program in Computation and Neural Systems, Division of Chemistry and Chemical Engineering, California Institute of Technology, Pasadena, CA 91125, USA
|
45
|
Deem MW. Evolution and evolvability of proteins in the laboratory. Proc Natl Acad Sci U S A 2004; 101:3997-8. [PMID: 15024102 PMCID: PMC384683 DOI: 10.1073/pnas.0400475101] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022] Open
Affiliation(s)
- Michael W Deem
- Departments of Bioengineering and Physics & Astronomy, Rice University, 6100 Main Street, MS 142, Houston, TX 77005-1892, USA.
|