1
Coevolutionary signals across protein lineages help capture multiple protein conformations. Proc Natl Acad Sci U S A 2013; 110:20533-8. [PMID: 24297889 DOI: 10.1073/pnas.1315625110] [Citation(s) in RCA: 131] [Impact Index Per Article: 11.9] [Indexed: 11/18/2022] Open
Abstract
A long-standing problem in molecular biology is the determination of a complete functional conformational landscape of proteins. This includes not only proteins' native structures, but also all their respective functional states, including functionally important intermediates. Here, we reveal a signature of functionally important states in several protein families, using direct coupling analysis, which detects residue pair coevolution of protein sequence composition. This signature is exploited in a protein structure-based model to uncover conformational diversity, including hidden functional configurations. We uncovered, with high resolution (mean ~1.9 Å rmsd for nonapo structures), different functional structural states for medium to large proteins (200-450 aa) belonging to several distinct families. The combination of direct coupling analysis and the structure-based model also predicts several intermediates or hidden states that are of functional importance. This enhanced sampling is broadly applicable and has direct implications in protein structure determination and the design of ligands or drugs to trap intermediate states.
2
Perez-Aguilar JM, Saven JG. Computational design of membrane proteins. Structure 2012; 20:5-14. [PMID: 22244752 DOI: 10.1016/j.str.2011.12.003] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Received: 11/22/2011] [Revised: 12/21/2011] [Accepted: 12/21/2011] [Indexed: 11/26/2022]
Abstract
Membrane proteins are involved in a wide variety of cellular processes, and are typically part of the first interaction a cell has with extracellular molecules. As a result, these proteins comprise a majority of known drug targets. Membrane proteins are among the most difficult proteins to obtain and characterize, and a structure-based understanding of their properties can be difficult to elucidate. Notwithstanding, the design of membrane proteins can provide stringent tests of our understanding of these crucial biological systems, as well as introduce novel or targeted functionalities. Computational design methods have been particularly helpful in addressing these issues, and this review discusses recent studies that tailor membrane proteins to display specific structures or functions and examines how redesigned membrane proteins are being used to facilitate structural and functional studies.
3
Saven JG. Computational protein design: engineering molecular diversity, nonnatural enzymes, nonbiological cofactor complexes, and membrane proteins. Curr Opin Chem Biol 2011; 15:452-7. [PMID: 21493122 DOI: 10.1016/j.cbpa.2011.03.014] [Citation(s) in RCA: 42] [Impact Index Per Article: 3.2] [Received: 03/15/2011] [Revised: 03/18/2011] [Accepted: 03/18/2011] [Indexed: 11/18/2022]
Abstract
Computational and theoretical methods are advancing protein design as a means to create and investigate proteins. Such efforts further our capacity to control, design and understand biomolecular structure, sequence and function. Herein, the focus is on some recent applications that involve using theoretical and computational methods to guide the design of protein sequence ensembles, new enzymes, proteins with novel cofactors, and membrane proteins.
Affiliation(s)
- Jeffery G Saven
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, PA 19104, USA
4
Saven JG. Computational protein design: Advances in the design and redesign of biomolecular nanostructures. Curr Opin Colloid Interface Sci 2010; 15:13-17. [PMID: 21544231 DOI: 10.1016/j.cocis.2009.06.002] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Indexed: 11/28/2022]
Abstract
Computational protein design facilitates the continued development of methods for the design of biomolecular structure, sequence and function. Recent applications include the design of novel protein sequences and structures, proteins incorporating nonbiological components, protein assemblies, soluble variants of membrane proteins, and proteins that modulate membrane function.
Affiliation(s)
- Jeffery G Saven
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, PA 19104, United States
5
Fromer M, Yanover C, Linial M. Design of multispecific protein sequences using probabilistic graphical modeling. Proteins 2010; 78:530-47. [PMID: 19842166 DOI: 10.1002/prot.22575] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.1] [Indexed: 11/10/2022]
Abstract
In nature, proteins partake in numerous protein-protein interactions that mediate their functions. Moreover, proteins have been shown to be physically stable in multiple structures, induced by cellular conditions, small ligands, or covalent modifications. Understanding how protein sequences achieve this structural promiscuity at the atomic level is a fundamental step in the drug design pipeline and a critical question in protein physics. One way to investigate this subject is to computationally predict protein sequences that are compatible with multiple states, i.e., multiple target structures or binding to distinct partners. The goal of engineering such proteins has been termed multispecific protein design. We develop a novel computational framework to efficiently and accurately perform multispecific protein design. This framework utilizes recent advances in probabilistic graphical modeling to predict sequences with low energies in multiple target states. Furthermore, it is also geared to specifically yield positional amino acid probability profiles compatible with these target states. Such profiles can be used as input to randomly bias high-throughput experimental sequence screening techniques, such as phage display, thus providing an alternative avenue for elucidating the multispecificity of natural proteins and the synthesis of novel proteins with specific functionalities. We prove the utility of such multispecific design techniques in better recovering amino acid sequence diversities similar to those resulting from millions of years of evolution. We then compare the approaches of prediction of low energy ensembles and of amino acid profiles and demonstrate their complementarity in providing more robust predictions for protein design.
Affiliation(s)
- Menachem Fromer
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel.
6
Craig RA, Lu J, Luo J, Shi L, Liao L. Optimizing nucleotide sequence ensembles for combinatorial protein libraries using a genetic algorithm. Nucleic Acids Res 2009; 38:e10. [PMID: 19889723 PMCID: PMC2811015 DOI: 10.1093/nar/gkp906] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/13/2022] Open
Abstract
Protein libraries are essential to the field of protein engineering. Increasingly, probabilistic protein design is being used to synthesize combinatorial protein libraries, which allow the protein engineer to explore a vast space of amino acid sequences, while at the same time placing restrictions on the amino acid distributions. To this end, if site-specific amino acid probabilities are input as the target, then the codon nucleotide distributions that match this target distribution can be used to generate a partially randomized gene library. However, it turns out to be a highly nontrivial computational task to find the codon nucleotide distributions that exactly match a given target distribution of amino acids. We first showed that for any given target distribution an exact solution may not exist at all. Formulating the task as a constrained optimization problem, we then developed a genetic algorithm-based approach to find codon nucleotide distributions that match the target amino acid distribution as closely as possible. As compared with the previous gradient descent method on various objective functions, the new method consistently gave more optimized distributions as measured by the relative entropy between the calculated and the target distributions. To simulate the actual lab solutions, new objective functions were designed to allow for two separate sets of codons in seeking a better match to the target amino acid distribution.
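The mapping described in this abstract, from per-position codon nucleotide distributions to an induced amino acid distribution scored by relative entropy, can be sketched as follows. This is only an illustration and not the authors' genetic algorithm: it assumes the standard genetic code, independent nucleotide choices at the three codon positions, and the direction D(target || achieved) for the relative entropy, none of which the abstract specifies. The function names are hypothetical.

```python
from itertools import product
from math import log

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def amino_acid_distribution(position_probs):
    """Amino acid distribution induced by three independent nucleotide mixes.

    position_probs is a list of three dicts mapping base -> probability,
    one per codon position (an assumed input format). '*' denotes stop.
    """
    dist = {}
    for codon, aa in CODON_TABLE.items():
        p = 1.0
        for base, probs in zip(codon, position_probs):
            p *= probs.get(base, 0.0)
        dist[aa] = dist.get(aa, 0.0) + p
    return dist


def relative_entropy(target, achieved):
    """D(target || achieved) in nats; infinite if a needed residue is unreachable."""
    total = 0.0
    for aa, t in target.items():
        if t > 0.0:
            q = achieved.get(aa, 0.0)
            if q == 0.0:
                return float("inf")
            total += t * log(t / q)
    return total


# Example: fix A and T at positions 1 and 2, mix all four bases at position 3.
mix = [{"A": 1.0}, {"T": 1.0}, {b: 0.25 for b in BASES}]
achieved = amino_acid_distribution(mix)
score = relative_entropy({"I": 0.5, "M": 0.5}, achieved)
```

In this example the mix can only reach a 3:1 ratio of Ile to Met, so an equal target is not matched exactly, which is consistent with the abstract's point that an exact solution may not exist.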
Affiliation(s)
- Roger A Craig
- Department of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA
7
Fromer M, Yanover C. Accurate prediction for atomic-level protein design and its application in diversifying the near-optimal sequence space. Proteins 2009; 75:682-705. [PMID: 19003998 DOI: 10.1002/prot.22280] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.0] [Indexed: 11/11/2022]
Abstract
The task of engineering a protein to assume a target three-dimensional structure is known as protein design. Computational search algorithms are devised to predict a minimal energy amino acid sequence for a particular structure. In practice, however, an ensemble of low-energy sequences is often sought. Primarily, this is performed because an individual predicted low-energy sequence may not necessarily fold to the target structure because of both inaccuracies in modeling protein energetics and the nonoptimal nature of search algorithms employed. Additionally, some low-energy sequences may be overly stable and thus lack the dynamic flexibility required for biological functionality. Furthermore, the investigation of low-energy sequence ensembles will provide crucial insights into the pseudo-physical energy force fields that have been derived to describe structural energetics for protein design. Significantly, numerous studies have predicted low-energy sequences, which were subsequently synthesized and demonstrated to fold to desired structures. However, the characterization of the sequence space defined by such energy functions as compatible with a target structure has not been performed in full detail. This issue is critical for protein design scientists to successfully continue using these force fields at an ever-increasing pace and scale. In this paper, we present a conceptually novel algorithm that rapidly predicts the set of lowest energy sequences for a given structure. Based on the theory of probabilistic graphical models, it performs efficient inspection and partitioning of the near-optimal sequence space, without making any assumptions of positional independence. We benchmark its performance on a diverse set of relevant protein design examples and show that it consistently yields sequences of lower energy than those derived from state-of-the-art techniques. 
Thus, we find that previously presented search techniques do not fully depict the low-energy space as precisely. Examination of the predicted ensembles indicates that, for each structure, the amino acid identity at a majority of positions must be chosen extremely selectively so as to not incur significant energetic penalties. We investigate this high degree of similarity and demonstrate how more diverse near-optimal sequences can be predicted in order to systematically overcome this bottleneck for computational design. Finally, we exploit this in-depth analysis of a collection of the lowest energy sequences to suggest an explanation for previously observed experimental design results. The novel methodologies introduced here accurately portray the sequence space compatible with a protein structure and further supply a scheme to yield heterogeneous low-energy sequences, thus providing a powerful instrument for future work on protein design.
Affiliation(s)
- Menachem Fromer
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel.
8
Abstract
MOTIVATION The task of engineering a protein to perform a target biological function is known as protein design. A commonly used paradigm casts this functional design problem as a structural one, assuming a fixed backbone. In probabilistic protein design, positional amino acid probabilities are used to create a random library of sequences to be simultaneously screened for biological activity. Clearly, certain choices of probability distributions will be more successful in yielding functional sequences. However, since the number of sequences is exponential in protein length, computational optimization of the distribution is difficult. RESULTS In this paper, we develop a computational framework for probabilistic protein design following the structural paradigm. We formulate the distribution of sequences for a structure using the Boltzmann distribution over their free energies. The corresponding probabilistic graphical model is constructed, and we apply belief propagation (BP) to calculate marginal amino acid probabilities. We test this method on a large structural dataset and demonstrate the superiority of BP over previous methods. Nevertheless, since the results obtained by BP are far from optimal, we thoroughly assess the paradigm using high-quality experimental data. We demonstrate that, for small scale sub-problems, BP attains identical results to those produced by exact inference on the paradigmatic model. However, quantitative analysis shows that the distributions predicted significantly differ from the experimental data. These findings, along with the excellent performance we observed using BP on the smaller problems, suggest potential shortcomings of the paradigm. We conclude with a discussion of how it may be improved in the future.
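The quantity this abstract targets, marginal amino acid probabilities under a Boltzmann distribution over sequences, can be illustrated by exact enumeration on a toy problem. This sketch is not the authors' belief propagation method; it shows the exact-inference reference that is only feasible for very short sequences, since the number of sequences grows exponentially with length. The two-letter alphabet, the energy function `toy_energy`, and the parameter `kT` are hypothetical.

```python
from itertools import product
from math import exp


def boltzmann_marginals(alphabet, n_positions, energy, kT=1.0):
    """Exact per-position marginals of P(seq) proportional to exp(-E(seq)/kT).

    Enumerates all len(alphabet) ** n_positions sequences, so it is only
    usable for tiny problems.
    """
    weights = {
        seq: exp(-energy(seq) / kT)
        for seq in product(alphabet, repeat=n_positions)
    }
    z = sum(weights.values())  # partition function
    marginals = [{a: 0.0 for a in alphabet} for _ in range(n_positions)]
    for seq, w in weights.items():
        for i, a in enumerate(seq):
            marginals[i][a] += w / z
    return marginals


def toy_energy(seq):
    """Hypothetical energy: reward 'H' at position 0, penalize equal neighbors."""
    site_term = -1.0 if seq[0] == "H" else 0.0
    pair_term = sum(1.0 for a, b in zip(seq, seq[1:]) if a == b)
    return site_term + pair_term


marginals = boltzmann_marginals("HP", 2, toy_energy)
```

Because of the pairwise term, the marginal at position 1 depends on the bias at position 0, which is the kind of coupling a positionally independent model would miss.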
Affiliation(s)
- Menachem Fromer
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel.
9
Philibert P, Stoessel A, Wang W, Sibler AP, Bec N, Larroque C, Saven JG, Courtête J, Weiss E, Martineau P. A focused antibody library for selecting scFvs expressed at high levels in the cytoplasm. BMC Biotechnol 2007; 7:81. [PMID: 18034894 PMCID: PMC2241821 DOI: 10.1186/1472-6750-7-81] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.3] [Received: 06/18/2007] [Accepted: 11/22/2007] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Intrabodies are defined as antibody molecules which are ectopically expressed inside the cell. Such intrabodies can be used to visualize or inhibit the targeted antigen in living cells. However, most antibody fragments cannot be used as intrabodies because they do not fold under the reducing conditions of the cell cytosol and nucleus. RESULTS We describe the construction and validation of a large synthetic human single chain antibody fragment library based on a unique framework and optimized for cytoplasmic expression. Focusing the library by mimicking the natural diversity of CDR3 loops ensured that the scFvs were fully human and functional. We show that the library is highly diverse and functional since it has been possible to isolate by phage-display several strong binders against the five proteins tested in this study, the Syk and Aurora-A protein kinases, the αβ tubulin dimer, the papillomavirus E6 protein and the core histones. Some of the selected scFvs are expressed at an exceptionally high level in the bacterial cytoplasm, allowing the purification of 1 mg of active scFv from only 20 ml of culture. Finally, we show that after three rounds of selection against core histones, more than half of the selected scFvs were active when expressed in vivo in human cells since they were essentially localized in the nucleus. CONCLUSION This new library is a promising tool not only for an easy and large-scale selection of functional intrabodies but also for the isolation of highly expressed scFvs that could be used in numerous biotechnological and therapeutic applications.
Affiliation(s)
- Pascal Philibert
- CNRS, UMR5160, CRLC, 15, av, Charles Flahault, BP14491, 34093, Montpellier Cedex 5, France.
10
Chaparro-Riggers JF, Polizzi KM, Bommarius AS. Better library design: data-driven protein engineering. Biotechnol J 2007; 2:180-91. [PMID: 17183506 DOI: 10.1002/biot.200600170] [Citation(s) in RCA: 73] [Impact Index Per Article: 4.3] [Indexed: 11/11/2022]
Abstract
Data-driven protein engineering is increasingly used as an alternative to rational design and combinatorial engineering because it uses available knowledge to limit library size, while still allowing for the identification of unpredictable substitutions that lead to large effects. Recent advances in computational modeling and bioinformatics, as well as an increasing databank of experiments on functional variants, have led to new strategies to choose particular amino acid residues to vary in order to increase the chances of obtaining a variant protein with the desired property. Strategies for limiting diversity at each position, design of small sub-libraries, and the performance of scouting experiments, have also been developed or even automated, further reducing the library size.
Affiliation(s)
- Javier F Chaparro-Riggers
- School of Chemical and Biomolecular Engineering, Parker H. Petit Institute of Bioengineering and Bioscience, Atlanta, GA, USA
11
Lippow SM, Tidor B. Progress in computational protein design. Curr Opin Biotechnol 2007; 18:305-11. [PMID: 17644370 PMCID: PMC3495006 DOI: 10.1016/j.copbio.2007.04.009] [Citation(s) in RCA: 161] [Impact Index Per Article: 9.5] [Received: 04/02/2007] [Accepted: 04/17/2007] [Indexed: 11/25/2022]
Abstract
Current progress in computational structure-based protein design is reviewed in the areas of methodology and applications. Foundational advances include new potential functions, more efficient ways of computing energetics, flexible treatments of solvent, and useful energy function approximations, as well as ensemble-based approaches to scoring designs for inclusion of entropic effects, improvements to guaranteed and to stochastic search techniques, and methods to design combinatorial libraries for screening and selection. Applications include new approaches and successes in the design of specificity for protein folding, binding, and catalysis, in the redesign of proteins for enhanced binding affinity, and in the application of design technology to study and alter enzyme catalysis. Computational protein design continues to mature and advance.
Affiliation(s)
- Shaun M Lippow
- Department of Chemical Engineering, Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, MA 02139, USA.
12
Kang SG, Saven JG. Computational protein design: structure, function and combinatorial diversity. Curr Opin Chem Biol 2007; 11:329-34. [PMID: 17524729 DOI: 10.1016/j.cbpa.2007.05.006] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.1] [Received: 04/11/2007] [Accepted: 05/10/2007] [Indexed: 11/26/2022]
Abstract
Computational protein design has blossomed with the development of methods for addressing the complexities involved in specifying the structure, sequence and function of proteins. Recent applications include the design of novel functional membrane and soluble proteins, proteins incorporating non-biological components and protein combinatorial libraries.
Affiliation(s)
- Seung-gu Kang
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, PA 19104, USA
13
Swift J, Wehbi WA, Kelly BD, Stowell XF, Saven JG, Dmochowski IJ. Design of Functional Ferritin-Like Proteins with Hydrophobic Cavities. J Am Chem Soc 2006; 128:6611-9. [PMID: 16704261 DOI: 10.1021/ja057069x] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Indexed: 11/30/2022]
Abstract
Ferritin four-helix bundle subunits self-assemble to create a stable multimer with a large central hydrophilic cavity where metal ions bind. To explore the versatility of this reaction vessel, computational design was used to generate cavities with increasingly apolar surface areas inside a dodecameric ferritin-like protein, Dps. Cavity mutants, in which as many as 120 surface accessible hydrophilic residues were replaced with hydrophobic amino acids, were shown to still assemble properly using size-exclusion chromatography and dynamic light scattering measurements. Wild-type Dps exhibited highly cooperative subunit folding and assembly, which was monitored by changes in Trp fluorescence and UV circular dichroism. The hydrophobic cavity mutants showed distinctly less cooperative unfolding behavior, with one mutant forming a partially assembled intermediate upon guanidine denaturation. Although the stability of Dps to such denaturation decreased with increasing apolar surface area, all proteins exhibited high melting temperatures, Tm = 74-90 °C. Despite the large number of mutations, near-native ability to mineralize iron was maintained. This work illustrates the versatility of the ferritin scaffold for engineering large protein cavities with novel properties.
Affiliation(s)
- Joe Swift
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, Pennsylvania 19104-6323, USA
14
Kell DB. Theodor Bücher Lecture. Metabolomics, modelling and machine learning in systems biology - towards an understanding of the languages of cells. Delivered on 3 July 2005 at the 30th FEBS Congress and the 9th IUBMB conference in Budapest. FEBS J 2006; 273:873-94. [PMID: 16478464 DOI: 10.1111/j.1742-4658.2006.05136.x] [Citation(s) in RCA: 130] [Impact Index Per Article: 7.2] [Indexed: 11/28/2022]
Abstract
The newly emerging field of systems biology involves a judicious interplay between high-throughput 'wet' experimentation, computational modelling and technology development, coupled to the world of ideas and theory. This interplay involves iterative cycles, such that systems biology is not at all confined to hypothesis-dependent studies, with intelligent, principled, hypothesis-generating studies being of high importance and consequently very far from aimless fishing expeditions. I seek to illustrate each of these facets. Novel technology development in metabolomics can increase substantially the dynamic range and number of metabolites that one can detect, and these can be exploited as disease markers and in the consequent and principled generation of hypotheses that are consistent with the data and achieve this in a value-free manner. Much of classical biochemistry and signalling pathway analysis has concentrated on the analyses of changes in the concentrations of intermediates, with 'local' equations - such as that of Michaelis and Menten, v = (Vmax × S)/(S + Km) - that describe individual steps being based solely on the instantaneous values of these concentrations. Recent work using single cells (that are not subject to the intellectually unsupportable averaging of the variable displayed by heterogeneous cells possessing nonlinear kinetics) has led to the recognition that some protein signalling pathways may encode their signals not (just) as concentrations (AM or amplitude-modulated in a radio analogy) but via changes in the dynamics of those concentrations (the signals are FM or frequency-modulated). This contributes in principle to a straightforward solution of the crosstalk problem, leads to a profound reassessment of how to understand the downstream effects of dynamic changes in the concentrations of elements in these pathways, and stresses the role of signal processing (and not merely the intermediates) in biological signalling.
It is this signal processing that lies at the heart of understanding the languages of cells. The resolution of many of the modern and postgenomic problems of biochemistry requires the development of a myriad of new technologies (and maybe a new culture), and thus regular input from the physical sciences, engineering, mathematics and computer science. One solution, that we are adopting in the Manchester Interdisciplinary Biocentre (http://www.mib.ac.uk/) and the Manchester Centre for Integrative Systems Biology (http://www.mcisb.org/), is thus to colocate individuals with the necessary combinations of skills. Novel disciplines that require such an integrative approach continue to emerge. These include fields such as chemical genomics, synthetic biology, distributed computational environments for biological data and modelling, single cell diagnostics/bionanotechnology, and computational linguistics/text mining.
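The Michaelis-Menten rate law quoted in this abstract can be written as a short function, which makes the abstract's point concrete: the rate depends only on the instantaneous substrate concentration, not on how that concentration has changed over time. This is a minimal sketch; the function name and the example values are hypothetical.

```python
def michaelis_menten_rate(s, vmax, km):
    """Reaction rate v = Vmax * S / (S + Km) at substrate concentration s."""
    if s < 0 or km <= 0:
        raise ValueError("s must be non-negative and km must be positive")
    return vmax * s / (s + km)


# At S = Km the rate is half of Vmax, regardless of the history of S.
half_max = michaelis_menten_rate(s=2.0, vmax=10.0, km=2.0)
```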
Affiliation(s)
- Douglas B Kell
- School of Chemistry, Faraday Building, The University of Manchester, UK.
15
Park S, Xu Y, Stowell XF, Gai F, Saven JG, Boder ET. Limitations of yeast surface display in engineering proteins of high thermostability. Protein Eng Des Sel 2006; 19:211-7. [PMID: 16537642 DOI: 10.1093/protein/gzl003] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Indexed: 11/14/2022] Open
Abstract
Engineering proteins that can fold to unique structures remains a challenge. Protein stability has previously been engineered via the observed correlation between thermal stability and eukaryotic secretion level. To explore the limits of an expression-based approach, variants of the highly thermostable three-helix bundle protein alpha3D were studied using yeast surface display. A library of alpha3D mutants was created to explore the possible correlation of protein stability and fold with expression level. Five efficiently expressed mutants were then purified and further studied biochemically. Despite their differences in stability, most mutants expressed at levels comparable with that of wild-type alpha3D. Two other related sequences (alpha3A and alpha3B) that form collapsed, stable molten globules but lack a uniquely folded structure were similarly expressed at high levels by yeast display. Together these observations suggest that the quality control system in yeast is unable to discriminate between well-folded proteins of high stability and molten globules. The present study, therefore, suggests that an optimization of the surface display efficiency on yeast may yield proteins that are thermally and chemically stable yet are poorly folded.
Affiliation(s)
- Sheldon Park
- Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104, USA