1
Padhi AK, Kalita P, Maurya S, Poluri KM, Tripathi T. From De Novo Design to Redesign: Harnessing Computational Protein Design for Understanding SARS-CoV-2 Molecular Mechanisms and Developing Therapeutics. J Phys Chem B 2023; 127:8717-8735. [PMID: 37815479 DOI: 10.1021/acs.jpcb.3c04542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/11/2023]
Abstract
The continuous emergence of novel SARS-CoV-2 variants and subvariants serves as compelling evidence that COVID-19 is an ongoing concern. The swift, well-coordinated response to the pandemic highlights how technological advancements can accelerate the detection, monitoring, and treatment of the disease. Robust surveillance systems have been established to understand the clinical characteristics of new variants, although the unpredictable nature of these variants presents significant challenges. Some variants have shown resistance to current treatments, but innovative technologies like computational protein design (CPD) offer promising solutions and versatile therapeutics against SARS-CoV-2. Advances in computing power, coupled with open-source platforms like AlphaFold and RFdiffusion (employing deep neural network and diffusion generative models), among many others, have accelerated the design of protein therapeutics with precise structures and intended functions. CPD has played a pivotal role in developing peptide inhibitors, mini proteins, protein mimics, decoy receptors, nanobodies, monoclonal antibodies, identifying drug-resistance mutations, and even redesigning native SARS-CoV-2 proteins. Pending regulatory approval, these designed therapies hold the potential for a lasting impact on human health and sustainability. As SARS-CoV-2 continues to evolve, use of such technologies enables the ongoing development of alternative strategies, thus equipping us for the "New Normal".
Affiliation(s)
- Aditya K Padhi
  - Laboratory for Computational Biology & Biomolecular Design, School of Biochemical Engineering, Indian Institute of Technology (BHU), Varanasi 221005, Uttar Pradesh, India
- Parismita Kalita
  - Molecular and Structural Biophysics Laboratory, Department of Biochemistry, North-Eastern Hill University, Shillong 793022, India
- Shweata Maurya
  - Laboratory for Computational Biology & Biomolecular Design, School of Biochemical Engineering, Indian Institute of Technology (BHU), Varanasi 221005, Uttar Pradesh, India
- Krishna Mohan Poluri
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Roorkee, Roorkee 247667, Uttarakhand, India
  - Centre for Nanotechnology, Indian Institute of Technology Roorkee, Roorkee 247667, Uttarakhand, India
- Timir Tripathi
  - Molecular and Structural Biophysics Laboratory, Department of Biochemistry, North-Eastern Hill University, Shillong 793022, India
  - Department of Zoology, School of Life Sciences, North-Eastern Hill University, Shillong 793022, India
2
Gong H, Zhang Y, Dong C, Wang Y, Chen G, Liang B, Li H, Liu L, Xu J, Li G. Unbiased curriculum learning enhanced global-local graph neural network for protein thermodynamic stability prediction. Bioinformatics 2023; 39:btad589. [PMID: 37740312 PMCID: PMC10918760 DOI: 10.1093/bioinformatics/btad589] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/25/2023] [Revised: 08/04/2023] [Accepted: 09/21/2023] [Indexed: 09/24/2023] Open
Abstract
MOTIVATION: Proteins play crucial roles in biological processes, with their functions being closely tied to thermodynamic stability. However, measuring stability changes upon point mutations of amino acid residues using physical methods can be time-consuming. In recent years, several computational methods for protein thermodynamic stability prediction (PTSP) based on deep learning have emerged. Nevertheless, these approaches either overlook the natural topology of protein structures or neglect the inherent noisy samples resulting from theoretical calculation or experimental errors.
RESULTS: We propose a novel Global-Local Graph Neural Network powered by Unbiased Curriculum Learning for the PTSP task. Our method first builds a Siamese graph neural network to extract protein features before and after mutation. Since the graph's topological changes stem from local node mutations, we design a local feature transformation module to make the model focus on the mutated site. To address model bias caused by noisy samples, which represent unavoidable errors from physical experiments, we introduce an unbiased curriculum learning method. This approach effectively identifies and re-weights noisy samples during the training process. Extensive experiments demonstrate that our proposed method outperforms advanced protein stability prediction methods, and surpasses state-of-the-art learning methods for regression prediction tasks.
AVAILABILITY AND IMPLEMENTATION: All code and data are available at https://github.com/haifangong/UCL-GLGNN.
Affiliation(s)
- Haifan Gong
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
  - SRIBD, Chinese University of Hong Kong (Shenzhen), Shenzhen 518000, China
- Yumeng Zhang
  - Shanghai Jiao Tong University, Shanghai 200000, China
- Chenhe Dong
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Yue Wang
  - Qilu Hospital, Shandong University, Shandong 250000, China
- Guanqi Chen
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
- Bilin Liang
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
- Haofeng Li
  - SRIBD, Chinese University of Hong Kong (Shenzhen), Shenzhen 518000, China
- Lanxuan Liu
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
- Jie Xu
  - Shanghai Artificial Intelligence Laboratory, Shanghai 200000, China
- Guanbin Li
  - School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou 510000, China
3
Rinderspacher BC. Heuristic Global Optimization in Chemical Compound Space. J Phys Chem A 2020; 124:9044-9060. [DOI: 10.1021/acs.jpca.0c05941] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Affiliation(s)
- B. Christopher Rinderspacher
  - Materials Discovery and Technology Branch, US Army Research Laboratory, Aberdeen Proving Ground, Maryland 21005, United States
4
Cui H, Stadtmüller THJ, Jiang Q, Jaeger K, Schwaneberg U, Davari MD. How to Engineer Organic Solvent Resistant Enzymes: Insights from Combined Molecular Dynamics and Directed Evolution Study. ChemCatChem 2020. [DOI: 10.1002/cctc.202000422] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Indexed: 12/31/2022]
Affiliation(s)
- Haiyang Cui
  - Lehrstuhl für Biotechnologie, RWTH Aachen University, Worringerweg 3, 52074 Aachen, Germany
- Tom H. J. Stadtmüller
  - Lehrstuhl für Biotechnologie, RWTH Aachen University, Worringerweg 3, 52074 Aachen, Germany
- Qianjia Jiang
  - Lehrstuhl für Biotechnologie, RWTH Aachen University, Worringerweg 3, 52074 Aachen, Germany
- Karl‐Erich Jaeger
  - Institute of Molecular Enzyme Technology, Heinrich Heine University Düsseldorf and Research Center Jülich, Wilhelm Johnen Strasse, 52426 Jülich, Germany
- Ulrich Schwaneberg
  - Lehrstuhl für Biotechnologie, RWTH Aachen University, Worringerweg 3, 52074 Aachen, Germany
  - DWI-Leibniz Institute for Interactive Materials, Forckenbeckstraße 50, 52074 Aachen, Germany
- Mehdi D. Davari
  - Lehrstuhl für Biotechnologie, RWTH Aachen University, Worringerweg 3, 52074 Aachen, Germany
5
Baker SL, Murata H, Kaupbayeva B, Tasbolat A, Matyjaszewski K, Russell AJ. Charge-Preserving Atom Transfer Radical Polymerization Initiator Rescues the Lost Function of Negatively Charged Protein–Polymer Conjugates. Biomacromolecules 2019; 20:2392-2405. [DOI: 10.1021/acs.biomac.9b00379] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Indexed: 01/11/2023]
Affiliation(s)
- Adina Tasbolat
  - Department of Chemistry and Chemical Technology, Al-Farabi Kazakh National University, 71 Al-Farabi Avenue, Almaty 050040, Republic of Kazakhstan
6
Affiliation(s)
- Valerie Vaissier Welborn
  - Kenneth S. Pitzer Center for Theoretical Chemistry and Department of Chemistry, University of California, Berkeley, California 94720, United States
  - Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
- Teresa Head-Gordon
  - Kenneth S. Pitzer Center for Theoretical Chemistry and Department of Chemistry, University of California, Berkeley, California 94720, United States
  - Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, United States
  - Department of Chemical and Biomolecular Engineering and Department of Bioengineering, University of California, Berkeley, California 94720, United States
7
Ranbhor R, Kumar A, Tendulkar A, Patel K, Ramakrishnan V, Durani S. IDeAS: automated design tool for hetero-chiral protein folds. Phys Biol 2018; 15:066005. [PMID: 29923499 DOI: 10.1088/1478-3975/aacdc3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 01/25/2023]
Abstract
Incorporating D amino acids in the protein design alphabet can in principle multiply the design space by many orders of magnitude. All native proteins are polymers composed of L chiral amino acids. Practically limitless in diversity over amino acid sequences, protein structure is limited in folds and thus shapes, principally due to the poly L stereochemistry of their backbone. To diversify shapes, we introduced both L- and D-α-amino acids as design alphabets to explore the possibility of generating novel folds, varied in chemical as well as stereochemical sequence. Now, to have stereochemically defined proteins tuned chemically, we present the Inverse Design and Automation Software, IDeAS. Retro-fitting side chains on a backbone with L and D stereochemistry, the software demonstrates functional fits over stereochemically diverse folds in a range of applications of interest in protein design.
Affiliation(s)
- Ranjit Ranbhor
  - Department of Biosciences and Bioengineering, Indian Institute of Technology Bombay, Mumbai 400076, India
8
Kowalski AE, Huber TR, Ni TW, Hartje LF, Appel KL, Yost JW, Ackerson CJ, Snow CD. Gold nanoparticle capture within protein crystal scaffolds. Nanoscale 2016; 8:12693-12696. [PMID: 27264210 DOI: 10.1039/c6nr03096c] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Indexed: 06/05/2023]
Abstract
DNA assemblies have been used to organize inorganic nanoparticles into 3D arrays, with emergent properties arising as a result of nanoparticle spacing and geometry. We report here the use of engineered protein crystals as an alternative approach to biologically mediated assembly of inorganic nanoparticles. The protein crystal's 13 nm diameter pores result in an 80% solvent content and display hexahistidine sequences on their interior. The hexahistidine sequence captures Au25(glutathione)∼17(nitrilotriacetic acid)∼1 nanoclusters throughout a chemically crosslinked crystal via the coordination of Ni(II) to both the cluster and the protein. Nanoparticle loading was validated by confocal microscopy and elemental analysis. The nanoparticles may be released from the crystal by exposure to EDTA, which chelates the Ni(II) and breaks the specific protein/nanoparticle interaction. The integrity of the protein crystals after crosslinking and nanoparticle capture was confirmed by single crystal X-ray crystallography.
Affiliation(s)
- Ann E Kowalski
  - Department of Chemical and Biological Engineering, Colorado State University, Fort Collins, CO 80521, USA.
9
Koh SK, Ananthasuresh GK, Vishveshwara S. A Deterministic Optimization Approach to Protein Sequence Design Using Continuous Models. Int J Rob Res 2016. [DOI: 10.1177/0278364905050354] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 11/16/2022]
Abstract
Determining the sequence of amino acid residues in a heteropolymer chain of a protein with a given conformation is a discrete combinatorial problem that is not generally amenable for gradient-based continuous optimization algorithms. In this paper we present a new approach to this problem using continuous models. In this modeling, continuous “state functions” are proposed to designate the type of each residue in the chain. Such a continuous model helps define a continuous sequence space in which a chosen criterion is optimized to find the most appropriate sequence. Searching a continuous sequence space using a deterministic optimization algorithm makes it possible to find the optimal sequences with much less computation than many other approaches. The computational efficiency of this method is further improved by combining it with a graph spectral method, which explicitly takes into account the topology of the desired conformation and also helps make the combined method more robust. The continuous modeling used here appears to have additional advantages in mimicking the folding pathways and in creating the energy landscapes that help find sequences with high stability and kinetic accessibility. To illustrate the new approach, a widely used simplifying assumption is made by considering only two types of residues: hydrophobic (H) and polar (P). Self-avoiding compact lattice models are used to validate the method with known results in the literature and data that can be practically obtained by exhaustive enumeration on a desktop computer. We also present examples of sequence design for the HP models of some real proteins, which are solved in less than five minutes on a single-processor desktop computer. Some open issues and future extensions are noted.
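As a rough illustration of the two-letter HP lattice setting mentioned in this abstract, the sketch below scores a sequence on a fixed 2D square-lattice conformation and exhaustively enumerates sequences with a fixed number of hydrophobic residues. It uses the standard HP contact energy (-1 per non-bonded H-H lattice contact), which is an assumption here; it is not the continuous state-function method of the paper. The names `hp_energy`, `best_sequences`, and `COORDS` are hypothetical.

```python
from itertools import combinations


def hp_energy(sequence, coords):
    """Standard HP contact energy (assumed): -1 for every pair of H residues
    that sit on neighbouring lattice sites but are not chain neighbours."""
    energy = 0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < 2:
            continue  # residues bonded along the chain do not count as contacts
        if sequence[i] == "H" and sequence[j] == "H":
            (xi, yi), (xj, yj) = coords[i], coords[j]
            if abs(xi - xj) + abs(yi - yj) == 1:
                energy -= 1
    return energy


def best_sequences(coords, n_hydrophobic):
    """Exhaustively enumerate HP sequences with a fixed number of H residues
    and return the lowest energy together with every sequence that reaches it."""
    n = len(coords)
    best_energy, best = None, []
    for h_positions in combinations(range(n), n_hydrophobic):
        seq = "".join("H" if i in h_positions else "P" for i in range(n))
        energy = hp_energy(seq, coords)
        if best_energy is None or energy < best_energy:
            best_energy, best = energy, [seq]
        elif energy == best_energy:
            best.append(seq)
    return best_energy, best


# Hypothetical target conformation: a compact 6-residue path on a 2 x 3 lattice.
COORDS = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1)]

if __name__ == "__main__":
    print(best_sequences(COORDS, 4))
```

Fixing the number of H residues keeps the toy problem from collapsing to the trivial all-H answer; exhaustive enumeration of this kind is only practical for short chains.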
Affiliation(s)
- Sung K. Koh
  - Mechanical Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, 19104-6315, USA
- G. K. Ananthasuresh
  - Mechanical Engineering and Applied Mechanics, University of Pennsylvania, Philadelphia, 19104-6315, USA and Mechanical Engineering, Indian Institute of Science, Bangalore 560 012, India
10
Computational tools for epitope vaccine design and evaluation. Curr Opin Virol 2015; 11:103-12. [PMID: 25837467 DOI: 10.1016/j.coviro.2015.03.013] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 10/26/2014] [Revised: 03/13/2015] [Accepted: 03/16/2015] [Indexed: 12/15/2022]
Abstract
Rational approaches will be required to develop universal vaccines for viral pathogens such as human immunodeficiency virus, hepatitis C virus, and influenza, for which empirical approaches have failed. The main objective of a rational vaccine strategy is to design novel immunogens that are capable of inducing long-term protective immunity. In practice, this requires structure-based engineering of the target neutralizing epitopes and a quantitative readout of vaccine-induced immune responses. Therefore, computational tools that can facilitate these two areas have played increasingly important roles in rational vaccine design in recent years. Here we review the computational techniques developed for protein structure prediction and antibody repertoire analysis, and demonstrate how they can be applied to the design and evaluation of epitope vaccines.
11
Elward JM, Rinderspacher BC. Smooth heuristic optimization on a complex chemical subspace. Phys Chem Chem Phys 2015; 17:24322-35. [DOI: 10.1039/c5cp02177d] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Indexed: 11/21/2022]
Abstract
In the present work, several heuristic reordering algorithms for deterministic optimization on a combinatorial chemical compound space are evaluated for performance and efficiency.
12
Hwang I, Park S. Computational design of protein therapeutics. Drug Discovery Today: Technologies 2014; 5:e43-8. [PMID: 24981090 DOI: 10.1016/j.ddtec.2008.11.004] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 10/21/2022]
Abstract
Computation is increasingly used to guide protein therapeutic designs. Some of the potential applications for computational, structure-based protein design include antibody affinity maturation, modulation of protein-protein interaction, stability improvement and minimization of protein aggregation. The versatility of a computational approach is that different biophysical properties can be analyzed on a common framework. Developing a coherent strategy to address various protein engineering objectives will promote synergy and exploration. Advances in computational structural analysis will thus have a transformative impact on how protein therapeutics are engineered in the future.
Affiliation(s)
- Inseong Hwang
  - Department of Chemical and Biological Engineering, University at Buffalo, SUNY, Buffalo, NY, 14260, USA
- Sheldon Park
  - Department of Chemical and Biological Engineering, University at Buffalo, SUNY, Buffalo, NY, 14260, USA.
13
Srivastava KR, Durani S. Design of a zinc-finger hydrolase with a synthetic αββ protein. PLoS One 2014; 9:e96234. [PMID: 24816915 PMCID: PMC4015931 DOI: 10.1371/journal.pone.0096234] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 06/13/2013] [Accepted: 04/05/2014] [Indexed: 11/18/2022] Open
Abstract
Recent advances in protein design have opened avenues for the creation of artificial enzymes needed for biotechnological and pharmaceutical applications. However, designing efficient enzymes remains an unrealized ambition, as the design must incorporate a catalytic apparatus specific for the desired reaction. Here we present a de novo design approach to evolve a minimal carbonic anhydrase mimic. We followed a step-by-step design of first folding the main chain followed by sequence variation for substrate binding and catalysis. To optimize the fold, we designed an αββ protein based on a Zn-finger. We then inverse-designed the sequences to provide stability to the fold along with flexibility of linker regions to optimize Zn binding and substrate hydrolysis. The resultant peptides were synthesized and assessed for Zn and substrate binding affinity by fluorescence and ITC followed by evaluation of catalytic efficiency with UV-based enzyme kinetic assays. We were successful in mimicking carbonic anhydrase activity in a peptide of twenty two residues, using p-nitrophenyl acetate as a CO2 surrogate. Although our design had modest activity, being a simple structure is an advantage for further improvement in efficiency. Our approach opens a way forward to evolving an efficient biocatalyst for any industrial reaction of interest.
Affiliation(s)
- Susheel Durani
  - Department of Chemistry, Indian Institute of Technology Bombay, Mumbai, India
14
Yu F, Cangelosi VM, Zastrow ML, Tegoni M, Plegaria JS, Tebo AG, Mocny CS, Ruckthong L, Qayyum H, Pecoraro VL. Protein design: toward functional metalloenzymes. Chem Rev 2014; 114:3495-578. [PMID: 24661096 PMCID: PMC4300145 DOI: 10.1021/cr400458x] [Citation(s) in RCA: 340] [Impact Index Per Article: 34.0] [Indexed: 12/12/2022]
Affiliation(s)
- Fangting Yu
  - University of Michigan, Ann Arbor, Michigan 48109, United States
- Alison G. Tebo
  - University of Michigan, Ann Arbor, Michigan 48109, United States
- Leela Ruckthong
  - University of Michigan, Ann Arbor, Michigan 48109, United States
- Hira Qayyum
  - University of Michigan, Ann Arbor, Michigan 48109, United States
15
Huang YM, Bystroff C. Expanded explorations into the optimization of an energy function for protein design. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2013; 10:1176-1187. [PMID: 24384706 PMCID: PMC3919130 DOI: 10.1109/tcbb.2013.113] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023]
Abstract
Nature possesses a secret formula for the energy as a function of the structure of a protein. In protein design, approximations are made to both the structural representation of the molecule and to the form of the energy equation, such that the existence of a general energy function for proteins is by no means guaranteed. Here, we present new insights toward the application of machine learning to the problem of finding a general energy function for protein design. Machine learning requires the definition of an objective function, which carries with it the implied definition of success in protein design. We explored four functions, consisting of two functional forms, each with two criteria for success. Optimization was carried out by a Monte Carlo search through the space of all variable parameters. Cross-validation of the optimized energy function against a test set gave significantly different results depending on the choice of objective function, pointing to relative correctness of the built-in assumptions. Novel energy cross terms correct for the observed nonadditivity of energy terms and an imbalance in the distribution of predicted amino acids. This paper expands on the work presented at the 2012 ACM-BCB.
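The idea of a Monte Carlo search through parameter space can be sketched generically as follows. This is a minimal Metropolis-style example under stated assumptions, not the authors' objective functions or energy terms: the toy data `TOY_DATA`, the squared-error objective, and the names `monte_carlo_search` and `squared_error` are all hypothetical, and the step size and temperature are arbitrary choices.

```python
import math
import random

# Hypothetical toy data: (values of two energy terms, target energy) per example.
TOY_DATA = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]


def squared_error(weights):
    """Toy objective (assumed): squared error of a weighted sum of energy terms."""
    return sum(
        (sum(w * t for w, t in zip(weights, terms)) - target) ** 2
        for terms, target in TOY_DATA
    )


def monte_carlo_search(objective, start, steps=2000, step_size=0.1,
                       temperature=0.05, seed=0):
    """Metropolis-style search: perturb one parameter at a time, always accept
    improvements, and accept worse moves with Boltzmann probability."""
    rng = random.Random(seed)
    current = list(start)
    current_score = objective(current)
    best, best_score = list(current), current_score
    for _ in range(steps):
        trial = list(current)
        k = rng.randrange(len(trial))
        trial[k] += rng.uniform(-step_size, step_size)
        trial_score = objective(trial)
        delta = trial_score - current_score
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = trial, trial_score
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score


if __name__ == "__main__":
    weights, score = monte_carlo_search(squared_error, [0.0, 0.0])
    print(weights, score)
```

Swapping in a different `objective` changes what counts as success while leaving the search loop untouched, which mirrors the abstract's point that the choice of objective function carries the built-in assumptions.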
16
Ghosh P, Mushtaq AU, Durani S. Computational design by evolving folds and assemblies over the alphabet in l- and d-α-amino acids. RSC Adv 2012. [DOI: 10.1039/c2ra01012g] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Indexed: 11/21/2022] Open
17
Samish I, MacDermaid CM, Perez-Aguilar JM, Saven JG. Theoretical and Computational Protein Design. Annu Rev Phys Chem 2011; 62:129-49. [DOI: 10.1146/annurev-physchem-032210-103509] [Citation(s) in RCA: 119] [Impact Index Per Article: 9.2] [Indexed: 11/09/2022]
Affiliation(s)
- Jeffery G. Saven
  - Department of Chemistry, University of Pennsylvania, Philadelphia, Pennsylvania 19104
18
Zinc-finger hydrolase: Computational selection of a linker and a sequence towards metal activation with a synthetic αββ protein. Bioorg Med Chem 2010; 18:8270-6. [PMID: 21035349 DOI: 10.1016/j.bmc.2010.10.003] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 08/23/2010] [Revised: 09/30/2010] [Accepted: 10/02/2010] [Indexed: 12/29/2022]
Abstract
The zinc-finger protein is targeted for computational redesign as a hydrolase enzyme. Successful in having zinc activated for hydrolase function, the study validates the stepwise approach to having the protein tuned in main-chain structure stereochemically and over side chains chemically. A leucine homopolypeptide, harboring histidines to tricoordinate zinc and d-amino-acid-nucleated α-helix and β-hairpin building blocks of an αββ protein, is taken up for modeling, first with CYANA, in a mixed-chirality linker between the building blocks, and then with IDeAS, in a sequence over side chains. The designed mixed-chirality polypeptide structure is proven to order as an intended αββ fold and capture zinc to activate its role as a hydrolase catalyst. The design approach to have protein folds defined stereochemically and receptor and catalysis functions defined chemically is presented, and illustrates L- and D-α-amino-acid structures as the alphabet integrating chemical- and stereochemical-structure variables as its letters.
19
Shukla P. Thermodynamics of protein folding: a random matrix formulation. Journal of Physics: Condensed Matter 2010; 22:415106. [PMID: 21386596 DOI: 10.1088/0953-8984/22/41/415106] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/30/2023]
Abstract
The process of protein folding from an unfolded state to a biologically active, folded conformation is governed by many parameters, e.g. the sequence of amino acids, intermolecular interactions, the solvent, temperature and chaperone molecules. Our study, based on random matrix modeling of the interactions, shows, however, that the evolution of the statistical measures, e.g. Gibbs free energy, heat capacity, and entropy, is single parametric. The information can explain the selection of specific folding pathways from an infinite number of possible ways as well as other folding characteristics observed in computer simulation studies.
Affiliation(s)
- Pragya Shukla
  - Department of Physics, Indian Institute of Technology, Kharagpur, India
20
Roy L, Case MA. Protein Core Packing by Dynamic Combinatorial Chemistry. J Am Chem Soc 2010; 132:8894-6. [DOI: 10.1021/ja1029717] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 11/29/2022]
Affiliation(s)
- Liton Roy
  - Department of Chemistry, The University of Vermont, Burlington, Vermont 05405
- Martin A. Case
  - Department of Chemistry, The University of Vermont, Burlington, Vermont 05405
21
Ji YY, Li YQ. The role of secondary structure in protein structure selection. The European Physical Journal E: Soft Matter 2010; 32:103-107. [PMID: 20524028 DOI: 10.1140/epje/i2010-10591-5] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Received: 11/16/2009] [Revised: 03/10/2010] [Accepted: 04/15/2010] [Indexed: 05/29/2023]
Abstract
The presence of highly regular secondary structure motifs in protein structure is a fascinating area of study. The secondary structures play important roles in protein structure and protein folding. We investigate the folding properties of protein by introducing the effect of secondary structure elements. We observed the emergence of several structures with both large average energy gap and high designability. The dynamic study indicates that these structures are more foldable than those without the effect of secondary structures.
Affiliation(s)
- Yong-Yun Ji
  - Department of Physics, Wenzhou University, 325035 Wenzhou, PR China.
22
Green DF. A Statistical Framework for Hierarchical Methods in Molecular Simulation and Design. J Chem Theory Comput 2010; 6:1682-97. [PMID: 26615700 DOI: 10.1021/ct9004504] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
Abstract
A statistical framework for performance analysis in hierarchical methods is described, with a focus on applications in molecular design. A theory is derived from statistical principles, describing the relationships between the results of each hierarchical level by a functional correlation and an error model for how values are distributed around the correlation curve. Two key measures are then defined for evaluating a hierarchical approach (completeness and excess cost), conceptually similar to the sensitivity and specificity of dichotomous prediction methods. We demonstrate the use of this method using a simple model problem in conformational search, refining the results of an in vacuo search of glucose conformations with a continuum solvent model. Second, we show the usefulness of this approach when structural hierarchies are used to efficiently make use of large rotamer libraries with the Dead-end Elimination and A* algorithms for protein design. The framework described is applicable not only to the specific examples given but to any problem in molecular simulation or design that involves a hierarchical approach.
Affiliation(s)
- David F Green
  - Department of Applied Mathematics and Statistics and Graduate Program in Biochemistry and Structural Biology, Stony Brook University, Stony Brook, New York 11794-3600
23
Fry HC, Lehmann A, Saven JG, DeGrado WF, Therien MJ. Computational design and elaboration of a de novo heterotetrameric alpha-helical protein that selectively binds an emissive abiological (porphinato)zinc chromophore. J Am Chem Soc 2010; 132:3997-4005. [PMID: 20192195 PMCID: PMC2856663 DOI: 10.1021/ja907407m] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Indexed: 11/29/2022]
Abstract
The first example of a computationally de novo designed protein that binds an emissive abiological chromophore is presented, in which a sophisticated level of cofactor discrimination is pre-engineered. This heterotetrameric, C2-symmetric bundle, A(His):B(Thr), uniquely binds (5,15-di[(4-carboxymethyleneoxy)phenyl]porphinato)zinc [(DPP)Zn] via histidine coordination and complementary noncovalent interactions. The A2B2 heterotetrameric protein reflects ligand-directed elements of both positive and negative design, including hydrogen bonds to second-shell ligands. Experimental support for the appropriate formulation of [(DPP)Zn:A(His):B(Thr)]2 is provided by UV/visible and circular dichroism spectroscopies, size exclusion chromatography, and analytical ultracentrifugation. Time-resolved transient absorption and fluorescence spectroscopic data reveal classic excited-state singlet and triplet PZn photophysics for the A(His):B(Thr):(DPP)Zn protein (k(fluorescence) = 4 × 10^8 s^-1; τ(triplet) = 5 ms). The A2B2 apoprotein has immeasurably low binding affinities for related [porphinato]metal chromophores that include a (DPP)Fe(III) cofactor and the zinc metal ion hemin derivative [(PPIX)Zn], underscoring the exquisite active-site binding discrimination realized in this computationally designed protein. Importantly, elements of design in the A(His):B(Thr) protein ensure that interactions within the tetra-alpha-helical bundle are such that only the heterotetramer is stable in solution; corresponding homomeric bundles present unfavorable ligand-binding environments and thus preclude protein structural rearrangements that could lead to binding of (porphinato)iron cofactors.
Affiliation(s)
- H. Christopher Fry
- Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104-6323
- Andreas Lehmann
- Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104-6323
- Jeffrey G. Saven
- Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104-6323
- William F. DeGrado
- Department of Chemistry, University of Pennsylvania, Philadelphia, PA 19104-6323
- Department of Biochemistry and Molecular Biophysics, Johnson Foundation, School of Medicine, University of Pennsylvania, Philadelphia, PA 19104-6059
24
25
Picataggio S. Potential impact of synthetic biology on the development of microbial systems for the production of renewable fuels and chemicals. Curr Opin Biotechnol 2009; 20:325-9. [DOI: 10.1016/j.copbio.2009.04.003] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 03/06/2009] [Revised: 04/27/2009] [Accepted: 04/28/2009] [Indexed: 11/29/2022]
26
Leemhuis H, Nightingale KP, Hollfelder F. Directed evolution of a histone acetyltransferase--enhancing thermostability, whilst maintaining catalytic activity and substrate specificity. FEBS J 2008; 275:5635-47. [PMID: 18959749 DOI: 10.1111/j.1742-4658.2008.06689.x] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 01/01/2023]
Abstract
Histone acetylation plays an integral role in the epigenetic regulation of gene expression. Transcriptional activity reflects the recruitment of opposing classes of enzymes to promoter elements: histone acetyltransferases (EC 2.3.1.48) that deposit acetyl marks at a subset of histone residues and histone deacetylases that remove them. Many histone acetyltransferases are difficult to study in solution because of their limited stability once purified. We have developed a directed evolution protocol that allows the screening of hundreds of histone acetyltransferase mutants for histone acetylating activity, and used this to enhance the thermostability of the human P/CAF histone acetyltransferase. Two rounds of directed evolution significantly stabilized the enzyme without lowering the catalytic efficiency and substrate specificity of the enzyme. Twenty-four variants with higher thermostability were identified. Detailed analysis revealed twelve single amino acid mutants that were found to possess a higher thermostability. The residues affected are scattered over the entire protein structure, and are different from mutations predicted by sequence alignment approaches, suggesting that sequence comparison and directed evolution methods are complementary strategies in engineering increased protein thermostability. The stabilizing mutations are predominantly located at the surface of the enzyme, suggesting that the protein's surface is important for stability. The directed evolution approach described in the present study is easily adapted to other histone modifying enzymes, requiring only appropriate peptide substrates and antibodies, which are available from commercial suppliers.
Affiliation(s)
- Hans Leemhuis
- Department of Biochemistry, University of Cambridge, UK
27
Butts CA, Swift J, Kang SG, Di Costanzo L, Christianson DW, Saven JG, Dmochowski IJ. Directing Noble Metal Ion Chemistry within a Designed Ferritin Protein. Biochemistry 2008; 47:12729-39. [DOI: 10.1021/bi8016735] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.9] [Indexed: 11/29/2022]
Affiliation(s)
- Christopher A. Butts
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, Pennsylvania 19104-6323
- Joe Swift
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, Pennsylvania 19104-6323
- Seung-gu Kang
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, Pennsylvania 19104-6323
- Luigi Di Costanzo
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, Pennsylvania 19104-6323
- David W. Christianson
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, Pennsylvania 19104-6323
- Jeffery G. Saven
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, Pennsylvania 19104-6323
- Ivan J. Dmochowski
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, Pennsylvania 19104-6323
28
Chen H, Kihara D. Estimating quality of template-based protein models by alignment stability. Proteins 2008; 71:1255-74. [PMID: 18041762 DOI: 10.1002/prot.21819] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.3] [Indexed: 12/26/2022]
Abstract
The error in protein tertiary structure prediction is unavoidable, but it is not explicitly shown in most of the current prediction algorithms. Estimated error of a predicted structure is crucial information for experimental biologists to use the prediction model for design and interpretation of experiments. Here, we propose a method to estimate errors in predicted structures based on the stability of the optimal target-template alignment when compared with a set of suboptimal alignments. The stability of the optimal alignment is quantified by an index named the SuboPtimal Alignment Diversity (SPAD). We implemented SPAD in a profile-based threading algorithm and investigated how well SPAD can indicate errors in threading models using a large benchmark dataset of 5232 alignments. SPAD shows a very good correlation not only to alignment shift errors but also structure-level errors, the root mean square deviation (RMSD) of predicted structure models to the native structures (i.e. global errors), and local errors at each residue position. We have further compared SPAD with seven other quality measures, six from sequence alignment-based measures and one atomic statistical potential, discrete optimized protein energy (DOPE), in terms of the correlation coefficient to the global and local structure-level errors. In terms of the correlation to the RMSD of structure models, when a target and a template are in the same SCOP family, the sequence identity showed a best correlation to the RMSD; in the superfamily level, SPAD was the best; and in the fold level, DOPE was best. However, in a head-to-head comparison, SPAD wins over the other measures. Next, SPAD is compared with three other measures of local errors. In this comparison, SPAD was best in all of the family, the superfamily and the fold levels. Using the discovered correlation, we have also predicted the global and local error of our predicted structures of CASP7 targets by the SPAD. 
Finally, we proposed a sausage representation of predicted tertiary structures which intuitively indicate the predicted structure and the estimated error range of the structure simultaneously.
Affiliation(s)
- Hao Chen
- Department of Biological Sciences, College of Science, Purdue University, West Lafayette, Indiana 47907, USA
29
Abstract
MOTIVATION: The task of engineering a protein to perform a target biological function is known as protein design. A commonly used paradigm casts this functional design problem as a structural one, assuming a fixed backbone. In probabilistic protein design, positional amino acid probabilities are used to create a random library of sequences to be simultaneously screened for biological activity. Clearly, certain choices of probability distributions will be more successful in yielding functional sequences. However, since the number of sequences is exponential in protein length, computational optimization of the distribution is difficult. RESULTS: In this paper, we develop a computational framework for probabilistic protein design following the structural paradigm. We formulate the distribution of sequences for a structure using the Boltzmann distribution over their free energies. The corresponding probabilistic graphical model is constructed, and we apply belief propagation (BP) to calculate marginal amino acid probabilities. We test this method on a large structural dataset and demonstrate the superiority of BP over previous methods. Nevertheless, since the results obtained by BP are far from optimal, we thoroughly assess the paradigm using high-quality experimental data. We demonstrate that, for small scale sub-problems, BP attains identical results to those produced by exact inference on the paradigmatic model. However, quantitative analysis shows that the distributions predicted significantly differ from the experimental data. These findings, along with the excellent performance we observed using BP on the smaller problems, suggest potential shortcomings of the paradigm. We conclude with a discussion of how it may be improved in the future.
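As a rough illustration of the quantity this abstract refers to, the sketch below computes per-position amino acid marginals under a Boltzmann distribution by brute-force enumeration. This is the exact inference that is only feasible for very small problems; it is not belief propagation and not the authors' code. The two-letter alphabet, the function names, and the toy energy are all hypothetical.

```python
import itertools
import math


def boltzmann_marginals(n_positions, alphabet, energy, kT=1.0):
    """Exact per-position marginals P(residue at position i) by enumeration.

    Enumerates every sequence, so it is only usable for tiny problems.
    """
    weights = {}
    for seq in itertools.product(alphabet, repeat=n_positions):
        weights[seq] = math.exp(-energy(seq) / kT)  # Boltzmann weight
    z = sum(weights.values())  # partition function

    marginals = [{a: 0.0 for a in alphabet} for _ in range(n_positions)]
    for seq, w in weights.items():
        for i, a in enumerate(seq):
            marginals[i][a] += w / z
    return marginals


def toy_pair_energy(seq):
    """Hypothetical energy: neighbours prefer to differ; position 0 favours 'A'."""
    e = 0.0
    for a, b in zip(seq, seq[1:]):
        e += -1.0 if a != b else 0.5
    if seq[0] == "A":
        e -= 0.5
    return e


marginals = boltzmann_marginals(3, "AL", toy_pair_energy)
for i, dist in enumerate(marginals):
    print(i, {a: round(p, 3) for a, p in dist.items()})
```

The enumeration cost grows as the alphabet size raised to the number of positions, which is the exponential blow-up the abstract mentions as the reason an approximate method is needed on realistic problems.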
Affiliation(s)
- Menachem Fromer
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel.
30
Zhang N, Zeng C. Reference energy extremal optimization: A stochastic search algorithm applied to computational protein design. J Comput Chem 2008; 29:1762-71. [DOI: 10.1002/jcc.20937] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 11/08/2022]
31
Fung HK, Welsh WJ, Floudas CA. Computational De Novo Peptide and Protein Design: Rigid Templates versus Flexible Templates. Ind Eng Chem Res 2008. [DOI: 10.1021/ie071286k] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.7] [Indexed: 12/26/2022]
Affiliation(s)
- Ho Ki Fung
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, and Department of Pharmacology, University of Medicine & Dentistry of New Jersey (UMDNJ), Robert Wood Johnson Medical School, and the Informatics Institute of UMDNJ, Piscataway, New Jersey 08854
- William J. Welsh
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, and Department of Pharmacology, University of Medicine & Dentistry of New Jersey (UMDNJ), Robert Wood Johnson Medical School, and the Informatics Institute of UMDNJ, Piscataway, New Jersey 08854
- Christodoulos A. Floudas
- Department of Chemical Engineering, Princeton University, Princeton, New Jersey 08544-5263, and Department of Pharmacology, University of Medicine & Dentistry of New Jersey (UMDNJ), Robert Wood Johnson Medical School, and the Informatics Institute of UMDNJ, Piscataway, New Jersey 08854
32
Abstract
Current fluorescent protein (FP) development strategies are focused on fine-tuning the photophysical properties of blue to yellow variants derived from the Aequorea victoria jellyfish green fluorescent protein (GFP) and on the development of monomeric FPs from other organisms that emit in the yellow-orange to far-red regions of the visible light spectrum. Progress toward these goals has been substantial, and near-infrared emitting FPs may loom over the horizon. The latest efforts in jellyfish variants have resulted in new and improved monomeric BFP, CFP, GFP and YFP variants, and the relentless search for a bright, monomeric and fast-maturing red FP has yielded a host of excellent candidates, although none is yet optimal for all applications. Meanwhile, photoactivatable FPs are emerging as a powerful class of probes for intracellular dynamics and, unexpectedly, as useful tools for the development of superresolution microscopy applications.
Affiliation(s)
- Nathan C Shaner
- The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037, USA.
33
Grigoryan G, Ochoa A, Keating AE. Computing van der Waals energies in the context of the rotamer approximation. Proteins 2007; 68:863-78. [PMID: 17554777 DOI: 10.1002/prot.21470] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022]
Abstract
The rotamer approximation states that protein side-chain conformations can be described well using a finite set of rotational isomers. This approximation is often applied in the context of computational protein design and structure prediction to reduce the complexity of structural sampling. It is an effective way of reducing the structure space to the most relevant conformations. However, the appropriateness of rotamers for sampling structure space does not imply that a rotamer-based energy landscape preserves any of the properties of the true continuous energy landscape. Specifically, because the energy of a van der Waals interaction can be very sensitive to small changes in atomic separation, meaningful van der Waals energies are particularly difficult to calculate from rotamer-based structures. This presents a problem for computational protein design, where the total energy of a given structure is often represented as a sum of precalculated rigid rotamer self and pair contributions. A common way of addressing this issue is to modify the van der Waals function to reduce its sensitivity to atomic position, but excessive modification may result in a strongly nonphysical potential. Although many different van der Waals modifications have been used in protein design, little is known about which performs best, and why. In this paper, we study 10 ways of computing van der Waals energies under the rotamer approximation, representing four general classes, and compare their performance using a variety of metrics relevant to protein design and native-sequence repacking calculations. Scaling van der Waals radii by anywhere from 85 to 95% gives the best performance. Linearizing and capping the repulsive portion of the potential can give additional improvement, which comes primarily from getting rid of unrealistically large clash energies. 
On the other hand, continuously minimizing individual rotamer pairs prior to evaluating their interaction works acceptably in native-sequence repacking, but fails in protein design. Additionally, we show that the problem of predicting relevant van der Waals energies from rotamer-based structures is strongly nonpairwise decomposable and hence further modifications of the potential are unlikely to give significant improvement.
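The two modifications named in this abstract, scaling the van der Waals radii and linearizing and capping the repulsive wall, can be sketched on a generic 12-6 Lennard-Jones potential. This is only an illustration under stated assumptions: the abstract does not give the functional forms the authors used, and the switch fraction, cap value, and function names below are hypothetical.

```python
def lj_energy(r, r_min, eps, radius_scale=1.0):
    """12-6 Lennard-Jones energy with its minimum (-eps) at radius_scale * r_min."""
    s = radius_scale * r_min
    x = (s / r) ** 6
    return eps * (x * x - 2.0 * x)


def lj_linearized_capped(r, r_min, eps, radius_scale=1.0,
                         switch_frac=0.8, cap=10.0):
    """Same potential, but the steep wall below the switch distance is replaced
    by its tangent line and the result is capped (assumed, illustrative form)."""
    s = radius_scale * r_min
    r_switch = switch_frac * s
    if r >= r_switch:
        return lj_energy(r, r_min, eps, radius_scale)
    e_switch = lj_energy(r_switch, r_min, eps, radius_scale)
    # dE/dr at r_switch for E = eps * (s^12 / r^12 - 2 * s^6 / r^6)
    slope = 12.0 * eps * (s ** 6 / r_switch ** 7 - s ** 12 / r_switch ** 13)
    return min(e_switch + slope * (r - r_switch), cap)


# A clash at 2.0 A between atoms whose unscaled optimum separation is 4.0 A.
unscaled = lj_energy(2.0, 4.0, 0.1)
scaled = lj_energy(2.0, 4.0, 0.1, radius_scale=0.9)
softened = lj_linearized_capped(2.0, 4.0, 0.1, radius_scale=0.9)
print(unscaled, scaled, softened)
```

Scaling the radii shifts the wall inward, so the same rigid-rotamer clash is penalized less; the linearized and capped variant additionally bounds how large a single clash energy can become.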
Affiliation(s)
- Gevorg Grigoryan
- Department of Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
34
Marguet P, Balagadde F, Tan C, You L. Biology by design: reduction and synthesis of cellular components and behaviour. J R Soc Interface 2007; 4:607-23. [PMID: 17251159 PMCID: PMC2373384 DOI: 10.1098/rsif.2006.0206] [Citation(s) in RCA: 49] [Impact Index Per Article: 2.9] [Indexed: 01/09/2023] Open
Abstract
Biological research is experiencing an increasing focus on the application of knowledge rather than on its generation. Thanks to the increased understanding of cellular systems and technological advances, biologists are more frequently asking not only 'how can I understand the structure and behaviour of this biological system?', but also 'how can I apply that knowledge to generate novel functions in different biological systems or in other contexts?' Active pursuit of the latter has nurtured the emergence of synthetic biology. Here, we discuss the motivation behind, and foundational technologies enabling, the development of this nascent field. We examine some early successes and applications while highlighting the challenges involved. Finally, we consider future directions and mention non-scientific considerations that can influence the field's growth.
Affiliation(s)
- Philippe Marguet
- Department of Biochemistry, Duke University Medical Center, Durham, NC 27710, USA
- Frederick Balagadde
- Department of Bioengineering, Stanford University, Stanford, CA 94305-9505, USA
- Cheemeng Tan
- Department of Biomedical Engineering, Duke University, Durham, NC 27708-0320, USA
- Lingchong You
- Department of Biomedical Engineering, Duke University, Durham, NC 27708-0320, USA
- Institute for Genome Sciences and Policy, Duke University Medical Center, Durham, NC 27710, USA
- Author and address for correspondence: CIEMAS 2345, 101 Science Drive, Durham, NC 27708, USA
35
Mizuno T, Murao K, Tanabe Y, Oda M, Tanaka T. Metal-ion-dependent GFP emission in vivo by combining a circularly permutated green fluorescent protein with an engineered metal-ion-binding coiled-coil. J Am Chem Soc 2007; 129:11378-83. [PMID: 17722917 DOI: 10.1021/ja0685102] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Indexed: 11/29/2022]
Abstract
Coordination of metal ions significantly contributes to protein structures and functions. Here we constructed a fusion protein, consisting of a de novo designed, metal-ion-binding, trimeric coiled-coil and a circularly permutated green fluorescent protein (cpGFP), where the fluorescent emission from cpGFP was induced by metal ion coordination to the coiled-coil. A circularly permutated GFP, (191)cpGFP(190), was constructed by connecting the original N- and C-termini of GFP(UV) by a GGSGG linker and cleaving it between Asp190 and Gly191. The metal-ion-binding coiled-coil, IZ-HH, was designed to have three alpha-helical structures, with 12 His residues in the hydrophobic core of the coiled-coil structure. IZ-HH exhibited an unfolded structure, whereas it formed the trimeric coiled-coil structure in the presence of divalent metal ions, such as Cu²⁺, Ni²⁺, or Zn²⁺. The fusion protein (191)cpGFP(190)-IZ-HH was constructed, in which (191)cpGFP(190) was inserted between the second and third alpha-helices of IZ-HH. Escherichia coli cells, expressing (191)cpGFP(190)-IZ-HH, exhibited strong fluorescence when the Cu²⁺ and Zn²⁺ ions were present in the medium, indicating that they passed through the cell membrane and induced the proper folding of the (191)cpGFP(190) domain. This strategy, in which protein function is regulated by a metal-ion-responsive coiled-coil, should be applicable to the design of various metal-ion-responsive, nonnatural proteins that work both in vitro and in vivo.
Affiliation(s)
- Toshihisa Mizuno
- Graduate School of Engineering, Nagoya Institute of Technology, Nagoya 466-8555, Japan.
36
Meyerguz L, Kleinberg J, Elber R. The network of sequence flow between protein structures. Proc Natl Acad Sci U S A 2007; 104:11627-32. [PMID: 17596339 PMCID: PMC1913895 DOI: 10.1073/pnas.0701393104] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.4] [Received: 02/14/2007] [Indexed: 12/24/2022] Open
Abstract
Sequence-structure relationships in proteins are highly asymmetric because many sequences fold into relatively few structures. What is the number of sequences that fold into a particular protein structure? Is it possible to switch between stable protein folds by point mutations? To address these questions, we compute a directed graph of sequences and structures of proteins, which is based on 2,060 experimentally determined protein shapes from the Protein Data Bank. The directed graph is highly connected at native energies with "sinks" that attract many sequences from other folds. The sinks are rich in beta-sheets. The number of sequences that transition between folds is significantly smaller than the number of sequences retained by their fold. The sequence flow into a particular protein shape from other proteins correlates with the number of sequences that matches this shape in empirically determined genomes. Properties of strongly connected components of the graph are correlated with protein length and secondary structure.
Affiliation(s)
- Leonid Meyerguz
- Department of Computer Science, Cornell University, Ithaca, NY 14853
- Jon Kleinberg
- Department of Computer Science, Cornell University, Ithaca, NY 14853
- Ron Elber
- Department of Computer Science, Cornell University, Ithaca, NY 14853
37
Shah PS, Hom GK, Ross SA, Lassila JK, Crowhurst KA, Mayo SL. Full-sequence computational design and solution structure of a thermostable protein variant. J Mol Biol 2007; 372:1-6. [PMID: 17628593 DOI: 10.1016/j.jmb.2007.06.032] [Citation(s) in RCA: 57] [Impact Index Per Article: 3.4] [Received: 03/20/2007] [Revised: 06/07/2007] [Accepted: 06/11/2007] [Indexed: 11/25/2022]
Abstract
Computational protein design procedures were applied to the redesign of the entire sequence of a 51 amino acid residue protein, Drosophila melanogaster engrailed homeodomain. Various sequence optimization algorithms were compared and two resulting designed sequences were experimentally evaluated. The two sequences differ by 11 mutations and share 22% and 24% sequence identity with the wild-type protein. Both computationally designed proteins were considerably more stable than the naturally occurring protein, with midpoints of thermal denaturation greater than 99 degrees C. The solution structure was determined for one of the two sequences using multidimensional heteronuclear NMR spectroscopy, and the structure was found to closely match the original design template scaffold.
Affiliation(s)
- Premal S Shah
- Biochemistry and Molecular Biophysics Option, MC 114-96, California Institute of Technology, Pasadena, CA 91125, USA
38
Dallüge R, Oschmann J, Birkenmeier O, Lücke C, Lilie H, Rudolph R, Lange C. A tetrapeptide fragment-based design method results in highly stable artificial proteins. Proteins 2007; 68:839-49. [PMID: 17557327 DOI: 10.1002/prot.21493] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Indexed: 01/10/2023]
Abstract
Computational protein design has progressed rapidly over the last years. A number of design methods have been proposed and tested. In this paper, we report the successful application of a fragment-based method for protein design. The method uses statistical information on tetrapeptide backbone conformations. The previously published artificial fold of TOP 7 (Kuhlman et al., Science, 2003; 302:1364-1368) was chosen as template. A series of polypeptide sequences were created that were predicted to fold into this target structure. Two of the designed proteins, M5 and M7, were expressed and characterized by fluorescence spectroscopy, circular dichroism and NMR. They showed the hallmarks of well-ordered tertiary structure as well as cooperative folding/unfolding transitions. Furthermore, the two novel proteins were found to be highly stable against temperature and denaturant-induced unfolding.
Affiliation(s)
- Roman Dallüge
- Institut für Biotechnologie, Martin-Luther-Universität Halle-Wittenberg, 06099 Halle, Saale, Germany
39
Zhu J, Xie L, Honig B. Structural refinement of protein segments containing secondary structure elements: Local sampling, knowledge-based potentials, and clustering. Proteins 2006; 65:463-79. [PMID: 16927337 DOI: 10.1002/prot.21085] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.0] [Indexed: 12/25/2022]
Abstract
In this article, we present an iterative, modular optimization (IMO) protocol for the local structure refinement of protein segments containing secondary structure elements (SSEs). The protocol is based on three modules: a torsion-space local sampling algorithm, a knowledge-based potential, and a conformational clustering algorithm. Alternative methods are tested for each module in the protocol. For each segment, random initial conformations were constructed by perturbing the native dihedral angles of loops (and SSEs) of the segment to be refined while keeping the protein body fixed. Two refinement procedures based on molecular mechanics force fields - using either energy minimization or molecular dynamics - were also tested but were found to be less successful than the IMO protocol. We found that DFIRE is a particularly effective knowledge-based potential and that clustering algorithms that are biased by the DFIRE energies improve the overall results. Results were further improved by adding an energy minimization step to the conformations generated with the IMO procedure, suggesting that hybrid strategies that combine both knowledge-based and physical effective energy functions may prove to be particularly effective in future applications.
Affiliation(s)
- Jiang Zhu
- Howard Hughes Medical Institute, Center for Computational Biology and Bioinformatics, Department of Biochemistry and Molecular Biophysics, Columbia University, 1130 St. Nicholas Avenue, Room 815, New York, New York 10032, USA
40
Abstract
DNA synthesis has become one of the technological bases of a new concept in biology: synthetic biology. The vision of synthetic biology is a systematic, hierarchical design of artificial, biology-inspired systems using robust, standardized, and well-characterized building blocks. The design concept and examples from four fields of application (genetic circuits, protein design, platform technologies, and pathway engineering) are discussed, which demonstrate the usefulness and the promises of synthetic biology. The vision of synthetic biology is to develop complex systems by simplified solutions using available material and knowledge. Synthetic biology also opens a door toward new biomaterials that do not occur in nature.
Affiliation(s)
- Jürgen Pleiss
- Institute of Technical Biochemistry, University of Stuttgart, Allmandring 31, 70569 Stuttgart, Germany.
41
Mena MA, Treynor TP, Mayo SL, Daugherty PS. Blue fluorescent proteins with enhanced brightness and photostability from a structurally targeted library. Nat Biotechnol 2006; 24:1569-71. [PMID: 17115054 DOI: 10.1038/nbt1264] [Citation(s) in RCA: 111] [Impact Index Per Article: 6.2] [Received: 07/05/2006] [Accepted: 10/11/2006] [Indexed: 11/09/2022]
Abstract
The utility of blue fluorescent protein (BFP) has been limited by its low quantum yield and rapid photobleaching. A library targeting residues neighboring the chromophore yielded a variant with enhanced quantum yield (0.55 versus 0.34), reduced pH sensitivity and a 40-fold increase in photobleaching half-life. This BFP, named Azurite, is well expressed in bacterial and mammalian cells and extends the palette of fluorescent proteins that can be used for imaging.
Affiliation(s)
- Marco A Mena
- Department of Chemical Engineering, University of California, Santa Barbara, California 93106, USA
42
Dufner P, Jermutus L, Minter RR. Harnessing phage and ribosome display for antibody optimisation. Trends Biotechnol 2006; 24:523-9. [PMID: 17000017 DOI: 10.1016/j.tibtech.2006.09.004] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Received: 04/28/2006] [Revised: 08/24/2006] [Accepted: 09/14/2006] [Indexed: 12/16/2022]
Abstract
Therapeutic antibodies have become a major driving force for the biopharmaceutical industry; therefore, the discovery and development of safe and efficacious antibody leads have become competitive processes. Phage and ribosome display are ideal tools for the generation of such molecules and have already delivered an approved drug as well as a multitude of clinical candidates. Because they are capable of searching billions of antibody variants in tailored combinatorial libraries, they are particularly applicable to potency optimisation. In conjunction with targeted, random or semi-rational mutagenesis strategies, they deliver large panels of potent antibody leads. This review introduces the two technologies, compares them with respect to their use in antibody optimisation and highlights how they can be exploited for the successful and efficient generation of putative drug candidates.
Affiliation(s)
- Patrick Dufner
- Cambridge Antibody Technology, Milstein Building, Granta Park, Cambridge CB1 6GH, UK
43
Sen TZ, Cheng H, Kloczkowski A, Jernigan RL. A Consensus Data Mining secondary structure prediction by combining GOR V and Fragment Database Mining. Protein Sci 2006; 15:2499-506. [PMID: 17001039 PMCID: PMC2242411 DOI: 10.1110/ps.062125306] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 10/24/2022]
Abstract
The major aim of tertiary structure prediction is to obtain protein models with the highest possible accuracy. Fold recognition, homology modeling, and de novo prediction methods typically use predicted secondary structures as input, and all of these methods may significantly benefit from more accurate secondary structure predictions. Although there are many different secondary structure prediction methods available in the literature, their cross-validated prediction accuracy is generally <80%. In order to increase the prediction accuracy, we developed a novel hybrid algorithm called Consensus Data Mining (CDM) that combines our two previous successful methods: (1) Fragment Database Mining (FDM), which exploits the Protein Data Bank structures, and (2) GOR V, which is based on information theory, Bayesian statistics, and multiple sequence alignments (MSA). In CDM, the target sequence is dissected into smaller fragments that are compared with fragments obtained from related sequences in the PDB. For fragments with a sequence identity above a certain sequence identity threshold, the FDM method is applied for the prediction. The remainder of the fragments are predicted by GOR V. The results of the CDM are provided as a function of the upper sequence identities of aligned fragments and the sequence identity threshold. We observe that the value 50% is the optimum sequence identity threshold, and that the accuracy of the CDM method measured by Q(3) ranges from 67.5% to 93.2%, depending on the availability of known structural fragments with sufficiently high sequence identity. As the Protein Data Bank grows, it is anticipated that this consensus method will improve because it will rely more upon the structural fragments.
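The dispatch rule described in this abstract (fragments whose best sequence identity exceeds a threshold go to FDM, the rest go to GOR V) and the Q3 accuracy measure can be sketched as follows. The two predictor functions are hypothetical stand-ins, not the real FDM or GOR V programs, and the fragments are treated as non-overlapping for simplicity, which may differ from the published method.

```python
def consensus_predict(fragments, fdm_predict, gor_predict, identity_threshold=0.5):
    """Route each fragment to one of two predictors by sequence identity.

    fragments: list of (fragment_sequence, best_identity_to_known_fragment),
    with identity expressed as a fraction between 0 and 1.
    """
    pieces = []
    for sequence, identity in fragments:
        predictor = fdm_predict if identity > identity_threshold else gor_predict
        pieces.append(predictor(sequence))
    return "".join(pieces)


def q3(predicted, observed):
    """Fraction of residues whose three-state label (H, E, C) is correct."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must have equal length")
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical stand-in predictors, for illustration only.
def fake_fdm(sequence):
    return "H" * len(sequence)


def fake_gor(sequence):
    return "C" * len(sequence)


fragments = [("AEELLKK", 0.8), ("GSPG", 0.2)]
prediction = consensus_predict(fragments, fake_fdm, fake_gor)
accuracy = q3(prediction, "HHHHHHCCCCC")
print(prediction, round(accuracy, 3))
```

The default threshold of 0.5 mirrors the 50% optimum reported in the abstract; everything else about the sketch is a simplification.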
Affiliation(s)
- Taner Z Sen
- Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, Iowa 50011-3020, USA.
44
Abstract
The RosettaDesign server identifies low energy amino acid sequences for target protein structures (http://rosettadesign.med.unc.edu). The client provides the backbone coordinates of the target structure and specifies which residues to design. The server returns to the client the sequences, coordinates and energies of the designed proteins. The simulations are performed using the design module of the Rosetta program (RosettaDesign). RosettaDesign uses Monte Carlo optimization with simulated annealing to search for amino acids that pack well on the target structure and satisfy hydrogen bonding potential. RosettaDesign has been experimentally validated and has been used previously to stabilize naturally occurring proteins and design a novel protein structure.
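As an illustration of Monte Carlo optimization with simulated annealing over amino acid identities, the following Python sketch anneals a sequence at user-chosen designable positions. It does not use or reproduce the Rosetta energy function or API: `toy_energy`, the buried/exposed flags, the cooling schedule, and all parameter names are hypothetical stand-ins chosen only to make the search loop concrete.

```python
import math
import random
from typing import List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFW")


def toy_energy(sequence: Sequence[str], buried: Sequence[bool]) -> float:
    """Hypothetical score: -1 when hydrophobicity matches burial, else +1."""
    return sum(
        -1.0 if (aa in HYDROPHOBIC) == is_buried else 1.0
        for aa, is_buried in zip(sequence, buried)
    )


def anneal_sequence(
    start: str,
    buried: Sequence[bool],
    designable: Sequence[int],
    steps: int = 5000,
    t_start: float = 2.0,
    t_end: float = 0.05,
    seed: int = 0,
) -> Tuple[str, float]:
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    seq: List[str] = list(start)
    energy = toy_energy(seq, buried)
    best_seq, best_energy = list(seq), energy
    if not designable:
        return "".join(best_seq), best_energy
    for step in range(steps):
        # Temperature decays geometrically from t_start to t_end.
        fraction = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** fraction
        # Propose a random substitution at one designable position.
        pos = rng.choice(designable)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)
        new_energy = toy_energy(seq, buried)
        delta = new_energy - energy
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            energy = new_energy
            if energy < best_energy:
                best_seq, best_energy = list(seq), energy
        else:
            seq[pos] = old  # reject: restore the previous residue
    return "".join(best_seq), best_energy
```

Only positions listed in `designable` are ever mutated, which corresponds to the client specifying which residues to design; a real design run would replace `toy_energy` with a physically motivated packing and hydrogen-bonding score.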
Affiliation(s)
- Brian Kuhlman
- To whom correspondence should be addressed. Tel: +1 919 843 0188; Fax: +1 919 966 2852
|
45
|
Abstract
Over the past 10 years there has been tremendous success in the area of computational protein design. Protein design software has been used to stabilize proteins, solubilize membrane proteins, design intermolecular interactions, and design new protein structures. A key motivation for these studies is that they test our understanding of protein energetics and structure. De novo design of novel structures is a particularly rigorous test because the protein backbone must be designed in addition to the amino acid side chains. A priori it is not guaranteed that the target backbone is even designable. To address this issue, researchers have developed a variety of methods for generating protein-like scaffolds and for optimizing the protein backbone in conjunction with the amino acid sequence. These protocols have been used to design proteins from scratch and to explore sequence space for naturally occurring protein folds.
Affiliation(s)
- Glenn L Butterfoss
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina 27599-7260, USA.
|
46
|
Kleinman CL, Rodrigue N, Bonnard C, Philippe H, Lartillot N. A maximum likelihood framework for protein design. BMC Bioinformatics 2006; 7:326. [PMID: 16808841 PMCID: PMC1570151 DOI: 10.1186/1471-2105-7-326] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.6] [Received: 02/01/2006] [Accepted: 06/29/2006] [Indexed: 11/21/2022] Open
Abstract
Background: The aim of protein design is to predict amino-acid sequences compatible with a given target structure. Traditionally envisioned as a purely thermodynamic question, this problem can also be understood in a wider context, where additional constraints are captured by learning the sequence patterns displayed by natural proteins of known conformation. In this latter perspective, however, we still need a theoretical formalization of the question, leading to general and efficient learning methods, and allowing for the selection of fast and accurate objective functions quantifying sequence/structure compatibility. Results: We propose a formulation of the protein design problem in terms of model-based statistical inference. Our framework uses the maximum likelihood principle to optimize the unknown parameters of a statistical potential, which we call an inverse potential to contrast with classical potentials used for structure prediction. We propose an implementation based on Markov chain Monte Carlo, in which the likelihood is maximized by gradient descent and is numerically estimated by thermodynamic integration. The fit of the models is evaluated by cross-validation. We apply this to a simple pairwise contact potential, supplemented with a solvent-accessibility term, and show that the resulting models have a better predictive power than currently available pairwise potentials. Furthermore, the model comparison method presented here allows one to measure the relative contribution of each component of the potential, and to choose the optimal number of accessibility classes, which turns out to be much higher than classically considered. Conclusion: Altogether, this reformulation makes it possible to test a wide diversity of models, using different forms of potentials, or accounting for other factors than just the constraint of thermodynamic stability. Ultimately, such model-based statistical analyses may help to understand the forces shaping protein sequences, and driving their evolution.
Affiliation(s)
- Claudia L Kleinman
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
- Nicolas Rodrigue
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
- Cécile Bonnard
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, CNRS-Université de Montpellier 2, 161, rue Ada, 34392 Montpellier Cedex 5, France
- Hervé Philippe
- Canadian Institute for Advanced Research, Département de Biochimie, Université de Montréal, Montréal, Québec, Canada
- Nicolas Lartillot
- Laboratoire d'Informatique, de Robotique et de Microélectronique de Montpellier, UMR 5506, CNRS-Université de Montpellier 2, 161, rue Ada, 34392 Montpellier Cedex 5, France
|
47
|
Swift J, Wehbi WA, Kelly BD, Stowell XF, Saven JG, Dmochowski IJ. Design of Functional Ferritin-Like Proteins with Hydrophobic Cavities. J Am Chem Soc 2006; 128:6611-9. [PMID: 16704261 DOI: 10.1021/ja057069x] [Citation(s) in RCA: 48] [Impact Index Per Article: 2.7] [Indexed: 11/30/2022]
Abstract
Ferritin four-helix bundle subunits self-assemble to create a stable multimer with a large central hydrophilic cavity where metal ions bind. To explore the versatility of this reaction vessel, computational design was used to generate cavities with increasingly apolar surface areas inside a dodecameric ferritin-like protein, Dps. Cavity mutants, in which as many as 120 surface accessible hydrophilic residues were replaced with hydrophobic amino acids, were shown to still assemble properly using size-exclusion chromatography and dynamic light scattering measurements. Wild-type Dps exhibited highly cooperative subunit folding and assembly, which was monitored by changes in Trp fluorescence and UV circular dichroism. The hydrophobic cavity mutants showed distinctly less cooperative unfolding behavior, with one mutant forming a partially assembled intermediate upon guanidine denaturation. Although the stability of Dps to such denaturation decreased with increasing apolar surface area, all proteins exhibited high melting temperatures, Tm = 74-90 °C. Despite the large number of mutations, near-native ability to mineralize iron was maintained. This work illustrates the versatility of the ferritin scaffold for engineering large protein cavities with novel properties.
Affiliation(s)
- Joe Swift
- Department of Chemistry, University of Pennsylvania, 231 South 34th Street, Philadelphia, Pennsylvania 19104-6323, USA
|
48
|
Kell DB. Theodor Bücher Lecture. Metabolomics, modelling and machine learning in systems biology - towards an understanding of the languages of cells. Delivered on 3 July 2005 at the 30th FEBS Congress and the 9th IUBMB conference in Budapest. FEBS J 2006; 273:873-94. [PMID: 16478464 DOI: 10.1111/j.1742-4658.2006.05136.x] [Citation(s) in RCA: 130] [Impact Index Per Article: 7.2] [Indexed: 11/28/2022]
Abstract
The newly emerging field of systems biology involves a judicious interplay between high-throughput 'wet' experimentation, computational modelling and technology development, coupled to the world of ideas and theory. This interplay involves iterative cycles, such that systems biology is not at all confined to hypothesis-dependent studies, with intelligent, principled, hypothesis-generating studies being of high importance and consequently very far from aimless fishing expeditions. I seek to illustrate each of these facets. Novel technology development in metabolomics can increase substantially the dynamic range and number of metabolites that one can detect, and these can be exploited as disease markers and in the consequent and principled generation of hypotheses that are consistent with the data and achieve this in a value-free manner. Much of classical biochemistry and signalling pathway analysis has concentrated on the analyses of changes in the concentrations of intermediates, with 'local' equations - such as that of Michaelis and Menten, v = (Vmax × S)/(S + Km) - that describe individual steps being based solely on the instantaneous values of these concentrations. Recent work using single cells (that are not subject to the intellectually unsupportable averaging of the variable displayed by heterogeneous cells possessing nonlinear kinetics) has led to the recognition that some protein signalling pathways may encode their signals not (just) as concentrations (AM or amplitude-modulated in a radio analogy) but via changes in the dynamics of those concentrations (the signals are FM or frequency-modulated). This contributes in principle to a straightforward solution of the crosstalk problem, leads to a profound reassessment of how to understand the downstream effects of dynamic changes in the concentrations of elements in these pathways, and stresses the role of signal processing (and not merely the intermediates) in biological signalling. It is this signal processing that lies at the heart of understanding the languages of cells. The resolution of many of the modern and postgenomic problems of biochemistry requires the development of a myriad of new technologies (and maybe a new culture), and thus regular input from the physical sciences, engineering, mathematics and computer science. One solution, that we are adopting in the Manchester Interdisciplinary Biocentre (http://www.mib.ac.uk/) and the Manchester Centre for Integrative Systems Biology (http://www.mcisb.org/), is thus to colocate individuals with the necessary combinations of skills. Novel disciplines that require such an integrative approach continue to emerge. These include fields such as chemical genomics, synthetic biology, distributed computational environments for biological data and modelling, single cell diagnostics/bionanotechnology, and computational linguistics/text mining.
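The Michaelis-Menten relation quoted in this abstract is a 'local' rate law in the sense that the rate depends only on the instantaneous substrate concentration. A minimal Python sketch makes that explicit; the function and parameter names are illustrative choices, not taken from the lecture.

```python
def michaelis_menten_rate(substrate: float, v_max: float, k_m: float) -> float:
    """Return v = (Vmax * S) / (S + Km) for one instantaneous concentration.

    `substrate` and `k_m` must share the same concentration units; the
    result carries the units of `v_max`.
    """
    if substrate < 0 or v_max < 0 or k_m <= 0:
        raise ValueError("substrate and v_max must be >= 0, and k_m must be > 0")
    return v_max * substrate / (substrate + k_m)
```

When the substrate concentration equals Km the function returns half of Vmax, and no history of earlier concentrations enters the calculation, which is the limitation the abstract contrasts with frequency-encoded signalling.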
Affiliation(s)
- Douglas B Kell
- School of Chemistry, Faraday Building, The University of Manchester, UK.
|
49
|
Taylor CM, Keating AE. Orientation and oligomerization specificity of the Bcr coiled-coil oligomerization domain. Biochemistry 2006; 44:16246-56. [PMID: 16331985 PMCID: PMC2526250 DOI: 10.1021/bi051493t] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.6] [Indexed: 11/30/2022]
Abstract
The Bcr oligomerization domain, from the Bcr-Abl oncoprotein, is an attractive therapeutic target for treating leukemias because it is required for cellular transformation. The domain homodimerizes via an antiparallel coiled coil with an adjacent short, helical swap domain. Inspection of the coiled-coil sequence does not reveal obvious determinants of helix-orientation specificity, raising the possibility that the antiparallel orientation preference and/or the dimeric oligomerization state are due to interactions of the swap domains. To better understand how structural specificity is encoded in Bcr, coiled-coil constructs containing either an N- or C-terminal cysteine were synthesized without the swap domain. When cross-linked to adopt exclusively parallel or antiparallel orientations, these showed similar circular dichroism spectra. Both constructs formed coiled-coil dimers, but the antiparallel construct was approximately 16 °C more stable to thermal denaturation than the parallel construct. Equilibrium disulfide-exchange studies confirmed that the isolated coiled-coil homodimer shows a very strong preference for the antiparallel orientation. We conclude that the orientation and oligomerization preferences of Bcr are not caused by the presence of the swap domains, but rather are directly encoded in the coiled-coil sequence. We further explored possible determinants of structural specificity by mutating residues in the d position of the coiled-coil core. Some of the mutations caused a change in orientation specificity, and all of the mutations led to the formation of higher-order oligomers. In the absence of the swap domain, these residues play an important role in disfavoring alternate states and are especially important for encoding dimeric oligomerization specificity.
Affiliation(s)
- Amy E. Keating
- To whom correspondence should be directed. Tel: 617-452-3398. Fax: 617-253-4043.
|
50
|
Floudas C, Fung H, McAllister S, Mönnigmann M, Rajgaria R. Advances in protein structure prediction and de novo protein design: A review. Chem Eng Sci 2006. [DOI: 10.1016/j.ces.2005.04.009] [Citation(s) in RCA: 175] [Impact Index Per Article: 9.7] [Indexed: 01/01/2023]
|