1
Boral A, Mitra D. Heterogeneity in winged helix-turn-helix and substrate DNA interactions: Insights from theory and experiments. J Cell Biochem 2023; 124:337-358. [PMID: 36715571 DOI: 10.1002/jcb.30369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/06/2022] [Revised: 12/29/2022] [Accepted: 01/02/2023] [Indexed: 01/31/2023]
Abstract
Specific interactions between transcription factors (TFs) and substrate DNA constitute the fundamental basis of gene expression. Unlike for TFs such as basic helix-loop-helix or basic leucine zippers, prediction of substrate DNA is extremely challenging for helix-turn-helix (HTH) TFs. Experimental techniques like chromatin immunoprecipitation combined with massively parallel DNA sequencing remain a viable option. We characterize the molecular basis of heterogeneity in HTH-DNA interaction using in silico tools and then validate the findings experimentally. Given the profound functional diversity in HTH, we focus primarily on winged-HTH (wHTH). We consider 180 wHTH TFs whose experimental three-dimensional structures are available in DNA bound/unbound conformations. Starting with PDB-wide scanning and curation of data, we construct a phylogenetic tree, which distributes the 180 wHTH sequences under multiple sub-groups. Structure-sequence alignment followed by detailed intra/intergroup analysis, covariation studies and extensive network theory analysis helps us to gain deep insight into heterogeneous wHTH-substrate DNA interactions. A central aim of this study is to find a consensus to predict the substrate DNA sequence for wHTH, amidst heterogeneity. The strength of our exhaustive theoretical investigations, including molecular docking, is successfully tested through experimental characterization of a wHTH TF from Sulfurimonas denitrificans.
Affiliation(s)
- Aparna Boral
  - Department of Life Sciences, Presidency University, Kolkata, West Bengal, India
- Devrani Mitra
  - Department of Life Sciences, Presidency University, Kolkata, West Bengal, India
2
Castorina LV, Petrenas R, Subr K, Wood CW. PDBench: evaluating computational methods for protein-sequence design. Bioinformatics 2023; 39:btad027. [PMID: 36637198 PMCID: PMC9869650 DOI: 10.1093/bioinformatics/btad027] [Citation(s) in RCA: 8] [Impact Index Per Article: 8.0] [Received: 08/30/2022] [Revised: 11/14/2022] [Accepted: 01/12/2023] [Indexed: 01/14/2023] Open
Abstract
Summary: Ever-increasing amounts of protein structure data, combined with advances in machine learning, have led to the rapid proliferation of methods available for protein-sequence design. In order to utilize a design method effectively, it is important to understand the nuances of its performance and how it varies by design target. Here, we present PDBench, a set of proteins and a number of standard tests for assessing the performance of sequence-design methods. PDBench aims to maximize the structural diversity of the benchmark, compared with previous benchmarking sets, in order to provide useful biological insight into the behaviour of sequence-design methods, which is essential for evaluating their performance and practical utility. We believe that these tools are useful for guiding the development of novel sequence design algorithms and will enable users to choose a method that best suits their design target. Availability and implementation: https://github.com/wells-wood-research/PDBench. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Leonardo V Castorina
  - School of Informatics, University of Edinburgh, 10 Crichton Street, Newington, Edinburgh EH8 9AB, UK
- Rokas Petrenas
  - School of Biological Sciences, University of Edinburgh, Roger Land Building, Edinburgh EH9 3FF, UK
- Kartic Subr
  - School of Informatics, University of Edinburgh, 10 Crichton Street, Newington, Edinburgh EH8 9AB, UK
- Christopher W Wood
  - School of Biological Sciences, University of Edinburgh, Roger Land Building, Edinburgh EH9 3FF, UK
3
Roney JP, Ovchinnikov S. State-of-the-Art Estimation of Protein Model Accuracy Using AlphaFold. Physical Review Letters 2022; 129:238101. [PMID: 36563190 DOI: 10.1103/physrevlett.129.238101] [Citation(s) in RCA: 59] [Impact Index Per Article: 29.5] [Received: 06/17/2022] [Accepted: 10/18/2022] [Indexed: 06/17/2023]
Abstract
The problem of predicting a protein's 3D structure from its primary amino acid sequence is a longstanding challenge in structural biology. Recently, approaches like AlphaFold have achieved remarkable performance on this task by combining deep learning techniques with coevolutionary data from multiple sequence alignments of related protein sequences. The use of coevolutionary information is critical to these models' accuracy, and without it their predictive performance drops considerably. In living cells, however, the 3D structure of a protein is fully determined by its primary sequence and the biophysical laws that cause it to fold into a low-energy configuration. Thus, it should be possible to predict a protein's structure from only its primary sequence by learning an approximate biophysical energy function. We provide evidence that AlphaFold has learned such an energy function, and uses coevolution data to solve the global search problem of finding a low-energy conformation. We demonstrate that AlphaFold's learned energy function can be used to rank the quality of candidate protein structures with state-of-the-art accuracy, without using any coevolution data. Finally, we explore several applications of this energy function, including the prediction of protein structures without multiple sequence alignments.
Affiliation(s)
- James P Roney
  - Harvard University, Cambridge, Massachusetts 02138, USA
- Sergey Ovchinnikov
  - John Harvard Distinguished Science Fellowship Program, Harvard University, Cambridge, Massachusetts 02138, USA
4
Magi Meconi G, Sasselli IR, Bianco V, Onuchic JN, Coluzza I. Key aspects of the past 30 years of protein design. Reports on Progress in Physics. Physical Society (Great Britain) 2022; 85:086601. [PMID: 35704983 DOI: 10.1088/1361-6633/ac78ef] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/03/2021] [Accepted: 06/15/2022] [Indexed: 06/15/2023]
Abstract
Proteins are the workhorses of life. They are the building infrastructure of living systems; they are the most efficient molecular machines known, and their enzymatic activity is still unmatched in versatility by any artificial system. Perhaps proteins' most remarkable feature is their modularity. The large amount of information required to specify each protein's function is analogically encoded with an alphabet of just ∼20 letters. The protein folding problem is how to encode all such information in a sequence of 20 letters. In this review, we go through the last 30 years of research to summarize the state of the art and highlight some applications related to fundamental problems of protein evolution.
Affiliation(s)
- Giulia Magi Meconi
  - Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
- Ivan R Sasselli
  - Computational Biophysics Lab, Center for Cooperative Research in Biomaterials (CIC biomaGUNE), Basque Research and Technology Alliance (BRTA), Paseo de Miramon 182, 20014, Donostia-San Sebastián, Spain
- Jose N Onuchic
  - Center for Theoretical Biological Physics, Department of Physics & Astronomy, Department of Chemistry, Department of Biosciences, Rice University, Houston, TX 77251, United States of America
- Ivan Coluzza
  - BCMaterials, Basque Center for Materials, Applications and Nanostructures, Bld. Martina Casiano, UPV/EHU Science Park, Barrio Sarriena s/n, 48940 Leioa, Spain
  - Basque Foundation for Science, Ikerbasque, 48009, Bilbao, Spain
5
Guaranteed Diversity and Optimality in Cost Function Network Based Computational Protein Design Methods. Algorithms 2021. [DOI: 10.3390/a14060168] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 01/02/2023]
Abstract
Proteins are the main active molecules of life. Although natural proteins play many roles, as enzymes or antibodies for example, there is a need to go beyond the repertoire of natural proteins to produce engineered proteins that precisely meet application requirements, in terms of function, stability, activity or other protein capacities. Computational Protein Design aims at designing new proteins from first principles, using full-atom molecular models. However, the size and complexity of proteins require approximations to make them amenable to energetic optimization queries. These approximations make the design process less reliable, and a provable optimal solution may fail. In practice, expensive libraries of solutions are therefore generated and tested. In this paper, we explore the idea of generating libraries of provably diverse low-energy solutions by extending cost function network algorithms with dedicated automaton-based diversity constraints on a large set of realistic full protein redesign problems. We observe that it is possible to generate provably diverse libraries in reasonable time and that the produced libraries do enhance the Native Sequence Recovery, a traditional measure of design methods reliability.
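The abstract above names Native Sequence Recovery without defining it. A minimal sketch follows, assuming the usual reading of the term: the fraction of positions at which a designed sequence carries the same residue as the native one. The function names, the best-in-library summary, and the Hamming-distance diversity check are illustrative assumptions, not the paper's implementation.

```python
def native_sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native residue."""
    if len(native) != len(designed):
        raise ValueError("sequences must have the same length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(n == d for n, d in zip(native, designed))
    return matches / len(native)


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def best_library_recovery(native: str, library: list[str]) -> float:
    """Assumed summary statistic: highest recovery reached by any library member."""
    return max(native_sequence_recovery(native, seq) for seq in library)


if __name__ == "__main__":
    native = "ACDE"                     # toy sequences, not real data
    library = ["ACDF", "AAAA", "ACDE"]
    print(native_sequence_recovery(native, library[0]))
    print(hamming_distance(library[0], library[1]))
    print(best_library_recovery(native, library))
```

Under this reading, a library of mutually distant sequences (large pairwise Hamming distances) has more chances of containing a member close to the native sequence than a single optimal design does.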
6
Stam MJ, Wood CW. DE-STRESS: a user-friendly web application for the evaluation of protein designs. Protein Eng Des Sel 2021; 34:gzab029. [PMID: 34908138 PMCID: PMC8672653 DOI: 10.1093/protein/gzab029] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 05/07/2021] [Revised: 10/11/2021] [Accepted: 10/25/2021] [Indexed: 11/16/2022] Open
Abstract
De novo protein design is a rapidly growing field, and there are now many interesting and useful examples of designed proteins in the literature. However, most designs could be classed as failures when characterised in the lab, usually as a result of low expression, misfolding, aggregation or lack of function. This high attrition rate makes protein design unreliable and costly. It is possible that some of these failures could be caught earlier in the design process if it were quick and easy to generate information and a set of high-quality metrics regarding designs, which could be used to make reproducible and data-driven decisions about which designs to characterise experimentally. We present DE-STRESS (DEsigned STRucture Evaluation ServiceS), a web application for evaluating structural models of designed and engineered proteins. DE-STRESS has been designed to be simple, intuitive to use and responsive. It provides a wealth of information regarding designs, as well as tools to help contextualise the results and formally describe the properties that a design requires to be fit for purpose.
Affiliation(s)
- Michael J Stam
  - School of Informatics, University of Edinburgh, Edinburgh EH8 9AB, UK
- Christopher W Wood
  - School of Biological Sciences, University of Edinburgh, Edinburgh EH9 3FF, UK
7
Banerjee S, Mitra D. Structural Basis of Design and Engineering for Advanced Plant Optogenetics. Trends in Plant Science 2020; 25:35-65. [PMID: 31699521 DOI: 10.1016/j.tplants.2019.10.002] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 11/15/2018] [Revised: 09/12/2019] [Accepted: 10/03/2019] [Indexed: 06/10/2023]
Abstract
In optogenetics, light-sensitive proteins are specifically expressed in target cells and light is used to precisely control the activity of these proteins at high spatiotemporal resolution. Optogenetics initially used naturally occurring photoreceptors to control neural circuits, but has expanded to include carefully designed and engineered photoreceptors. Several optogenetic constructs are based on plant photoreceptors, but their application to plant systems has been limited. Here, we present perspectives on the development of plant optogenetics, considering different levels of design complexity. We discuss how general principles of light-driven signal transduction can be coupled with approaches for engineering protein folding to develop novel optogenetic tools. Finally, we explore how the use of computation, networks, circular permutation, and directed evolution could enrich optogenetics.
Affiliation(s)
- Sudakshina Banerjee
  - Department of Life Sciences, Presidency University, 86/1 College Street, Kolkata 700073, India
- Devrani Mitra
  - Department of Life Sciences, Presidency University, 86/1 College Street, Kolkata 700073, India
8
Loshbaugh AL, Kortemme T. Comparison of Rosetta flexible-backbone computational protein design methods on binding interactions. Proteins 2019; 88:206-226. [PMID: 31344278 DOI: 10.1002/prot.25790] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 04/05/2019] [Revised: 07/15/2019] [Accepted: 07/19/2019] [Indexed: 01/03/2023]
Abstract
Computational design of binding sites in proteins remains difficult, in part due to limitations in our current ability to sample backbone conformations that enable precise and accurate geometric positioning of side chains during sequence design. Here we present a benchmark framework for comparison between flexible-backbone design methods applied to binding interactions. We quantify the ability of different flexible backbone design methods in the widely used protein design software Rosetta to recapitulate observed protein sequence profiles assumed to represent functional protein/protein and protein/small molecule binding interactions. The CoupledMoves method, which combines backbone flexibility and sequence exploration into a single acceptance step during the sampling trajectory, better recapitulates observed sequence profiles than the BackrubEnsemble and FastDesign methods, which separate backbone flexibility and sequence design into separate acceptance steps during the sampling trajectory. Flexible-backbone design with the CoupledMoves method is a powerful strategy for reducing sequence space to generate targeted libraries for experimental screening and selection.
Affiliation(s)
- Amanda L Loshbaugh
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California
  - Biophysics Graduate Program, University of California San Francisco, San Francisco, California
- Tanja Kortemme
  - Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California
  - Biophysics Graduate Program, University of California San Francisco, San Francisco, California
  - Quantitative Biosciences Institute, University of California San Francisco, San Francisco, California
  - Chan Zuckerberg Biohub, San Francisco, California
9
Ludwiczak J, Jarmula A, Dunin-Horkawicz S. Combining Rosetta with molecular dynamics (MD): A benchmark of the MD-based ensemble protein design. J Struct Biol 2018; 203:54-61. [PMID: 29454111 DOI: 10.1016/j.jsb.2018.02.004] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 11/05/2017] [Revised: 01/25/2018] [Accepted: 02/13/2018] [Indexed: 01/15/2023]
Abstract
Computational protein design is a set of procedures for computing amino acid sequences that will fold into a specified structure. Rosetta Design, a commonly used software for protein design, allows for the effective identification of sequences compatible with a given backbone structure, while molecular dynamics (MD) simulations can thoroughly sample near-native conformations. We benchmarked a procedure in which Rosetta Design is started on MD-derived structural ensembles and showed that such a combined approach generates 20-30% more diverse sequences than currently available methods with only a slight increase in computation time. Importantly, the increase in diversity is achieved without a loss in the quality of the designed sequences, as assessed by their resemblance to natural sequences. We demonstrate that the MD-based procedure is also applicable to de novo design tasks started from backbone structures without any sequence information. In addition, we implemented a protocol that can be used to assess the stability of designed models and to select the best candidates for experimental validation. In sum, our results demonstrate that the MD ensemble-based flexible backbone design can be a viable method for protein design, especially for tasks that require a large pool of diverse sequences.
Affiliation(s)
- Jan Ludwiczak
  - Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
  - Laboratory of Bioinformatics, Nencki Institute of Experimental Biology, Pasteura 3, 02-093 Warsaw, Poland
- Adam Jarmula
  - Laboratory of Bioinformatics, Nencki Institute of Experimental Biology, Pasteura 3, 02-093 Warsaw, Poland
- Stanislaw Dunin-Horkawicz
  - Laboratory of Structural Bioinformatics, Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland
10
Beyond Thermodynamic Constraints: Evolutionary Sampling Generates Realistic Protein Sequence Variation. Genetics 2018; 208:1387-1395. [PMID: 29382650 DOI: 10.1534/genetics.118.300699] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 01/04/2018] [Accepted: 01/25/2018] [Indexed: 01/01/2023] Open
Abstract
Biological evolution generates a surprising amount of site-specific variability in protein sequences. Yet, attempts at modeling this process have been only moderately successful, and current models based on protein structural metrics explain, at best, 60% of the observed variation. Surprisingly, simple measures of protein structure, such as solvent accessibility, are often better predictors of site-specific variability than more complex models employing all-atom energy functions and detailed structural modeling. We suggest here that these more complex models perform poorly because they lack consideration of the evolutionary process, which is, in part, captured by the simpler metrics. We compare protein sequences that are computationally designed to sequences that are computationally evolved using the same protein-design energy function and to homologous natural sequences. We find that, by a wide variety of metrics, evolved sequences are much more similar to natural sequences than are designed sequences. In particular, designed sequences are too conserved on the protein surface relative to natural sequences, whereas evolved sequences are not. Our results suggest that evolutionary simulation produces a realistic sampling of sequence space. By contrast, protein design, at least as currently implemented, does not. Existing energy functions seem to be sufficiently accurate to correctly describe the key thermodynamic constraints acting on protein sequences, but they need to be paired with realistic sampling schemes to generate realistic sequence alignments.
11
Hidden Markov model and Chapman Kolmogrov for protein structures prediction from images. Comput Biol Chem 2017; 68:231-244. [DOI: 10.1016/j.compbiolchem.2017.04.003] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 01/18/2017] [Revised: 03/11/2017] [Accepted: 04/11/2017] [Indexed: 11/20/2022]
12
Echave J, Wilke CO. Biophysical Models of Protein Evolution: Understanding the Patterns of Evolutionary Sequence Divergence. Annu Rev Biophys 2017; 46:85-103. [PMID: 28301766 DOI: 10.1146/annurev-biophys-070816-033819] [Citation(s) in RCA: 75] [Impact Index Per Article: 10.7] [Indexed: 12/30/2022]
Abstract
For decades, rates of protein evolution have been interpreted in terms of the vague concept of functional importance. Slowly evolving proteins or sites within proteins were assumed to be more functionally important and thus subject to stronger selection pressure. More recently, biophysical models of protein evolution, which combine evolutionary theory with protein biophysics, have completely revolutionized our view of the forces that shape sequence divergence. Slowly evolving proteins have been found to evolve slowly because of selection against toxic misfolding and misinteractions, linking their rate of evolution primarily to their abundance. Similarly, most slowly evolving sites in proteins are not directly involved in function, but mutating these sites has a large impact on protein structure and stability. In this article, we review the studies in the emerging field of biophysical protein evolution that have shaped our current understanding of sequence divergence patterns. We also propose future research directions to develop this nascent field.
Affiliation(s)
- Julian Echave
  - Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina
  - Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Argentina
- Claus O Wilke
  - Department of Integrative Biology, The University of Texas at Austin, Texas 78712
13
Brender JR, Shultis D, Khattak NA, Zhang Y. An Evolution-Based Approach to De Novo Protein Design. Methods Mol Biol 2017; 1529:243-264. [PMID: 27914055 PMCID: PMC5667548 DOI: 10.1007/978-1-4939-6637-0_12] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Indexed: 01/19/2023]
Abstract
EvoDesign is a computational algorithm that allows the rapid creation of new protein sequences that are compatible with specific protein structures. As such, it can be used to optimize protein stability, to resculpt the protein surface to eliminate undesired protein-protein interactions, and to optimize protein-protein binding. A major distinguishing feature of EvoDesign in comparison to other protein design programs is the use of evolutionary information in the design process to guide the sequence search toward native-like sequences known to adopt structurally similar folds as the target. The observed frequencies of amino acids in specific positions in the structure in the form of structural profiles collected from proteins with similar folds and complexes with similar interfaces can implicitly capture many subtle effects that are essential for correct folding and protein-binding interactions. As a result of the inclusion of evolutionary information, the sequences designed by EvoDesign have native-like folding and binding properties not seen by other physics-based design methods. In this chapter, we describe how EvoDesign can be used to redesign proteins with a focus on the computational and experimental procedures that can be used to validate the designs.
14
Abstract
Protein-protein interactions play critical roles in essentially every cellular process. These interactions are often mediated by protein interaction domains that enable proteins to recognize their interaction partners, often by binding to short peptide motifs. For example, PDZ domains, which are among the most common protein interaction domains in the human proteome, recognize specific linear peptide sequences that are often at the C-terminus of other proteins. Determining the set of peptide sequences that a protein interaction domain binds, or its "peptide specificity," is crucial for understanding its cellular function, and predicting how mutations impact peptide specificity is important for elucidating the mechanisms underlying human diseases. Moreover, engineering novel cellular functions for synthetic biology applications, such as the biosynthesis of biofuels or drugs, requires the design of protein interaction specificity to avoid crosstalk with native metabolic and signaling pathways. The ability to accurately predict and design protein-peptide interaction specificity is therefore critical for understanding and engineering biological function. One approach that has recently been employed toward accomplishing this goal is computational protein design. This chapter provides an overview of recent methodological advances in computational protein design and highlights examples of how these advances can enable increased accuracy in predicting and designing peptide specificity.
Affiliation(s)
- Noah Ollikainen
  - Division of Biology and Biological Engineering, California Institute of Technology, 1200 East California Blvd., Pasadena, CA, 91125, USA
15
Johansson KE, Tidemand Johansen N, Christensen S, Horowitz S, Bardwell JC, Olsen JG, Willemoës M, Lindorff-Larsen K, Ferkinghoff-Borg J, Hamelryck T, Winther JR. Computational Redesign of Thioredoxin Is Hypersensitive toward Minor Conformational Changes in the Backbone Template. J Mol Biol 2016; 428:4361-4377. [PMID: 27659562 PMCID: PMC5242314 DOI: 10.1016/j.jmb.2016.09.013] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Received: 06/05/2016] [Revised: 09/08/2016] [Accepted: 09/14/2016] [Indexed: 01/26/2023]
Abstract
Despite the development of powerful computational tools, the full-sequence design of proteins still remains a challenging task. To investigate the limits and capabilities of computational tools, we conducted a study of the ability of the program Rosetta to predict sequences that recreate the authentic fold of thioredoxin. Focusing on the influence of conformational details in the template structures, we based our study on 8 experimentally determined template structures and generated 120 designs from each. For experimental evaluation, we chose six sequences from each of the eight templates by objective criteria. The 48 selected sequences were evaluated based on their progressive ability to (1) produce soluble protein in Escherichia coli and (2) yield stable monomeric protein, and (3) on the ability of the stable, soluble proteins to adopt the target fold. Of the 48 designs, we were able to synthesize 32, 20 of which resulted in soluble protein. Of these, only two were sufficiently stable to be purified. An X-ray crystal structure was solved for one of the designs, revealing a close resemblance to the target structure. We found a significant difference among the eight template structures to realize the above three criteria despite their high structural similarity. Thus, in order to improve the success rate of computational full-sequence design methods, we recommend that multiple template structures are used. Furthermore, this study shows that special care should be taken when optimizing the geometry of a structure prior to computational design when using a method that is based on rigid conformations.
Affiliation(s)
- Kristoffer E. Johansson
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Nicolai Tidemand Johansen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Signe Christensen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Scott Horowitz
  - Howard Hughes Medical Institute, Department of Molecular, Cellular and Developmental Biology, University of Michigan, 109 Zina Pitcher Place, Ann Arbor, MI 48109, USA
- James C.A. Bardwell
  - Howard Hughes Medical Institute, Department of Molecular, Cellular and Developmental Biology, University of Michigan, 109 Zina Pitcher Place, Ann Arbor, MI 48109, USA
- Johan G. Olsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Martin Willemoës
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Kresten Lindorff-Larsen
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Jesper Ferkinghoff-Borg
  - Biotech Research and Innovation Centre, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Thomas Hamelryck
  - Section for Computational and RNA Biology, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
- Jakob R. Winther
  - Linderstrøm-Lang Centre for Protein Science, Department of Biology, University of Copenhagen, Ole Maaløes Vej 5, Copenhagen DK-2200, Denmark
16
Jackson EL, Shahmoradi A, Spielman SJ, Jack BR, Wilke CO. Intermediate divergence levels maximize the strength of structure-sequence correlations in enzymes and viral proteins. Protein Sci 2016; 25:1341-53. [PMID: 26971720 PMCID: PMC4918415 DOI: 10.1002/pro.2920] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 11/07/2015] [Accepted: 03/04/2016] [Indexed: 12/16/2022]
Abstract
Structural properties such as solvent accessibility and contact number predict site-specific sequence variability in many proteins. However, the strength and significance of these structure-sequence relationships vary widely among different proteins, with absolute correlation strengths ranging from 0 to 0.8. In particular, two recent works have made contradictory observations. Yeh et al. (Mol. Biol. Evol. 31:135-139, 2014) found that both relative solvent accessibility (RSA) and weighted contact number (WCN) are good predictors of sitewise evolutionary rate in enzymes, with WCN clearly out-performing RSA. Shahmoradi et al. (J. Mol. Evol. 79:130-142, 2014) considered these same predictors (as well as others) in viral proteins and found much weaker correlations and no clear advantage of WCN over RSA. Because these two studies had substantial methodological differences, however, a direct comparison of their results is not possible. Here, we reanalyze the datasets of the two studies with one uniform analysis pipeline, and we find that many apparent discrepancies between the two analyses can be attributed to the extent of sequence divergence in individual alignments. Specifically, the alignments of the enzyme dataset are much more diverged than those of the virus dataset, and proteins with higher divergence exhibit, on average, stronger structure-sequence correlations. However, the highest structure-sequence correlations are observed at intermediate divergence levels, where both highly conserved and highly variable sites are present in the same alignment.
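The abstract above compares weighted contact number (WCN) with RSA but does not state how WCN is computed. As a rough illustration, the sketch below uses the definition common in this literature, in which the WCN of site i is the sum over all other sites j of 1/r_ij², with r_ij the distance between representative atoms (often Cα). The function name, the choice of representative atom, and the toy coordinates are assumptions made for illustration only.

```python
import math
from typing import Sequence

Point = tuple[float, float, float]


def weighted_contact_number(coords: Sequence[Point]) -> list[float]:
    """Per-site WCN: sum over all other sites of the inverse squared distance."""
    wcn = []
    for i, a in enumerate(coords):
        total = 0.0
        for j, b in enumerate(coords):
            if i != j:
                total += 1.0 / math.dist(a, b) ** 2
        wcn.append(total)
    return wcn


if __name__ == "__main__":
    # Three hypothetical sites spaced one unit apart on a line (not real structure data).
    toy_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    print(weighted_contact_number(toy_coords))
```

Sites packed among many close neighbours get a high WCN, which is why the measure is used as a proxy for burial when correlating structure with site-wise evolutionary rate.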
Affiliation(s)
- Eleisha L Jackson
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, 78712
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
| | - Amir Shahmoradi
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
- Department of Physics, The University of Texas at Austin, Austin, Texas, 78712
| | - Stephanie J Spielman
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, 78712
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
| | - Benjamin R Jack
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, 78712
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
| | - Claus O Wilke
- Department of Integrative Biology, The University of Texas at Austin, Austin, Texas, 78712
- Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, Texas, 78712
- Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas, 78712
| |
|
17
|
Pandini A, Morcos F, Khan S. The Gearbox of the Bacterial Flagellar Motor Switch. Structure 2016; 24:1209-20. [PMID: 27345932 PMCID: PMC4938800 DOI: 10.1016/j.str.2016.05.012] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 02/22/2016] [Revised: 04/26/2016] [Accepted: 05/23/2016] [Indexed: 12/11/2022]
Abstract
Switching of flagellar motor rotation sense dictates bacterial chemotaxis. Multi-subunit FliM-FliG rotor rings couple signal protein binding in FliM with reversal of a distant FliG C-terminal (FliGC) helix involved in stator contacts. Subunit dynamics were examined in conformer ensembles generated by molecular simulations from the X-ray structures. Principal component analysis extracted collective motions. Interfacial loop immobilization by complex formation coupled elastic fluctuations of the FliM middle (FliMM) and FliG middle (FliGM) domains. Coevolved mutations captured interfacial dynamics as well as contacts. FliGM rotation was amplified via two central hinges to the FliGC helix. Intrinsic flexibility, reported by the FliGMC ensembles, reconciled conformers with opposite FliGC helix orientations. FliG domain stacking deformed the inter-domain linker and reduced flexibility; but conformational changes were not triggered by engineered linker deletions that cause a rotation-locked phenotype. These facts suggest that binary rotation states arise from conformational selection by stacking interactions.
Switch complex exploits differential subunit stiffness for mechanical amplification. Distinct rotor protein X-ray structures generate overlapping conformer ensembles. Stacking constraints on a flexible helix linker could select diverse rotation states. Non-contact elastic couplings at the subunit interface in the complex have coevolved.
Affiliation(s)
- Alessandro Pandini
- Department of Computer Science and Synthetic Biology Theme, Brunel University London, Uxbridge UB8 3PH, UK; Computational Cell and Molecular Biology, The Francis Crick Institute, London NW1 1AT, UK
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX 75080, USA
| | - Shahid Khan
- Molecular Biology Consortium, Lawrence Berkeley National Laboratory, Berkeley, CA 94720, USA.
| |
|
18
|
Using natural sequences and modularity to design common and novel protein topologies. Curr Opin Struct Biol 2016; 38:26-36. [PMID: 27270240 DOI: 10.1016/j.sbi.2016.05.007] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 03/01/2016] [Revised: 05/13/2016] [Accepted: 05/18/2016] [Indexed: 02/07/2023]
Abstract
Protein design is still a challenging undertaking, often requiring multiple attempts or iterations for success. Typically, the source of failure is unclear, and scoring metrics appear similar between successful and failed cases. Nevertheless, the use of sequence statistics, modularity and symmetry from natural proteins, combined with computational design both at the coarse-grained and atomistic levels is propelling a new wave of design efforts to success. Here we highlight recent examples of design, showing how the wealth of natural protein sequence and topology data may be leveraged to reduce the search space and increase the likelihood of achieving desired outcomes.
|
19
|
Maximova T, Moffatt R, Ma B, Nussinov R, Shehu A. Principles and Overview of Sampling Methods for Modeling Macromolecular Structure and Dynamics. PLoS Comput Biol 2016; 12:e1004619. [PMID: 27124275 PMCID: PMC4849799 DOI: 10.1371/journal.pcbi.1004619] [Citation(s) in RCA: 132] [Impact Index Per Article: 16.5] [Indexed: 01/08/2023] Open
Abstract
Investigation of macromolecular structure and dynamics is fundamental to understanding how macromolecules carry out their functions in the cell. Significant advances have been made toward this end in silico, with a growing number of computational methods proposed yearly to study and simulate various aspects of macromolecular structure and dynamics. This review aims to provide an overview of recent advances, focusing primarily on methods proposed for exploring the structure space of macromolecules in isolation and in assemblies for the purpose of characterizing equilibrium structure and dynamics. In addition to surveying recent applications that showcase current capabilities of computational methods, this review highlights state-of-the-art algorithmic techniques proposed to overcome challenges posed in silico by the disparate spatial and time scales accessed by dynamic macromolecules. This review is not meant to be exhaustive, as such an endeavor is impossible, but rather aims to balance breadth and depth of strategies for modeling macromolecular structure and dynamics for a broad audience of novices and experts.
Affiliation(s)
- Tatiana Maximova
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
| | - Ryan Moffatt
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
| | - Buyong Ma
- Basic Science Program, Leidos Biomedical Research, Inc. Cancer and Inflammation Program, National Cancer Institute, Frederick, Maryland, United States of America
| | - Ruth Nussinov
- Basic Science Program, Leidos Biomedical Research, Inc. Cancer and Inflammation Program, National Cancer Institute, Frederick, Maryland, United States of America
- Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv, Israel
| | - Amarda Shehu
- Department of Computer Science, George Mason University, Fairfax, Virginia, United States of America
- Department of Bioengineering, George Mason University, Fairfax, Virginia, United States of America
- School of Systems Biology, George Mason University, Manassas, Virginia, United States of America
| |
|
20
|
Echave J, Spielman SJ, Wilke CO. Causes of evolutionary rate variation among protein sites. Nat Rev Genet 2016; 17:109-21. [PMID: 26781812 DOI: 10.1038/nrg.2015.18] [Citation(s) in RCA: 176] [Impact Index Per Article: 22.0] [Indexed: 12/13/2022]
Abstract
It has long been recognized that certain sites within a protein, such as sites in the protein core or catalytic residues in enzymes, are evolutionarily more conserved than other sites. However, our understanding of rate variation among sites remains surprisingly limited. Recent progress to address this includes the development of a wide array of reliable methods to estimate site-specific substitution rates from sequence alignments. In addition, several molecular traits have been identified that correlate with site-specific mutation rates, and novel mechanistic biophysical models have been proposed to explain the observed correlations. Nonetheless, current models explain, at best, approximately 60% of the observed variance, highlighting the limitations of current methods and models and the need for new research directions.
Affiliation(s)
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina
| | - Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
| | - Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
| |
|
21
|
Kinjo AR. Liquid-theory analogy of direct-coupling analysis of multiple-sequence alignment and its implications for protein structure prediction. Biophys Physicobiol 2015; 12:117-9. [PMID: 27493860 PMCID: PMC4736835 DOI: 10.2142/biophysico.12.0_117] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 10/23/2015] [Accepted: 11/10/2015] [Indexed: 12/01/2022] Open
Abstract
The direct-coupling analysis is a powerful method for protein contact prediction, and enables us to extract “direct” correlations between distant sites that are latent in “indirect” correlations observed in a protein multiple-sequence alignment. I show that the direct correlation can be obtained by using a formulation analogous to the Ornstein-Zernike integral equation in liquid theory. This formulation intuitively illustrates how the indirect or apparent correlation arises from an infinite series of direct correlations, and provides interesting insights into protein structure prediction.
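For orientation, the "infinite series of direct correlations" mentioned in this abstract can be sketched with the standard Ornstein-Zernike relation from liquid theory. The notation below (h for the total correlation, c for the direct correlation, ρ for the number density, and the matrices H and C in the discrete analogue) is assumed here for illustration and is not necessarily the notation used in the paper.

```latex
% Ornstein-Zernike relation (standard liquid-theory form):
h(r_{12}) = c(r_{12}) + \rho \int c(r_{13})\, h(r_{32})\, \mathrm{d}\mathbf{r}_3

% Schematic discrete analogue, with alignment sites as indices
% (assumed notation; valid when the series converges):
H = C + C H
\quad\Longrightarrow\quad
H = C\,(I - C)^{-1} = C + C^{2} + C^{3} + \cdots
```

Read this way, the apparent correlation between two sites is the direct term plus all chains of direct correlations passing through intermediate sites.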
Affiliation(s)
- Akira R Kinjo
- Institute for Protein Research, Osaka University, Suita, Osaka 565-0871, Japan
| |
|
22
|
Ollikainen N, de Jong RM, Kortemme T. Coupling Protein Side-Chain and Backbone Flexibility Improves the Re-design of Protein-Ligand Specificity. PLoS Comput Biol 2015; 11:e1004335. [PMID: 26397464 PMCID: PMC4580623 DOI: 10.1371/journal.pcbi.1004335] [Citation(s) in RCA: 67] [Impact Index Per Article: 7.4] [Received: 02/10/2015] [Accepted: 05/10/2015] [Indexed: 11/25/2022] Open
Abstract
Interactions between small molecules and proteins play critical roles in regulating and facilitating diverse biological functions, yet our ability to accurately re-engineer the specificity of these interactions using computational approaches has been limited. One main difficulty, in addition to inaccuracies in energy functions, is the exquisite sensitivity of protein–ligand interactions to subtle conformational changes, coupled with the computational problem of sampling the large conformational search space of degrees of freedom of ligands, amino acid side chains, and the protein backbone. Here, we describe two benchmarks for evaluating the accuracy of computational approaches for re-engineering protein–ligand interactions: (i) prediction of enzyme specificity altering mutations and (ii) prediction of sequence tolerance in ligand binding sites. After finding that current state-of-the-art “fixed backbone” design methods perform poorly on these tests, we develop a new “coupled moves” design method in the program Rosetta that couples changes to protein sequence with alterations in both protein side-chain and protein backbone conformations, and allows for changes in ligand rigid-body and torsion degrees of freedom. We show significantly increased accuracy in both predicting ligand specificity altering mutations and binding site sequences. These methodological improvements should be useful for many applications of protein–ligand design. The approach also provides insights into the role of subtle conformational adjustments that enable functional changes not only in engineering applications but also in natural protein evolution.
Designing new protein–ligand interactions has tremendous potential for engineering sensitive biosensors for diagnostics or new enzymes useful in biotechnology, but these applications are extremely challenging, both because of inaccuracies of the energy functions used in modeling and design, and because protein active and binding sites are highly sensitive to subtle changes in structure. Here we describe a new method that addresses the second problem and couples changes in the structure of the protein backbone and of the amino acid side chains, the amino acid sequence, and the conformation of the ligand and its orientation in the binding site. We show that our method improvements significantly increase the accuracy of designing protein–ligand interactions compared to current state-of-the-art design methods. We assess these improvements in two important tests: the first predicts mutations that change ligand-binding preferences in enzymes, and the second predicts protein sequences that bind a given ligand. In these tests, subtle conformational changes made in our model are essential to recapitulate both the results from engineering experiments and the sequence diversity occurring in natural protein families. These results therefore shed light on the mechanisms of how new protein functions might have emerged and can be engineered in the laboratory.
Affiliation(s)
- Noah Ollikainen
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, California, United States of America
| | - René M. de Jong
- DSM Biotechnology Center, Alexander Fleminglaan 1, Delft, The Netherlands
| | - Tanja Kortemme
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, California, United States of America
- California Institute for Quantitative Biosciences (QB3), University of California San Francisco, San Francisco, California, United States of America
- Department of Bioengineering and Therapeutic Science, University of California San Francisco, San Francisco, California, United States of America
| |
|
23
|
Ó Conchúir S, Barlow KA, Pache RA, Ollikainen N, Kundert K, O'Meara MJ, Smith CA, Kortemme T. A Web Resource for Standardized Benchmark Datasets, Metrics, and Rosetta Protocols for Macromolecular Modeling and Design. PLoS One 2015; 10:e0130433. [PMID: 26335248 PMCID: PMC4559433 DOI: 10.1371/journal.pone.0130433] [Citation(s) in RCA: 58] [Impact Index Per Article: 6.4] [Received: 02/20/2015] [Accepted: 05/20/2015] [Indexed: 11/18/2022] Open
Abstract
The development and validation of computational macromolecular modeling and design methods depend on suitable benchmark datasets and informative metrics for comparing protocols. In addition, if a method is intended to be adopted broadly in diverse biological applications, there needs to be information on appropriate parameters for each protocol, as well as metrics describing the expected accuracy compared to experimental data. In certain disciplines, there exist established benchmarks and public resources where experts in a particular methodology are encouraged to supply their most efficient implementation of each particular benchmark. We aim to provide such a resource for protocols in macromolecular modeling and design. We present a freely accessible web resource (https://kortemmelab.ucsf.edu/benchmarks) to guide the development of protocols for protein modeling and design. The site provides benchmark datasets and metrics to compare the performance of a variety of modeling protocols using different computational sampling methods and energy functions, providing a "best practice" set of parameters for each method. Each benchmark has an associated downloadable benchmark capture archive containing the input files, analysis scripts, and tutorials for running the benchmark. The captures may be run with any suitable modeling method; we supply command lines for running the benchmarks using the Rosetta software suite. We have compiled initial benchmarks for the resource spanning three key areas: prediction of energetic effects of mutations, protein design, and protein structure prediction, each with associated state-of-the-art modeling protocols. With the help of the wider macromolecular modeling community, we hope to expand the variety of benchmarks included on the website and continue to evaluate new iterations of current methods as they become available.
Affiliation(s)
- Shane Ó Conchúir
- California Institute for Quantitative Biosciences (QB3), University of California San Francisco, San Francisco, California, United States of America
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America
| | - Kyle A. Barlow
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, California, United States of America
| | - Roland A. Pache
- California Institute for Quantitative Biosciences (QB3), University of California San Francisco, San Francisco, California, United States of America
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America
| | - Noah Ollikainen
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, California, United States of America
| | - Kale Kundert
- Graduate Program in Biophysics, University of California San Francisco, San Francisco, California, United States of America
| | - Matthew J. O'Meara
- Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, United States of America
| | - Colin A. Smith
- California Institute for Quantitative Biosciences (QB3), University of California San Francisco, San Francisco, California, United States of America
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, California, United States of America
| | - Tanja Kortemme
- California Institute for Quantitative Biosciences (QB3), University of California San Francisco, San Francisco, California, United States of America
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, California, United States of America
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, California, United States of America
- Graduate Program in Biophysics, University of California San Francisco, San Francisco, California, United States of America
| |
|
24
|
Stiebritz MT. MetREx: A protein design approach for the exploration of sequence-reactivity relationships in metalloenzymes. J Comput Chem 2015; 36:553-63. [DOI: 10.1002/jcc.23831] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/16/2014] [Revised: 12/12/2014] [Accepted: 12/16/2014] [Indexed: 01/10/2023]
Affiliation(s)
- Martin T. Stiebritz
- Laboratorium für Physikalische Chemie, ETH Zürich; Vladimir-Prelog-Weg 2 CH-8093 Zürich Switzerland
| |
|
25
|
Shahmoradi A, Sydykova DK, Spielman SJ, Jackson EL, Dawson ET, Meyer AG, Wilke CO. Predicting evolutionary site variability from structure in viral proteins: buriedness, packing, flexibility, and design. J Mol Evol 2014; 79:130-42. [PMID: 25217382 PMCID: PMC4216736 DOI: 10.1007/s00239-014-9644-x] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 04/23/2014] [Accepted: 08/31/2014] [Indexed: 12/27/2022]
Abstract
Several recent works have shown that protein structure can predict site-specific evolutionary sequence variation. In particular, sites that are buried and/or have many contacts with other sites in a structure have been shown to evolve more slowly, on average, than surface sites with few contacts. Here, we present a comprehensive study of the extent to which numerous structural properties can predict sequence variation. The quantities we considered include buriedness (as measured by relative solvent accessibility), packing density (as measured by contact number), structural flexibility (as measured by B factors, root-mean-square fluctuations, and variation in dihedral angles), and variability in designed structures. We obtained structural flexibility measures both from molecular dynamics simulations performed on nine non-homologous viral protein structures and from variation in homologous variants of those proteins, where they were available. We obtained measures of variability in designed structures from flexible-backbone design in the Rosetta software. We found that most of the structural properties correlate with site variation in the majority of structures, though the correlations are generally weak (correlation coefficients of 0.1-0.4). Moreover, we found that buriedness and packing density were better predictors of evolutionary variation than structural flexibility. Finally, variability in designed structures was a weaker predictor of evolutionary variability than buriedness or packing density, but it was comparable in its predictive power to the best structural flexibility measures. We conclude that simple measures of buriedness and packing density are better predictors of evolutionary variation than the more complicated predictors obtained from dynamic simulations, ensembles of homologous structures, or computational protein design.
Affiliation(s)
- Amir Shahmoradi
- Department of Physics, The University of Texas at Austin, Austin, TX 78712, USA
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
| | - Dariya K. Sydykova
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
| | - Stephanie J. Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
| | - Eleisha L. Jackson
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
| | - Eric T. Dawson
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
| | - Austin G. Meyer
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
| | - Claus O. Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, TX 78712, USA
| |
|
26
|
Coevolutionary information, protein folding landscapes, and the thermodynamics of natural selection. Proc Natl Acad Sci U S A 2014; 111:12408-13. [PMID: 25114242 DOI: 10.1073/pnas.1413575111] [Citation(s) in RCA: 111] [Impact Index Per Article: 11.1] [Indexed: 11/18/2022] Open
Abstract
The energy landscape used by nature over evolutionary timescales to select protein sequences is essentially the same as the one that folds these sequences into functioning proteins, sometimes in microseconds. We show that genomic data, physical coarse-grained free energy functions, and family-specific information theoretic models can be combined to give consistent estimates of energy landscape characteristics of natural proteins. One such characteristic is the effective temperature T(sel) at which these foldable sequences have been selected in sequence space by evolution. T(sel) quantifies the importance of folded-state energetics and structural specificity for molecular evolution. Across all protein families studied, our estimates for T(sel) are well below the experimental folding temperatures, indicating that the energy landscapes of natural foldable proteins are strongly funneled toward the native state.
|
27
|
Clark GW, Ackerman SH, Tillier ER, Gatti DL. Multidimensional mutual information methods for the analysis of covariation in multiple sequence alignments. BMC Bioinformatics 2014; 15:157. [PMID: 24886131 PMCID: PMC4046016 DOI: 10.1186/1471-2105-15-157] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 12/02/2013] [Accepted: 05/06/2014] [Indexed: 11/10/2022] Open
Abstract
Background: Several methods are available for the detection of covarying positions from a multiple sequence alignment (MSA). If the MSA contains a large number of sequences, information about the proximities between residues derived from covariation maps can be sufficient to predict a protein fold. However, in many cases the structure is already known, and information on the covarying positions can be valuable to understand the protein mechanism and dynamic properties. Results: In this study we have sought to determine whether a multivariate (multidimensional) extension of traditional mutual information (MI) can be an additional tool to study covariation. The performance of two multidimensional MI (mdMI) methods, designed to remove the effect of ternary/quaternary interdependencies, was tested with a set of 9 MSAs each containing <400 sequences, and was shown to be comparable to that of the newest methods based on maximum entropy/pseudolikelihood statistical models of protein sequences. However, while all the methods tested detected a similar number of covarying pairs among the residues separated by <8 Å in the reference X-ray structures, there was on average less than 65% overlap between the top scoring pairs detected by methods that are based on different principles. Conclusions: Given the large variety of structure and evolutionary history of different proteins it is possible that a single best method to detect covariation in all proteins does not exist, and that for each protein family the best information can be derived by merging/comparing results obtained with different methods. This approach may be particularly valuable in those cases in which the size of the MSA is small or the quality of the alignment is low, leading to significant differences in the pairs detected by different methods.
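The "traditional mutual information" that this abstract takes as its starting point can be sketched as follows. This is the plain pairwise MI between two alignment columns, not the multidimensional mdMI methods of the paper; the function name and the toy columns are hypothetical, and no gap handling or sequence weighting is attempted.

```python
import math
from collections import Counter

def column_mutual_information(col_a, col_b):
    """Pairwise mutual information (in bits) between two MSA columns.

    `col_a` and `col_b` are equal-length sequences of residue symbols,
    one symbol per aligned sequence. Frequencies are raw counts.
    """
    n = len(col_a)
    freq_a = Counter(col_a)
    freq_b = Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in freq_ab.items():
        p_ab = count / n
        p_a = freq_a[a] / n
        p_b = freq_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

# Toy columns (hypothetical): the first pair covaries perfectly,
# the second pair is statistically independent.
print(column_mutual_information("AAGG", "LLVV"))
print(column_mutual_information("AAGG", "LVLV"))
```

Pairwise MI of this kind mixes direct and indirect dependencies, which is the limitation that multidimensional and maximum-entropy approaches aim to address.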
Affiliation(s)
| | | | - Elisabeth R Tillier
- Department of Medical Biophysics, University of Toronto, Campbell Family Institute for Cancer Research, Ontario Cancer Institute, University Health Network, Toronto, Ontario, Canada.
| | | |
|
28
|
Jackson EL, Ollikainen N, Covert AW, Kortemme T, Wilke CO. Amino-acid site variability among natural and designed proteins. PeerJ 2013; 1:e211. [PMID: 24255821 PMCID: PMC3828621 DOI: 10.7717/peerj.211] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 10/03/2013] [Accepted: 10/24/2013] [Indexed: 11/20/2022] Open
Abstract
Computational protein design attempts to create protein sequences that fold stably into pre-specified structures. Here we compare alignments of designed proteins to alignments of natural proteins and assess how closely designed sequences recapitulate patterns of sequence variation found in natural protein sequences. We design proteins using RosettaDesign, and we evaluate both fixed-backbone designs and variable-backbone designs with different amounts of backbone flexibility. We find that proteins designed with a fixed backbone tend to underestimate the amount of site variability observed in natural proteins while proteins designed with an intermediate amount of backbone flexibility result in more realistic site variability. Further, the correlation between solvent exposure and site variability in designed proteins is lower than that in natural proteins. This finding suggests that site variability is too uniform across different solvent exposure states (i.e., buried residues are too variable or exposed residues too conserved). When comparing the amino acid frequencies in the designed proteins with those in natural proteins we find that in the designed proteins hydrophobic residues are underrepresented in the core. From these results we conclude that intermediate backbone flexibility during design results in more accurate protein design and that either scoring functions or backbone sampling methods require further improvement to accurately replicate structural constraints on site variability.
Affiliation(s)
- Eleisha L. Jackson
- Institute of Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, and Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Noah Ollikainen
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, CA, USA
| | - Arthur W. Covert
- Institute of Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, and Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| | - Tanja Kortemme
- Graduate Program in Bioinformatics, University of California San Francisco, San Francisco, CA, USA
- California Institute for Quantitative Biosciences (QB3) and Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Claus O. Wilke
- Institute of Cellular and Molecular Biology, Center for Computational Biology and Bioinformatics, and Department of Integrative Biology, The University of Texas at Austin, Austin, TX, USA
| |
|