101
|
Ovchinnikov S, Park H, Varghese N, Huang PS, Pavlopoulos GA, Kim DE, Kamisetty H, Kyrpides NC, Baker D. Protein structure determination using metagenome sequence data. Science 2017; 355:294-298. [PMID: 28104891 PMCID: PMC5493203 DOI: 10.1126/science.aah4043] [Citation(s) in RCA: 336] [Impact Index Per Article: 48.0] [Received: 06/22/2016] [Accepted: 11/22/2016] [Indexed: 01/30/2023]
Abstract
Despite decades of work by structural biologists, there are still ~5200 protein families with unknown structure outside the range of comparative modeling. We show that Rosetta structure prediction guided by residue-residue contacts inferred from evolutionary information can accurately model proteins that belong to large families and that metagenome sequence data more than triple the number of protein families with sufficient sequences for accurate modeling. We then integrate metagenome data, contact-based structure matching, and Rosetta structure calculations to generate models for 614 protein families with currently unknown structures; 206 are membrane proteins and 137 have folds not represented in the Protein Data Bank. This approach provides the representative models for large protein families originally envisioned as the goal of the Protein Structure Initiative at a fraction of the cost.
Affiliation(s)
- Sergey Ovchinnikov
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Molecular and Cellular Biology Program, University of Washington, Seattle, WA 98195, USA
- Hahnbeom Park
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Po-Ssu Huang
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- David E Kim
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA
- Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98105, USA
- Nikos C Kyrpides
- Joint Genome Institute, Walnut Creek, CA 94598, USA
- Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- David Baker
- Department of Biochemistry, University of Washington, Seattle, WA 98105, USA.
- Institute for Protein Design, University of Washington, Seattle, WA 98105, USA
- Howard Hughes Medical Institute, University of Washington, Box 357370, Seattle, WA 98105, USA
|
102
|
Bai F, Morcos F, Cheng RR, Jiang H, Onuchic JN. Elucidating the druggable interface of protein-protein interactions using fragment docking and coevolutionary analysis. Proc Natl Acad Sci U S A 2016; 113:E8051-E8058. [PMID: 27911825 PMCID: PMC5167203 DOI: 10.1073/pnas.1615932113] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Indexed: 12/28/2022]
Abstract
Protein-protein interactions play a central role in cellular function. Improving the understanding of complex formation has many practical applications, including the rational design of new therapeutic agents and the mechanisms governing signal transduction networks. The generally large, flat, and relatively featureless binding sites of protein complexes pose many challenges for drug design. Fragment docking and direct coupling analysis are used in an integrated computational method to estimate druggable protein-protein interfaces. (i) This method explores the binding of fragment-sized molecular probes on the protein surface using a molecular docking-based screen. (ii) The energetically favorable binding sites of the probes, called hot spots, are spatially clustered to map out candidate binding sites on the protein surface. (iii) A coevolution-based interface interaction score is used to discriminate between different candidate binding sites, yielding potential interfacial targets for therapeutic drug design. This approach is validated for important, well-studied disease-related proteins with known pharmaceutical targets, and also identifies targets that have yet to be studied. Moreover, therapeutic agents are proposed by chemically connecting the fragments that are strongly bound to the hot spots.
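Step (ii) of this abstract, spatial clustering of hot spots into candidate binding sites, can be illustrated with a minimal sketch. The abstract does not say which clustering method the authors used, so the single-linkage rule, the 2.0 Å cutoff, the function name `cluster_hot_spots`, and the probe coordinates below are all illustrative assumptions.

```python
from math import dist


def cluster_hot_spots(points, cutoff):
    """Group probe positions by single linkage (an assumed rule, not the paper's):
    two probes share a cluster if a chain of neighbours, each closer than
    `cutoff`, connects them."""
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            # Collect every unassigned probe within the cutoff of the current one.
            near = [j for j in unvisited if dist(points[current], points[j]) <= cutoff]
            for j in near:
                unvisited.remove(j)
            cluster.extend(near)
            frontier.extend(near)
        clusters.append(sorted(cluster))
    return sorted(clusters)


# Hypothetical probe coordinates in angstroms (made-up values).
probes = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0),
          (20.0, 0.0, 0.0), (21.0, 1.0, 0.0)]
sites = cluster_hot_spots(probes, cutoff=2.0)
```

Here the first three probes form one candidate site through chained neighbours, and the last two form a second. In the workflow the abstract describes, each such site would then be ranked by the coevolution-based interface score.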
Affiliation(s)
- Fang Bai
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Dallas, TX 75080
- Department of Bioengineering, University of Texas at Dallas, Dallas, TX 75080
- Center for Systems Biology, University of Texas at Dallas, Dallas, TX 75080
- Ryan R Cheng
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Department of Physics and Astronomy, Rice University, Houston, TX 77005
- Department of Chemistry, Rice University, Houston, TX 77005
- Department of Biosciences, Rice University, Houston, TX 77005
|
103
|
Levy RM, Haldane A, Flynn WF. Potts Hamiltonian models of protein co-variation, free energy landscapes, and evolutionary fitness. Curr Opin Struct Biol 2016; 43:55-62. [PMID: 27870991 DOI: 10.1016/j.sbi.2016.11.004] [Citation(s) in RCA: 56] [Impact Index Per Article: 7.0] [Received: 08/19/2016] [Accepted: 11/03/2016] [Indexed: 11/17/2022]
Abstract
Potts Hamiltonian models of protein sequence co-variation are statistical models constructed from the pair correlations observed in a multiple sequence alignment (MSA) of a protein family. These models are powerful because they capture higher order correlations induced by mutations evolving under constraints and help quantify the connections between protein sequence, structure, and function maintained through evolution. We review recent work with Potts models to predict protein structure and sequence-dependent conformational free energy landscapes, to survey protein fitness landscapes and to explore the effects of epistasis on fitness. We also comment on the numerical methods used to infer these models for each application.
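The "pair correlations observed in a multiple sequence alignment" that such models start from can be made concrete with a small sketch. It computes connected correlations C_ij(a,b) = f_ij(a,b) - f_i(a) f_j(b) from a toy alignment. The function name `pair_correlations` and the four-sequence alignment are illustrative assumptions; sequence reweighting and pseudocounts, which practical pipelines typically apply, are deliberately left out.

```python
from collections import Counter
from itertools import combinations


def pair_correlations(msa):
    """Return connected correlations C_ij(a, b) = f_ij(a, b) - f_i(a) * f_j(b)
    for every residue pair (a, b) actually observed at columns i < j."""
    n_seq, length = len(msa), len(msa[0])
    # Single-site residue counts for each alignment column.
    single = [Counter(seq[i] for seq in msa) for i in range(length)]
    corr = {}
    for i, j in combinations(range(length), 2):
        pair = Counter((seq[i], seq[j]) for seq in msa)
        for (a, b), count in pair.items():
            f_ij = count / n_seq
            f_i = single[i][a] / n_seq
            f_j = single[j][b] / n_seq
            corr[(i, j, a, b)] = f_ij - f_i * f_j
    return corr


# Toy alignment (made up): columns 0 and 1 co-vary, column 2 is conserved.
toy_msa = ["AKD", "AKD", "GED", "GED"]
corr = pair_correlations(toy_msa)
```

In this toy case the co-varying columns 0 and 1 give a nonzero correlation for the pair (A, K), while any pair involving the conserved column 2 gives zero. Inferring Potts couplings from such statistics is the harder step that the review discusses.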
Affiliation(s)
- Ronald M Levy
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States.
- Allan Haldane
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States
- William F Flynn
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, PA 19122, United States; Department of Physics and Astronomy, Rutgers, The State University of New Jersey, Piscataway, NJ 08854, United States
|
104
|
Yu J, Andreani J, Ochsenbein F, Guerois R. Lessons from (co-)evolution in the docking of proteins and peptides for CAPRI Rounds 28-35. Proteins 2016; 85:378-390. [PMID: 27701780 DOI: 10.1002/prot.25180] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 07/14/2016] [Revised: 08/25/2016] [Accepted: 08/25/2016] [Indexed: 11/06/2022]
Abstract
Computational protein-protein docking is of great importance for understanding protein interactions at the structural level. Critical assessment of prediction of interactions (CAPRI) experiments provide the protein docking community with a unique opportunity to blindly test methods based on real-life cases and help accelerate methodology development. For CAPRI Rounds 28-35, we used an automatic docking pipeline integrating the coarse-grained co-evolution-based potential InterEvScore. This score was developed to exploit the information contained in the multiple sequence alignments of binding partners and selectively recognize co-evolved interfaces. Together with Zdock/Frodock for rigid-body docking, SOAP-PP for atomic potential and Rosetta applications for structural refinement, this pipeline reached high performance on a majority of targets. For protein-peptide docking and interfacial water position predictions, we also explored different means of taking evolutionary information into account. Overall, our group ranked 1st by correctly predicting 10 targets, composed of 1 High, 7 Medium and 2 Acceptable predictions. Excellent and Outstanding levels of accuracy were reached for each of the two water prediction targets, respectively. Altogether, in 15 out of 18 targets in total, evolutionary information, either through co-evolution or conservation analyses, could provide key constraints to guide modeling towards the most likely assemblies. These results open promising perspectives regarding the way evolutionary information can be valuable to improve docking prediction accuracy. Proteins 2017; 85:378-390. © 2016 Wiley Periodicals, Inc.
Affiliation(s)
- Jinchao Yu
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette cedex, F-91198, France
- Jessica Andreani
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette cedex, F-91198, France
- Françoise Ochsenbein
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette cedex, F-91198, France
- Raphaël Guerois
- Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Univ Paris-Sud, Université Paris-Saclay, Gif-sur-Yvette cedex, F-91198, France
|
105
|
Abstract
Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multiprotein complexes and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners, causing their sequences to be correlated. Here we exploit these correlations to accurately identify, from sequence data alone, which proteins are specific interaction partners. Our general approach, which employs a pairwise maximum entropy model to infer couplings between residues, has been successfully used to predict the 3D structures of proteins from sequences. Thus inspired, we introduce an iterative algorithm to predict specific interaction partners from two protein families whose members are known to interact. We first assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. We obtain a striking 0.93 true positive fraction on our complete dataset without any a priori knowledge of interaction partners, and we uncover the origin of this success. We then apply the algorithm to proteins from ATP-binding cassette (ABC) transporter complexes, and obtain accurate predictions in these systems as well. Finally, we present two metrics that accurately distinguish interacting protein families from noninteracting ones, using only sequence data.
|
106
|
Cheng RR, Nordesjö O, Hayes RL, Levine H, Flores SC, Onuchic JN, Morcos F. Connecting the Sequence-Space of Bacterial Signaling Proteins to Phenotypes Using Coevolutionary Landscapes. Mol Biol Evol 2016; 33:3054-3064. [PMID: 27604223 PMCID: PMC5100047 DOI: 10.1093/molbev/msw188] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Indexed: 01/08/2023]
Abstract
Two-component signaling (TCS) is the primary means by which bacteria sense and respond to the environment. TCS involves two partner proteins working in tandem, which interact to perform cellular functions while limiting interactions with non-partners (i.e., cross-talk). We construct a Potts model for TCS that can quantitatively predict how mutating amino acid identities affect the interaction between TCS partners and non-partners. The parameters of this model are inferred directly from protein sequence data. This approach drastically reduces the computational complexity of exploring the sequence-space of TCS proteins. As a stringent test, we compare its predictions to a recent comprehensive mutational study, which characterized the functionality of 20^4 mutational variants of the PhoQ kinase in Escherichia coli. We find that our best predictions accurately reproduce the amino acid combinations found in experiment, which enable functional signaling with its partner PhoP. These predictions demonstrate the evolutionary pressure to preserve the interaction between TCS partners as well as prevent unwanted cross-talk. Further, we calculate the mutational change in the binding affinity between PhoQ and PhoP, providing an estimate of the amount of destabilization needed to disrupt TCS.
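How a Potts model scores a mutation can be sketched as follows, assuming the common convention H(s) = -Σ_i h_i(s_i) - Σ_{i<j} J_ij(s_i, s_j), under which a positive change in H means a less favourable sequence. The field and coupling values, the two-residue sequence, and the function names are hypothetical placeholders, not parameters from the paper.

```python
def potts_energy(seq, fields, couplings):
    """Statistical energy H(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).
    Parameters missing from the dictionaries are treated as zero."""
    energy = 0.0
    for i, a in enumerate(seq):
        energy -= fields.get((i, a), 0.0)
    for i in range(len(seq)):
        for j in range(i + 1, len(seq)):
            energy -= couplings.get((i, j, seq[i], seq[j]), 0.0)
    return energy


def mutation_delta(seq, pos, new_residue, fields, couplings):
    """Energy change H(mutant) - H(wild type) for a single substitution."""
    mutant = seq[:pos] + new_residue + seq[pos + 1:]
    return potts_energy(mutant, fields, couplings) - potts_energy(seq, fields, couplings)


# Hypothetical parameters for a two-position toy sequence.
fields = {(0, "A"): 0.5, (0, "G"): 0.2}
couplings = {(0, 1, "A", "K"): 1.0, (0, 1, "G", "E"): 1.0}
delta = mutation_delta("AK", 0, "G", fields, couplings)
```

With these toy numbers the substitution A to G at position 0 loses both a favourable field and the A-K coupling, so `delta` is positive. Scoring all variants at a handful of positions in this way is far cheaper than testing each one experimentally, which is the reduction in complexity the abstract refers to.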
Affiliation(s)
- R R Cheng
- Center for Theoretical Biological Physics, Rice University, Houston, TX
- O Nordesjö
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- R L Hayes
- Department of Biophysics, University of Michigan, Ann Arbor, MI
- H Levine
- Center for Theoretical Biological Physics, Rice University, Houston, TX; Department of Bioengineering, Rice University, Houston, TX
- S C Flores
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- J N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX; Department of Physics and Astronomy, Rice University, Houston, TX; Department of Chemistry, and Biosciences, Rice University, Houston, TX
- F Morcos
- Department of Biological Sciences and Center for Systems Biology, University of Texas at Dallas, Dallas, TX
|
107
|
Monastyrskyy B, D'Andrea D, Fidelis K, Tramontano A, Kryshtafovych A. New encouraging developments in contact prediction: Assessment of the CASP11 results. Proteins 2016; 84 Suppl 1:131-44. [PMID: 26474083 PMCID: PMC4834069 DOI: 10.1002/prot.24943] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Received: 05/18/2015] [Revised: 09/15/2015] [Accepted: 10/11/2015] [Indexed: 12/27/2022]
Abstract
This article provides a report on the state-of-the-art in the prediction of intra-molecular residue-residue contacts in proteins based on the assessment of the predictions submitted to the CASP11 experiment. The assessment emphasis is placed on the accuracy in predicting long-range contacts. Twenty-nine groups participated in contact prediction in CASP11. At least eight of them used the recently developed evolutionary coupling techniques, with the top group (CONSIP2) reaching precision of 27% on target proteins that could not be modeled by homology. This result indicates a breakthrough in the development of methods based on the correlated mutation approach. Successful prediction of contacts was shown to be practically helpful in modeling three-dimensional structures; in particular target T0806 was modeled exceedingly well with accuracy not yet seen for ab initio targets of this size (>250 residues). Proteins 2016; 84(Suppl 1):131-144. © 2015 Wiley Periodicals, Inc.
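The precision figure quoted here can be read as the fraction of top-ranked long-range predictions that are true contacts. The sketch below assumes the CASP-style convention that "long-range" means a sequence separation of at least 24 residues. The function name `long_range_precision`, the input layout, and the toy predictions are illustrative assumptions rather than the assessors' actual evaluation code.

```python
def long_range_precision(predicted, true_contacts, top_n, min_separation=24):
    """Precision of the `top_n` highest-scoring long-range predictions.

    predicted     -- iterable of (score, i, j) tuples, residue indices i and j
    true_contacts -- set of (i, j) pairs with i < j observed in the structure
    """
    ranked = sorted(predicted, reverse=True)  # highest score first
    long_range = [(i, j) for _, i, j in ranked if abs(i - j) >= min_separation]
    selected = long_range[:top_n]
    if not selected:
        return 0.0
    hits = sum(1 for i, j in selected if (min(i, j), max(i, j)) in true_contacts)
    return hits / len(selected)


# Toy data (made up): the second prediction is short-range and is skipped.
predictions = [(0.9, 1, 40), (0.8, 5, 10), (0.7, 3, 60), (0.6, 2, 30), (0.5, 10, 50)]
native = {(1, 40), (2, 30)}
precision = long_range_precision(predictions, native, top_n=4)
```

Two of the four long-range predictions in this toy list match native contacts, giving a precision of 0.5.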
Affiliation(s)
- Daniel D'Andrea
- Department of Physics, Sapienza-University of Rome, Rome, 00185, Italy
- Anna Tramontano
- Department of Physics, Sapienza-University of Rome, Rome, 00185, Italy
- Istituto Pasteur-Fondazione Cenci Bolognetti-University of Rome, Rome, 00185, Italy
|
108
|
Abstract
Structural domains are believed to be modules within proteins that can fold and function independently. Some proteins show tandem repetitions of apparent modular structure that do not fold independently, but rather co-operate in stabilizing structural forms that comprise several repeat-units. For many natural repeat-proteins, it has been shown that weak energetic links between repeats lead to the breakdown of co-operativity and the appearance of folding sub-domains within an apparently regular repeat array. The quasi-1D architecture of repeat-proteins is crucial in detailing how the local energetic balances can modulate the folding dynamics of these proteins, which can be related to the physiological behaviour of these ubiquitous biological systems.
|
109
|
Zschiedrich CP, Keidel V, Szurmant H. Molecular Mechanisms of Two-Component Signal Transduction. J Mol Biol 2016; 428:3752-75. [PMID: 27519796 DOI: 10.1016/j.jmb.2016.08.003] [Citation(s) in RCA: 356] [Impact Index Per Article: 44.5] [Received: 05/14/2016] [Revised: 07/30/2016] [Accepted: 08/01/2016] [Indexed: 02/03/2023]
Abstract
Two-component systems (TCS) comprising sensor histidine kinases and response regulator proteins are among the most important players in bacterial and archaeal signal transduction and also occur in reduced numbers in some eukaryotic organisms. Given their importance to cellular survival, virulence, and cellular development, these systems are among the most scrutinized bacterial proteins. In recent years, a flurry of bioinformatics, genetic, biochemical, and structural studies has provided detailed insights into many molecular mechanisms that underlie the detection of signals and the generation of the appropriate response by TCS. Importantly, it has become clear that there is significant diversity in the mechanisms employed by individual systems. This review discusses the current knowledge on common themes and divergences from the paradigm of TCS signaling. An emphasis is on the information gained by recent structural and bioinformatics studies.
Affiliation(s)
- Christopher P Zschiedrich
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, 309 E Second Street, Pomona, CA 91766, USA; Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 N Torrey Pines Road, La Jolla, CA 92037, USA
- Victoria Keidel
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, 309 E Second Street, Pomona, CA 91766, USA; Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 N Torrey Pines Road, La Jolla, CA 92037, USA
- Hendrik Szurmant
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, 309 E Second Street, Pomona, CA 91766, USA; Department of Molecular and Experimental Medicine, The Scripps Research Institute, 10550 N Torrey Pines Road, La Jolla, CA 92037, USA.
|
110
|
Lloyd Evans D, Joshi SV. Elucidating modes of activation and herbicide resistance by sequence assembly and molecular modelling of the Acetolactate synthase complex in sugarcane. J Theor Biol 2016; 407:184-197. [PMID: 27452529 DOI: 10.1016/j.jtbi.2016.07.025] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 02/10/2016] [Revised: 06/14/2016] [Accepted: 07/20/2016] [Indexed: 10/21/2022]
Abstract
Acetolactate synthase (ALS) catalyzes the first portion of the biosynthetic pathway leading to the generation of branched-chain amino acids. As such it is essential for plant health and is a major target for herbicides. ALS is a very poorly characterized molecule in sugarcane. The enzyme is activated and inhibited by a regulatory subunit (known as VAT1 in plants) whose mode of action is entirely unknown. Using Saccharum halepense as a template we have assembled the ALS gene of sugarcane (Saccharum hybrid) and have modelled the structure of ALS based on an Arabidopsis template (the first ALS model for a monocot). We have also assembled the ALS regulatory proteins (VAT1 and VAT2) from sugarcane and show that VAT2 is specific to true grasses. Employing a bacterial model, we have generated a structural model for VAT1, which explains why the separate domains of the proteins bind to either leucine or valine but not both. Using co-evolution studies we have determined molecular contacts by which we modelled the docking of VAT1 to ALS. In conclusion, we demonstrate how the binding of VAT1 to ALS activates ALS and show how VAT1 can also confer feedback inhibition to ALS. We validate our ALS model against biochemical data and employ this model to explain the function of a novel herbicide binding mutant in sugarcane.
Affiliation(s)
- Dyfed Lloyd Evans
- South African Sugarcane Research Institute, 170 Flanders Drive, Private Bag X02, Mount Edgecombe, Durban 4300, South Africa; School of Life Sciences, College of Agriculture, Engineering and Science, University of Kwa-Zulu Natal, Private Bag X54001, Durban 4000, South Africa.
- Shailesh Vinay Joshi
- South African Sugarcane Research Institute, 170 Flanders Drive, Private Bag X02, Mount Edgecombe, Durban 4300, South Africa; School of Life Sciences, College of Agriculture, Engineering and Science, University of Kwa-Zulu Natal, Private Bag X54001, Durban 4000, South Africa
|
111
|
Haldane A, Flynn WF, He P, Vijayan RSK, Levy RM. Structural propensities of kinase family proteins from a Potts model of residue co-variation. Protein Sci 2016; 25:1378-84. [PMID: 27241634 DOI: 10.1002/pro.2954] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 04/07/2016] [Revised: 05/25/2016] [Accepted: 05/26/2016] [Indexed: 12/23/2022]
Abstract
Understanding the conformational propensities of proteins is key to solving many problems in structural biology and biophysics. The co-variation of pairs of mutations contained in multiple sequence alignments of protein families can be used to build a Potts Hamiltonian model of the sequence patterns which accurately predicts structural contacts. This observation paves the way to develop deeper connections between evolutionary fitness landscapes of entire protein families and the corresponding free energy landscapes which determine the conformational propensities of individual proteins. Using statistical energies determined from the Potts model and an alignment of 2896 PDB structures, we predict the propensity for particular kinase family proteins to assume a "DFG-out" conformation implicated in the susceptibility of some kinases to type-II inhibitors, and validate the predictions by comparison with the observed structural propensities of the corresponding proteins and experimental binding affinity data. We decompose the statistical energies to investigate which interactions contribute the most to the conformational preference for particular sequences and the corresponding proteins. We find that interactions involving the activation loop and the C-helix and HRD motif are primarily responsible for stabilizing the DFG-in state. This work illustrates how structural free energy landscapes and fitness landscapes of proteins can be used in an integrated way, and in the context of kinase family proteins, can potentially impact therapeutic design strategies.
Affiliation(s)
- Allan Haldane
- Department of Chemistry, Center for Biophysics and Computational Biology, Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania, 19122
- William F Flynn
- Department of Chemistry, Center for Biophysics and Computational Biology, Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania, 19122; Department of Physics and Astronomy, Rutgers, the State University of New Jersey, Piscataway, New Jersey, 08854
- Peng He
- Department of Chemistry, Center for Biophysics and Computational Biology, Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania, 19122
- R S K Vijayan
- Department of Chemistry, Center for Biophysics and Computational Biology, Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania, 19122
- Ronald M Levy
- Department of Chemistry, Center for Biophysics and Computational Biology, Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania, 19122
|
112
|
Barton JP, De Leonardis E, Coucke A, Cocco S. ACE: adaptive cluster expansion for maximum entropy graphical model inference. Bioinformatics 2016; 32:3089-3097. [PMID: 27329863 DOI: 10.1093/bioinformatics/btw328] [Citation(s) in RCA: 63] [Impact Index Per Article: 7.9] [Received: 03/25/2016] [Accepted: 05/18/2016] [Indexed: 11/13/2022]
Abstract
MOTIVATION: Graphical models are often employed to interpret patterns of correlations observed in data through a network of interactions between the variables. Recently, Ising/Potts models, also known as Markov random fields, have been productively applied to diverse problems in biology, including the prediction of structural contacts from protein sequence data and the description of neural activity patterns. However, inference of such models is a challenging computational problem that cannot be solved exactly. Here, we describe the adaptive cluster expansion (ACE) method to quickly and accurately infer Ising or Potts models based on correlation data. ACE avoids overfitting by constructing a sparse network of interactions sufficient to reproduce the observed correlation data within the statistical error expected due to finite sampling. When convergence of the ACE algorithm is slow, we combine it with a Boltzmann Machine Learning algorithm (BML). We illustrate this method on a variety of biological and artificial datasets and compare it to state-of-the-art approximate methods such as Gaussian and pseudo-likelihood inference.
RESULTS: We show that ACE accurately reproduces the true parameters of the underlying model when they are known, and yields accurate statistical descriptions of both biological and artificial data. Models inferred by ACE more accurately describe the statistics of the data, including both the constrained low-order correlations and unconstrained higher-order correlations, compared to those obtained by faster Gaussian and pseudo-likelihood methods. These alternative approaches can recover the structure of the interaction network but typically not the correct strength of interactions, resulting in less accurate generative models.
AVAILABILITY AND IMPLEMENTATION: The ACE source code, user manual and tutorials with the example data and filtered correlations described herein are freely available on GitHub at https://github.com/johnbarton/ACE
CONTACTS: jpbarton@mit.edu, cocco@lps.ens.fr
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
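One way to see why exact inference is out of reach for large systems: checking whether candidate Ising parameters reproduce the observed correlations requires model averages, and computing those exactly means summing over all 2^N spin configurations. The brute-force sketch below does that sum for a tiny system. It illustrates the cost only and is not the ACE algorithm; the function name `exact_ising_statistics` and the parameter values are illustrative assumptions.

```python
from itertools import product
from math import exp, tanh


def exact_ising_statistics(fields, couplings):
    """Exact means <s_i> and pair averages <s_a s_b> for spins in {-1, +1}
    with weight exp(sum_i h_i s_i + sum_{a<b} J_ab s_a s_b).
    The loop visits 2**N configurations, so it is only feasible for small N."""
    n = len(fields)
    z = 0.0
    mean = [0.0] * n
    pair = {key: 0.0 for key in couplings}
    for spins in product((-1, 1), repeat=n):
        log_w = sum(h * s for h, s in zip(fields, spins))
        log_w += sum(j * spins[a] * spins[b] for (a, b), j in couplings.items())
        w = exp(log_w)
        z += w  # accumulate the partition function
        for i, s in enumerate(spins):
            mean[i] += w * s
        for (a, b) in couplings:
            pair[(a, b)] += w * spins[a] * spins[b]
    return [m / z for m in mean], {k: v / z for k, v in pair.items()}


# Two spins, no fields, one coupling: the pair average equals tanh(J).
means, pairs = exact_ising_statistics([0.0, 0.0], {(0, 1): 0.5})
expected_pair = tanh(0.5)
```

For two spins the result can be checked analytically against tanh(J). At N = 100 the same loop would need about 10^30 terms, which is why approximate schemes such as those named in the abstract are used instead.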
Affiliation(s)
- J P Barton
- Departments of Chemical Engineering and Physics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Ragon Institute of Massachusetts General Hospital, Massachusetts Institute of Technology and Harvard, Cambridge, MA 02139, USA
- E De Leonardis
- Laboratoire de Physique Statistique de L'Ecole Normale Supérieure, CNRS, Ecole Normale Supérieure & Université P.&M. Curie, Paris, France; Computational and Quantitative Biology, UPMC, UMR 7238, Sorbonne Université, Paris, France
- A Coucke
- Computational and Quantitative Biology, UPMC, UMR 7238, Sorbonne Université, Paris, France; Laboratoire de Physique Théorique de L'Ecole Normale Supérieure, CNRS, Ecole Normale Supérieure & Université P.&M. Curie, Paris, France
- S Cocco
- Laboratoire de Physique Statistique de L'Ecole Normale Supérieure, CNRS, Ecole Normale Supérieure & Université P.&M. Curie, Paris, France
|
113
|
van Nimwegen E. Inferring Contacting Residues within and between Proteins: What Do the Probabilities Mean? PLoS Comput Biol 2016; 12:e1004726. [PMID: 27171220 PMCID: PMC4865087 DOI: 10.1371/journal.pcbi.1004726] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 12/01/2022]
|
114
|
Castellana M, Bialek W, Cavagna A, Giardina I. Entropic effects in a nonequilibrium system: Flocks of birds. Phys Rev E 2016; 93:052416. [PMID: 27300933 DOI: 10.1103/physreve.93.052416] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 01/09/2015] [Indexed: 06/06/2023]
Abstract
When European starlings come together to form a flock, the distribution of their individual velocities narrows around the mean velocity of the flock. We argue that, in a broad class of models for the joint distribution of positions and velocities, this narrowing generates an entropic effect that opposes the cohesion of the flock. The strength of this effect depends strongly on the nature of the interactions among birds: If birds are coupled to a fixed number of neighbors, the entropic forces are weak, while if they couple to all other birds within a fixed distance, the entropic effects are sufficient to tear a flock apart.
Affiliation(s)
- Michele Castellana
- Joseph Henry Laboratories of Physics and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Laboratoire Physico Chimie Curie, Institut Curie, PSL Research University, CNRS UMR 168, 75005 Paris, France
- Sorbonne Universités, UPMC Paris 06, 75005 Paris, France
- William Bialek
- Joseph Henry Laboratories of Physics and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- Initiative for the Theoretical Sciences, The Graduate Center, City University of New York, 365 Fifth Avenue, New York, New York 10016, USA
- Andrea Cavagna
- Initiative for the Theoretical Sciences, The Graduate Center, City University of New York, 365 Fifth Avenue, New York, New York 10016, USA
- Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche, Rome, Italy and Dipartimento di Fisica, Università Sapienza, Rome, Italy
- Irene Giardina
- Initiative for the Theoretical Sciences, The Graduate Center, City University of New York, 365 Fifth Avenue, New York, New York 10016, USA
- Istituto dei Sistemi Complessi, Consiglio Nazionale delle Ricerche, Rome, Italy and Dipartimento di Fisica, Università Sapienza, Rome, Italy
|
115
|
Wagner JR, Lee CT, Durrant JD, Malmstrom RD, Feher VA, Amaro RE. Emerging Computational Methods for the Rational Discovery of Allosteric Drugs. Chem Rev 2016; 116:6370-90. [PMID: 27074285 PMCID: PMC4901368 DOI: 10.1021/acs.chemrev.5b00631] [Citation(s) in RCA: 158] [Impact Index Per Article: 19.8] [Indexed: 12/17/2022]
Abstract
Allosteric drug development holds promise for delivering medicines that are more selective and less toxic than those that target orthosteric sites. To date, the discovery of allosteric binding sites and lead compounds has been mostly serendipitous, achieved through high-throughput screening. Over the past decade, structural data has become more readily available for larger protein systems and more membrane protein classes (e.g., GPCRs and ion channels), which are common allosteric drug targets. In parallel, improved simulation methods now provide better atomistic understanding of the protein dynamics and cooperative motions that are critical to allosteric mechanisms. As a result of these advances, the field of predictive allosteric drug development is now on the cusp of a new era of rational structure-based computational methods. Here, we review algorithms that predict allosteric sites based on sequence data and molecular dynamics simulations, describe tools that assess the druggability of these pockets, and discuss how Markov state models and topology analyses provide insight into the relationship between protein dynamics and allosteric drug binding. In each section, we first provide an overview of the various method classes before describing relevant algorithms and software packages.
Affiliation(s)
- Jeffrey R Wagner
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Christopher T Lee
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Jacob D Durrant
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Robert D Malmstrom
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Victoria A Feher
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
| | - Rommie E Amaro
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
|
116
|
Intramolecular allosteric communication in dopamine D2 receptor revealed by evolutionary amino acid covariation. Proc Natl Acad Sci U S A 2016; 113:3539-44. [PMID: 26979958 DOI: 10.1073/pnas.1516579113] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Indexed: 12/20/2022] Open
Abstract
The structural basis of allosteric signaling in G protein-coupled receptors (GPCRs) is important in guiding design of therapeutics and understanding phenotypic consequences of genetic variation. The Evolutionary Trace (ET) algorithm previously proved effective in redesigning receptors to mimic the ligand specificities of functionally distinct homologs. We now expand ET to consider mutual information, with validation in GPCR structure and dopamine D2 receptor (D2R) function. The new algorithm, called ET-MIp, identifies evolutionarily relevant patterns of amino acid covariations. The improved predictions of structural proximity and D2R mutagenesis demonstrate that ET-MIp predicts functional interactions between residue pairs, particularly potency and efficacy of activation by dopamine. Remarkably, although most of the residue pairs chosen for mutagenesis are neither in the binding pocket nor in contact with each other, many exhibited functional interactions, implying at-a-distance coupling. The functional interaction between the coupled pairs correlated best with the evolutionary coupling potential derived from dopamine receptor sequences rather than with broader sets of GPCR sequences. These data suggest that the allosteric communication responsible for dopamine responses is resolved by ET-MIp and best discerned within a short evolutionary distance. Most double mutants restored dopamine response to wild-type levels, also suggesting that tight regulation of the response to dopamine drove the coevolution and intramolecular communications between coupled residues. Our approach provides a general tool to identify evolutionary covariation patterns in small sets of close sequence homologs and to translate them into functional linkages between residues.
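As a rough illustration of the covariation signal this abstract refers to, the sketch below computes plain mutual information between two columns of a multiple sequence alignment. It is not the ET-MIp algorithm itself: the function name `column_mutual_information` and the toy alignment are assumptions made for illustration, and no evolutionary-trace weighting or background correction is applied.

```python
from collections import Counter
from math import log2


def column_mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns.

    col_i and col_j are equal-length sequences of residues, one entry
    per aligned sequence. Frequencies are raw counts with no
    pseudocounts or sequence reweighting (a simplifying assumption).
    """
    if len(col_i) != len(col_j) or len(col_i) == 0:
        raise ValueError("columns must be non-empty and of equal length")
    n = len(col_i)
    counts_i = Counter(col_i)
    counts_j = Counter(col_j)
    counts_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), n_ab in counts_ij.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with counts to limit rounding error
        mi += p_ab * log2(n_ab * n / (counts_i[a] * counts_j[b]))
    return mi


# Hypothetical toy alignment: columns 0 and 1 covary perfectly,
# column 3 is fully conserved and therefore carries no covariation.
alignment = ["ADKL", "ADKL", "GERL", "GERL"]
columns = list(zip(*alignment))
mi_covarying = column_mutual_information(columns[0], columns[1])
mi_conserved = column_mutual_information(columns[0], columns[3])
```

A fully conserved column always scores zero here, which is one reason covariation methods are usually paired with separate conservation measures.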
|
117
|
Abstract
Allosteric transition, defined as conformational changes induced by ligand binding, is one of the fundamental properties of proteins. Allostery has been observed and characterized in many proteins, and has been recently utilized to control protein function via regulation of protein activity. Here, we review the physical and evolutionary origin of protein allostery, as well as its importance to protein regulation, drug discovery, and biological processes in living systems. We describe recently developed approaches to identify allosteric pathways, connected sets of pairwise interactions that are responsible for propagation of conformational change from the ligand-binding site to a distal functional site. We then present experimental and computational protein engineering approaches for control of protein function by modulation of allosteric sites. As an example of application of these approaches, we describe a synergistic computational and experimental approach to rescue the cystic-fibrosis-associated protein cystic fibrosis transmembrane conductance regulator, which upon deletion of a single residue misfolds and causes disease. This example demonstrates the power of allosteric manipulation in proteins to both elucidate mechanisms of molecular function and to develop therapeutic strategies that rescue those functions. Allosteric control of proteins provides a tool to shine a light on the complex cascades of cellular processes and facilitate unprecedented interrogation of biological systems.
Affiliation(s)
- Nikolay V Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina 27599, United States
|
118
|
Noel JK, Morcos F, Onuchic JN. Sequence co-evolutionary information is a natural partner to minimally-frustrated models of biomolecular dynamics. F1000Res 2016; 5. [PMID: 26918164 PMCID: PMC4755392 DOI: 10.12688/f1000research.7186.1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Accepted: 01/21/2016] [Indexed: 11/25/2022] Open
Abstract
Experimentally derived structural constraints have been crucial to the implementation of computational models of biomolecular dynamics. For example, not only does crystallography provide essential starting points for molecular simulations but also high-resolution structures permit for parameterization of simplified models. Since the energy landscapes for proteins and other biomolecules have been shown to be minimally frustrated and therefore funneled, these structure-based models have played a major role in understanding the mechanisms governing folding and many functions of these systems. Structural information, however, may be limited in many interesting cases. Recently, the statistical analysis of residue co-evolution in families of protein sequences has provided a complementary method of discovering residue-residue contact interactions involved in functional configurations. These functional configurations are often transient and difficult to capture experimentally. Thus, co-evolutionary information can be merged with that available for experimentally characterized low free-energy structures, in order to more fully capture the true underlying biomolecular energy landscape.
Affiliation(s)
- Jeffrey K Noel
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA; Kristallographie, Max-Delbrück-Centrum für Molekulare Medizin, Berlin, Germany
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, USA
| | - Jose N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
|
119
|
Sahoo A, Khare S, Devanarayanan S, Jain PC, Varadarajan R. Residue proximity information and protein model discrimination using saturation-suppressor mutagenesis. eLife 2015; 4. [PMID: 26716404 PMCID: PMC4758949 DOI: 10.7554/elife.09532] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 06/18/2015] [Accepted: 12/29/2015] [Indexed: 12/16/2022] Open
Abstract
Identification of residue-residue contacts from primary sequence can be used to guide protein structure prediction. Using Escherichia coli CcdB as the test case, we describe an experimental method termed saturation-suppressor mutagenesis to acquire residue contact information. In this methodology, for each of five inactive CcdB mutants, exhaustive screens for suppressors were performed. Proximal suppressors were accurately discriminated from distal suppressors based on their phenotypes when present as single mutants. Experimentally identified putative proximal pairs formed spatial constraints to recover >98% of native-like models of CcdB from a decoy dataset. Suppressor methodology was also applied to the integral membrane protein, diacylglycerol kinase A where the structures determined by X-ray crystallography and NMR were significantly different. Suppressor as well as sequence co-variation data clearly point to the X-ray structure being the functional one adopted in vivo. The methodology is applicable to any macromolecular system for which a convenient phenotypic assay exists. DOI: http://dx.doi.org/10.7554/eLife.09532.001
Common techniques to determine the three-dimensional structures of proteins can help researchers to understand these molecules’ activities, but are often time-consuming and do not work for all proteins. Proteins are made of chains of amino acids. When a protein chain folds, some of these amino acids interact with other amino acids and these contacts dictate the overall shape of the protein. This means that identifying the pairs of contacting amino acids could make it possible to predict the protein’s structure. Interactions between pairs of contacting amino acids tend to remain conserved throughout evolution, and if a mutation alters one of the amino acids in a pair then a 'compensatory' change often occurs to alter the second amino acid as well.
Compensatory mutations can suggest that two amino acids are close to each other in the three-dimensional shape of a protein, but the computational methods used to identify such amino acid pairs can sometimes be inaccurate. In 2012, researchers generated mutants of a bacterial protein called CcdB with changes to single amino acids that caused the protein to fail to fold correctly. Now, Sahoo et al. – who include two of the researchers involved in the 2012 work – have developed an experimental method to identify contacting amino acids and use the CcdB protein as a test case. The approach involved searching for additional mutations that could restore the activity of five of the original mutant proteins when the proteins were produced in yeast cells. The rationale was that any secondary mutations that restored the activity must have corrected the folding defect caused by the original mutation. Sahoo et al. then predicted how close the amino acids affected by the secondary mutations were to the amino acids altered by the original mutations. This information was used to select reliable three-dimensional models of CcdB from a large set of possible structures that had been generated previously using computer models. Next, the technique was applied to a protein called diacylglycerol kinase A. The structure of this protein had previously been inferred using techniques such as X-ray crystallography and nuclear magnetic resonance, but there was a mismatch between the two methods. Sahoo et al. found that the amino acid contacts derived from their experimental method matched those found in the crystal structure, suggesting that the functional protein structure in living cells is similar to the crystal structure. In the future, the experimental approach developed in this work could be combined with existing methods to reliably guide protein structure prediction. DOI: http://dx.doi.org/10.7554/eLife.09532.002
Affiliation(s)
- Anusmita Sahoo
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | - Shruti Khare
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | | | - Pankaj C Jain
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India
| | - Raghavan Varadarajan
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore, India; Jawaharlal Nehru Center for Advanced Scientific Research, Bangalore, India
|
120
|
Sethi A, Clarke D, Chen J, Kumar S, Galeev TR, Regan L, Gerstein M. Reads meet rotamers: structural biology in the age of deep sequencing. Curr Opin Struct Biol 2015; 35:125-34. [PMID: 26658741 DOI: 10.1016/j.sbi.2015.11.003] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/21/2015] [Revised: 11/04/2015] [Accepted: 11/05/2015] [Indexed: 01/07/2023]
Abstract
Structure has traditionally been interrelated with sequence, usually in the framework of comparing sequences across species sharing a common fold. However, the nature of information within the sequence and structure databases is evolving, changing the type of comparisons possible. In particular, we now have a vast amount of personal genome sequences from human populations and a greater fraction of new structures contain interacting proteins within large complexes. Consequently, we have to recast our conception of sequence conservation and its relation to structure, for example, focusing more on selection within the human population. Moreover, within structural biology there is less emphasis on the discovery of novel folds and more on relating structures to networks of protein interactions. We cover this changing mindset here.
Affiliation(s)
- Anurag Sethi
- Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, United States; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
| | - Declan Clarke
- Department of Chemistry, Yale University, New Haven, CT, United States
| | - Jieming Chen
- Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, United States
| | - Sushant Kumar
- Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, United States; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
| | - Timur R Galeev
- Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, United States; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States
| | - Lynne Regan
- Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, United States; Department of Chemistry, Yale University, New Haven, CT, United States
| | - Mark Gerstein
- Molecular Biophysics and Biochemistry, Yale University, New Haven, CT, United States; Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT, United States.
|
121
|
Corrales M, Cuscó P, Usmanova DR, Chen HC, Bogatyreva NS, Filion GJ, Ivankov DN. Machine Learning: How Much Does It Tell about Protein Folding Rates? PLoS One 2015; 10:e0143166. [PMID: 26606303 PMCID: PMC4659572 DOI: 10.1371/journal.pone.0143166] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 07/19/2015] [Accepted: 11/02/2015] [Indexed: 11/18/2022] Open
Abstract
The prediction of protein folding rates is a necessary step towards understanding the principles of protein folding. Due to the increasing amount of experimental data, numerous protein folding models and predictors of protein folding rates have been developed in the last decade. The problem has also attracted the attention of scientists from computational fields, which led to the publication of several machine learning-based models to predict the rate of protein folding. Some of them claim to predict the logarithm of protein folding rate with an accuracy greater than 90%. However, there are reasons to believe that such claims are exaggerated due to large fluctuations and overfitting of the estimates. When we confronted three selected published models with new data, we found a much lower predictive power than reported in the original publications. Overly optimistic predictive powers appear from violations of the basic principles of machine-learning. We highlight common misconceptions in the studies claiming excessive predictive power and propose to use learning curves as a safeguard against those mistakes. As an example, we show that the current amount of experimental data is insufficient to build a linear predictor of logarithms of folding rates based on protein amino acid composition.
Affiliation(s)
- Marc Corrales
- Genome Architecture, Gene Regulation, Stem Cells and Cancer Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Pol Cuscó
- Genome Architecture, Gene Regulation, Stem Cells and Cancer Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Dinara R. Usmanova
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Moscow Institute of Physics and Technology, Dolgoprudny, Moscow Region, Russia
| | - Heng-Chang Chen
- Genome Architecture, Gene Regulation, Stem Cells and Cancer Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Natalya S. Bogatyreva
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Moscow Region, Russia
| | - Guillaume J. Filion
- Genome Architecture, Gene Regulation, Stem Cells and Cancer Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
| | - Dmitry N. Ivankov
- Universitat Pompeu Fabra (UPF), Barcelona, Spain
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), Barcelona, Spain
- Laboratory of Protein Physics, Institute of Protein Research of the Russian Academy of Sciences, Pushchino, Moscow Region, Russia
|
122
|
Shameer K, Tripathi LP, Kalari KR, Dudley JT, Sowdhamini R. Interpreting functional effects of coding variants: challenges in proteome-scale prediction, annotation and assessment. Brief Bioinform 2015; 17:841-62. [PMID: 26494363 DOI: 10.1093/bib/bbv084] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 05/14/2015] [Indexed: 12/20/2022] Open
Abstract
Accurate assessment of genetic variation in human DNA sequencing studies remains a nontrivial challenge in clinical genomics and genome informatics. Ascribing functional roles and/or clinical significances to single nucleotide variants identified from a next-generation sequencing study is an important step in genome interpretation. Experimental characterization of all the observed functional variants is yet impractical; thus, the prediction of functional and/or regulatory impacts of the various mutations using in silico approaches is an important step toward the identification of functionally significant or clinically actionable variants. The relationships between genotypes and the expressed phenotypes are multilayered and biologically complex; such relationships present numerous challenges and at the same time offer various opportunities for the design of in silico variant assessment strategies. Over the past decade, many bioinformatics algorithms have been developed to predict functional consequences of single nucleotide variants in the protein coding regions. In this review, we provide an overview of the bioinformatics resources for the prediction, annotation and visualization of coding single nucleotide variants. We discuss the currently available approaches and major challenges from the perspective of protein sequence, structure, function and interactions that require consideration when interpreting the impact of putatively functional variants. We also discuss the relevance of incorporating integrated workflows for predicting the biomedical impact of the functionally important variations encoded in a genome, exome or transcriptome. Finally, we propose a framework to classify variant assessment approaches and strategies for incorporation of variant assessment within electronic health records.
|
123
|
From residue coevolution to protein conformational ensembles and functional dynamics. Proc Natl Acad Sci U S A 2015; 112:13567-72. [PMID: 26487681 DOI: 10.1073/pnas.1508584112] [Citation(s) in RCA: 101] [Impact Index Per Article: 11.2] [Indexed: 12/21/2022] Open
Abstract
The analysis of evolutionary amino acid correlations has recently attracted a surge of renewed interest, also due to their successful use in de novo protein native structure prediction. However, many aspects of protein function, such as substrate binding and product release in enzymatic activity, can be fully understood only in terms of an equilibrium ensemble of alternative structures, rather than a single static structure. In this paper we combine coevolutionary data and molecular dynamics simulations to study protein conformational heterogeneity. To that end, we adapt the Boltzmann-learning algorithm to the analysis of homologous protein sequences and develop a coarse-grained protein model specifically tailored to convert the resulting contact predictions to a protein structural ensemble. By means of exhaustive sampling simulations, we analyze the set of conformations that are consistent with the observed residue correlations for a set of representative protein domains, showing that (i) the most representative structure is consistent with the experimental fold and (ii) the various regions of the sequence display different stability, related to multiple biologically relevant conformations and to the cooperativity of the coevolving pairs. Moreover, we show that the proposed protocol is able to reproduce the essential features of a protein folding mechanism as well as to account for regions involved in conformational transitions through the correct sampling of the involved conformers.
|
124
|
Figliuzzi M, Jacquier H, Schug A, Tenaillon O, Weigt M. Coevolutionary Landscape Inference and the Context-Dependence of Mutations in Beta-Lactamase TEM-1. Mol Biol Evol 2015; 33:268-80. [PMID: 26446903 PMCID: PMC4693977 DOI: 10.1093/molbev/msv211] [Citation(s) in RCA: 167] [Impact Index Per Article: 18.6] [Indexed: 11/21/2022] Open
Abstract
The quantitative characterization of mutational landscapes is a task of outstanding importance in evolutionary and medical biology: It is, for example, of central importance for our understanding of the phenotypic effect of mutations related to disease and antibiotic drug resistance. Here we develop a novel inference scheme for mutational landscapes, which is based on the statistical analysis of large alignments of homologs of the protein of interest. Our method is able to capture epistatic couplings between residues, and therefore to assess the dependence of mutational effects on the sequence context where they appear. Compared with recent large-scale mutagenesis data of the beta-lactamase TEM-1, a protein providing resistance against beta-lactam antibiotics, our method leads to an increase of about 40% in explicative power as compared with approaches neglecting epistasis. We find that the informative sequence context extends to residues at native distances of about 20 Å from the mutated site, reaching thus far beyond residues in direct physical contact.
Affiliation(s)
- Matteo Figliuzzi
- UPMC, Institut de Calcul et de la Simulation, Sorbonne Universités, Paris, France; Computational and Quantitative Biology, UPMC, UMR 7238, Sorbonne Universités, Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, Paris, France
| | - Hervé Jacquier
- Infection, Antimicrobials, Modelling, Evolution, INSERM, Université Denis Diderot Paris 7, UMR 1137, Sorbonne Paris Cité, Paris, France; Service de Bactériologie-Virologie, Groupe Hospitalier Lariboisière-Fernand Widal, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
| | - Alexander Schug
- Steinbuch Centre for Computing, Karlsruhe Institute for Technology, Eggenstein-Leopoldshafen, Germany
| | - Oliver Tenaillon
- Infection, Antimicrobials, Modelling, Evolution, INSERM, Université Denis Diderot Paris 7, UMR 1137, Sorbonne Paris Cité, Paris, France
| | - Martin Weigt
- Computational and Quantitative Biology, UPMC, UMR 7238, Sorbonne Universités, Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, Paris, France
|
125
|
Márquez-Chamorro AE, Asencio-Cortés G, Santiesteban-Toca CE, Aguilar-Ruiz JS. Soft computing methods for the prediction of protein tertiary structures: A survey. Appl Soft Comput 2015. [DOI: 10.1016/j.asoc.2015.06.024] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Indexed: 01/01/2023]
|
126
|
De Leonardis E, Lutz B, Ratz S, Cocco S, Monasson R, Schug A, Weigt M. Direct-Coupling Analysis of nucleotide coevolution facilitates RNA secondary and tertiary structure prediction. Nucleic Acids Res 2015; 43:10444-55. [PMID: 26420827 PMCID: PMC4666395 DOI: 10.1093/nar/gkv932] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 06/02/2015] [Accepted: 09/07/2015] [Indexed: 12/16/2022] Open
Abstract
Despite the biological importance of non-coding RNA, their structural characterization remains challenging. Making use of the rapidly growing sequence databases, we analyze nucleotide coevolution across homologous sequences via Direct-Coupling Analysis to detect nucleotide-nucleotide contacts. For a representative set of riboswitches, we show that the results of Direct-Coupling Analysis in combination with a generalized Nussinov algorithm systematically improve the results of RNA secondary structure prediction beyond traditional covariance approaches based on mutual information. Even more importantly, we show that the results of Direct-Coupling Analysis are enriched in tertiary structure contacts. By integrating these predictions into molecular modeling tools, systematically improved tertiary structure predictions can be obtained, as compared to using secondary structure information alone.
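The combination of pairwise coupling scores with a Nussinov-style recursion mentioned in this abstract can be sketched as a dynamic program that maximizes the summed score of nested pairs. This is a minimal sketch under stated assumptions, not the authors' generalized algorithm: the names `nussinov_max_score` and `canonical_score`, the `min_loop` parameter, and the toy score table standing in for Direct-Coupling Analysis output are all hypothetical.

```python
def nussinov_max_score(n, score, min_loop=3):
    """Maximum total score of a nested (pseudoknot-free) set of pairs.

    n        : sequence length
    score    : callable (i, j) -> float, weight for pairing positions i < j;
               only strictly positive weights are considered for pairing
    min_loop : minimum number of unpaired positions enclosed by a pair
    """
    if n == 0:
        return 0.0
    best = [[0.0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            value = best[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                s = score(k, j)
                if s > 0:
                    left = best[i][k - 1] if k > i else 0.0
                    value = max(value, left + s + best[k + 1][j - 1])
            best[i][j] = value
    return best[0][n - 1]


def canonical_score(seq):
    """Classic Nussinov scoring: 1 for a Watson-Crick or G-U pair, else 0."""
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    return lambda i, j: 1.0 if (seq[i], seq[j]) in allowed else 0.0


seq = "GGGAAAUCC"
max_pairs = nussinov_max_score(len(seq), canonical_score(seq))

# Hypothetical coupling scores replacing the 0/1 canonical scoring.
toy_couplings = {(0, 8): 2.0, (1, 7): 0.5}
weighted = nussinov_max_score(9, lambda i, j: toy_couplings.get((i, j), 0.0))
```

Passing the score as a callable keeps the recursion unchanged whether the weights are simple base-pairing rules or coupling strengths inferred from an alignment.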
Affiliation(s)
- Eleonora De Leonardis
- Computational and Quantitative Biology, Sorbonne Universités, Université Pierre et Marie Curie, UMR 7238, 75006 Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, 75006 Paris, France; Laboratoire de Physique Statistique de l'Ecole Normale Supérieure, associé au CNRS et à l'Université Pierre et Marie Curie, 75005 Paris, France
| | - Benjamin Lutz
- Steinbuch Centre for Computing, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany; Fakultät für Physik, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany
| | - Sebastian Ratz
- Steinbuch Centre for Computing, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany; Fakultät für Physik, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany
| | - Simona Cocco
- Laboratoire de Physique Statistique de l'Ecole Normale Supérieure, associé au CNRS et à l'Université Pierre et Marie Curie, 75005 Paris, France
| | - Rémi Monasson
- Laboratoire de Physique Théorique de l'Ecole Normale Supérieure, associé au CNRS et à l'Université Pierre et Marie Curie, 75005 Paris, France
| | - Alexander Schug
- Steinbuch Centre for Computing, Karlsruher Institut für Technologie, 76133 Karlsruhe, Germany
| | - Martin Weigt
- Computational and Quantitative Biology, Sorbonne Universités, Université Pierre et Marie Curie, UMR 7238, 75006 Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, 75006 Paris, France
|
127
|
Dabrowski-Tumanski P, Jarmolinska AI, Sulkowska JI. Prediction of the optimal set of contacts to fold the smallest knotted protein. J Phys Condens Matter 2015; 27:354109. [PMID: 26291339 DOI: 10.1088/0953-8984/27/35/354109] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Indexed: 06/04/2023]
Abstract
Knotted protein chains represent a new motif in protein folds. They have been linked to various diseases, and recent extensive analysis of the Protein Data Bank shows that they constitute 1.5% of all deposited protein structures. Despite thorough theoretical and experimental investigations, the role of knots in proteins still remains elusive. Nonetheless, it is believed that knots play an important role in mechanical and thermal stability of proteins. Here, we perform a comprehensive analysis of native, shadow-specific and non-native interactions which describe free energy landscape of the smallest knotted protein (PDB id 2efv). We show that the addition of shadow-specific contacts in the loop region greatly enhances folding kinetics, while the addition of shadow-specific contacts along the C-terminal region (H3 or H4) results in a new folding route with slower kinetics. By means of direct coupling analysis (DCA) we predict non-native contacts which also can accelerate kinetics. Next, we show that the length of the C-terminal knot tail is responsible for the shape of the free energy barrier, while the influence of the elongation of the N-terminus is not significant. Finally, we develop a concept of a minimal contact map sufficient for 2efv protein to fold and analyze properties of this protein using this map.
Affiliation(s)
- P Dabrowski-Tumanski
- Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097 Warsaw, Poland. Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland

128
dos Santos RN, Morcos F, Jana B, Andricopulo AD, Onuchic JN. Dimeric interactions and complex formation using direct coevolutionary couplings. Sci Rep 2015; 5:13652. [PMID: 26338201 PMCID: PMC4559900 DOI: 10.1038/srep13652] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 01/05/2015] [Accepted: 07/13/2015] [Indexed: 11/09/2022] [Open Access]
Abstract
We develop a procedure to characterize the association of protein structures into homodimers using coevolutionary couplings extracted from Direct Coupling Analysis (DCA) in combination with Structure Based Models (SBM). Identification of dimerization contacts using DCA is more challenging than intradomain contacts since direct couplings are mixed with monomeric contacts. Therefore a systematic way to extract dimerization signals has been elusive. We provide evidence that the prediction of homodimeric complexes is possible with high accuracy for all the cases we studied which have rich sequence information. For the most accurate conformations of the structurally diverse dimeric complexes studied the mean and interfacial RMSDs are 1.95Å and 1.44Å, respectively. This methodology is also able to identify distinct dimerization conformations as for the case of the family of response regulators, which dimerize upon activation. The identification of dimeric complexes can provide interesting molecular insights in the construction of large oligomeric complexes and be useful in the study of aggregation related diseases like Alzheimer's or Parkinson's.
Affiliation(s)
- Ricardo N. dos Santos
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827
- Laboratório de Química Medicinal e Computacional, Instituto de Física de São Carlos, Universidade de São Paulo, São Paulo, São Carlos, 13563-120, Brazil
- Faruck Morcos
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827
- Biman Jana
- Department of Physical Chemistry, Indian Association for the Cultivation of Science, Jadavpur, Kolkata-700032, India
- Adriano D. Andricopulo
- Laboratório de Química Medicinal e Computacional, Instituto de Física de São Carlos, Universidade de São Paulo, São Paulo, São Carlos, 13563-120, Brazil
- José N. Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827

129
Ovchinnikov S, Kinch L, Park H, Liao Y, Pei J, Kim DE, Kamisetty H, Grishin NV, Baker D. Large-scale determination of previously unsolved protein structures using evolutionary information. eLife 2015; 4:e09248. [PMID: 26335199 PMCID: PMC4602095 DOI: 10.7554/elife.09248] [Citation(s) in RCA: 176] [Impact Index Per Article: 19.6] [Received: 06/06/2015] [Accepted: 08/30/2015] [Indexed: 12/18/2022] [Open Access]
Abstract
The prediction of the structures of proteins without detectable sequence similarity to any protein of known structure remains an outstanding scientific challenge. Here we report significant progress in this area. We first describe de novo blind structure predictions of unprecedented accuracy we made for two proteins in large families in the recent CASP11 blind test of protein structure prediction methods by incorporating residue-residue co-evolution information in the Rosetta structure prediction program. We then describe the use of this method to generate structure models for 58 of the 121 large protein families in prokaryotes for which three-dimensional structures are not available. These models, which are posted online for public access, provide structural information for the over 400,000 proteins belonging to the 58 families and suggest hypotheses about mechanism for the subset for which the function is known, and hypotheses about function for the remainder.
Affiliation(s)
- Sergey Ovchinnikov
- Department of Biochemistry, University of Washington, Seattle, United States
- Lisa Kinch
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, United States
- Hahnbeom Park
- Department of Biochemistry, University of Washington, Seattle, United States
- Yuxing Liao
- Department of Biophysics, Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, United States
- Jimin Pei
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, United States
- David E Kim
- Department of Biochemistry, University of Washington, Seattle, United States
- Nick V Grishin
- Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, United States
- Department of Biophysics, Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, United States
- David Baker
- Department of Biochemistry, University of Washington, Seattle, United States
- Howard Hughes Medical Institute, University of Washington, Seattle, United States

130
Abstract
The activity of a neural network is defined by patterns of spiking and silence from the individual neurons. Because spikes are (relatively) sparse, patterns of activity with increasing numbers of spikes are less probable, but, with more spikes, the number of possible patterns increases. This tradeoff between probability and numerosity is mathematically equivalent to the relationship between entropy and energy in statistical physics. We construct this relationship for populations of up to N = 160 neurons in a small patch of the vertebrate retina, using a combination of direct and model-based analyses of experiments on the response of this network to naturalistic movies. We see signs of a thermodynamic limit, where the entropy per neuron approaches a smooth function of the energy per neuron as N increases. The form of this function corresponds to the distribution of activity being poised near an unusual kind of critical point. We suggest further tests of criticality, and give a brief discussion of its functional significance.

131
Espada R, Parra RG, Mora T, Walczak AM, Ferreiro DU. Capturing coevolutionary signals in repeat proteins. BMC Bioinformatics 2015; 16:207. [PMID: 26134293 PMCID: PMC4489039 DOI: 10.1186/s12859-015-0648-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/17/2015] [Accepted: 06/16/2015] [Indexed: 11/10/2022] [Open Access]
Abstract
BACKGROUND: The analysis of correlations of amino acid occurrences in globular domains has led to the development of statistical tools that can identify native contacts - portions of the chains that come to close distance in folded structural ensembles. Here we introduce a direct coupling analysis for repeat proteins - natural systems for which the identification of folding domains remains challenging. RESULTS: We show that the inherent translational symmetry of repeat protein sequences introduces a strong bias in the pair correlations at precisely the length scale of the repeat-unit. Equalizing for this bias in an objective way reveals true co-evolutionary signals from which local native contacts can be identified. Importantly, parameter values obtained for all other interactions are not significantly affected by the equalization. We quantify the robustness of the procedure and assign confidence levels to the interactions, identifying the minimum number of sequences needed to extract evolutionary information in several repeat protein families. CONCLUSIONS: The overall procedure can be used to reconstruct the interactions at distances larger than repeat-pairs, identifying the characteristics of the strongest couplings in each family, and can be applied to any system that appears translationally symmetric.
Affiliation(s)
- Rocío Espada
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina.,Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
- R Gonzalo Parra
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
- Thierry Mora
- Laboratoire de physique statistique, CNRS, UPMC and École normale supérieure, 24 rue Lhomond, Paris, 75005, France
- Diego U Ferreiro
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina

132
Tang Y, Huang YJ, Hopf TA, Sander C, Marks DS, Montelione GT. Protein structure determination by combining sparse NMR data with evolutionary couplings. Nat Methods 2015; 12:751-4. [PMID: 26121406 PMCID: PMC4521990 DOI: 10.1038/nmeth.3455] [Citation(s) in RCA: 54] [Impact Index Per Article: 6.0] [Received: 01/06/2015] [Accepted: 05/26/2015] [Indexed: 11/13/2022]
Abstract
Accurate protein structure determination by NMR is challenging for larger proteins, for which experimental data is often incomplete and ambiguous. Fortunately, the upsurge in evolutionary sequence information and advances in maximum entropy statistical methods now provide a rich complementary source of structural constraints. We have developed a hybrid approach (EC-NMR) combining sparse NMR data with evolutionary residue-residue couplings, and demonstrate accurate structure determination for several 6 to 41 kDa proteins.
Affiliation(s)
- Yuefeng Tang
- 1] Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA. [2] Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Yuanpeng Janet Huang
- 1] Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA. [2] Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA
- Thomas A Hopf
- 1] Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA. [2] Department of Informatics, Technische Universität München, Garching, Germany
- Chris Sander
- Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, New York, USA
- Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, Massachusetts, USA
- Gaetano T Montelione
- 1] Center for Advanced Biotechnology and Medicine, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA. [2] Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA. [3] Department of Biochemistry and Molecular Biology, Robert Wood Johnson Medical School, Rutgers, The State University of New Jersey, Piscataway, New Jersey, USA

133
Flynn WF, Chang MW, Tan Z, Oliveira G, Yuan J, Okulicz JF, Torbett BE, Levy RM. Deep sequencing of protease inhibitor resistant HIV patient isolates reveals patterns of correlated mutations in Gag and protease. PLoS Comput Biol 2015; 11:e1004249. [PMID: 25894830 PMCID: PMC4404092 DOI: 10.1371/journal.pcbi.1004249] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 08/13/2014] [Accepted: 03/19/2015] [Indexed: 11/18/2022] [Open Access]
Abstract
While the role of drug resistance mutations in HIV protease has been studied comprehensively, mutations in its substrate, Gag, have not been extensively cataloged. Using deep sequencing, we analyzed a unique collection of longitudinal viral samples from 93 patients who have been treated with therapies containing protease inhibitors (PIs). Due to the high sequence coverage within each sample, the frequencies of mutations at individual positions were calculated with high precision. We used this information to characterize the variability in the Gag polyprotein and its effects on PI-therapy outcomes. To examine covariation of mutations between two different sites using deep sequencing data, we developed an approach to estimate the tight bounds on the two-site bivariate probabilities in each viral sample, and the mutual information between pairs of positions based on all the bounds. Utilizing the new methodology we found that mutations in the matrix and p6 proteins contribute to continued therapy failure and have a major role in the network of strongly correlated mutations in the Gag polyprotein, as well as between Gag and protease. Although covariation is not direct evidence of structural propensities, we found the strongest correlations between residues on capsid and matrix of the same Gag protein were often due to structural proximity. This suggests that some of the strongest inter-protein Gag correlations are the result of structural proximity. Moreover, the strong covariation between residues in matrix and capsid at the N-terminus with p1 and p6 at the C-terminus is consistent with residue-residue contacts between these proteins at some point in the viral life cycle. Understanding the structure of HIV proteins and the function of drug-resistant mutations of these proteins is critical for the development of effective HIV treatments. 
Selected gag mutations have been shown to provide compensatory functions for protease resistance mutations and may directly contribute to the development of drug resistance. To determine associations between protease inhibitor mutations and gag, we utilized deep sequencing of HIV gag and protease from a collection of viral isolates from patients treated with highly active retroviral protease inhibitors. Deep sequencing allows for accurate measurement of mutation frequencies at each position, allowing estimation, using a novel method we developed, of the covariation between any two residues on gag. Using this information, we characterize the variation within gag and protease and identify the most strongly correlated pairs of inter- and intra-protein residues. Our results suggest that matrix and p1/p6 mutations form the core of a network of strongly correlated gag mutations and contribute to recurrent treatment failure. Extracting gag residue covariation information from the deep sequencing of patient viral samples may provide insight into structural aspects of the Gag polyprotein as well as new areas for small molecule targeting to disrupt Gag function.
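As a rough illustration of the pairwise covariation measure mentioned in the abstract above, the sketch below computes the ordinary plug-in mutual information between two columns of a small alignment. It is not the bounds-based estimator the authors developed for deep sequencing data; the function name `mutual_information` and the toy alignment are assumptions made for this example only.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Plug-in mutual information (in bits) between two alignment columns.

    col_a and col_b are equal-length sequences of residue symbols, one
    symbol per aligned sequence. Hypothetical helper for illustration.
    """
    if len(col_a) != len(col_b):
        raise ValueError("columns must cover the same sequences")
    n = len(col_a)
    count_a = Counter(col_a)              # single-site counts, position a
    count_b = Counter(col_b)              # single-site counts, position b
    count_ab = Counter(zip(col_a, col_b)) # joint counts for the pair
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # p_ab / (p_a * p_b), written with raw counts
        ratio = n_ab * n / (count_a[a] * count_b[b])
        mi += p_ab * math.log2(ratio)
    return mi


# Toy alignment (made up): columns 0 and 1 vary together, column 2 is independent.
toy_alignment = ["AKL", "AKV", "GEL", "GEV"]
columns = list(zip(*toy_alignment))

mi_linked = mutual_information(columns[0], columns[1])
mi_unlinked = mutual_information(columns[0], columns[2])
```

On this toy input the perfectly co-varying pair carries one bit of mutual information and the independent pair carries none. Real alignments would also need corrections for finite sampling and phylogenetic bias, which this sketch leaves out.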
Affiliation(s)
- William F. Flynn
- Department of Physics and Astronomy, Rutgers University, Piscataway, New Jersey, United States of America
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania, United States of America
- Max W. Chang
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, United States of America
- Zhiqiang Tan
- Department of Statistics, Rutgers University, Piscataway, New Jersey, United States of America
- Glenn Oliveira
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, United States of America
- Jinyun Yuan
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, United States of America
- Jason F. Okulicz
- Infectious Disease Service, San Antonio Military Medical Center, San Antonio, Texas, United States of America
- Bruce E. Torbett
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, California, United States of America
- * E-mail: (BET); (RML)
- Ronald M. Levy
- Center for Biophysics and Computational Biology, Temple University, Philadelphia, Pennsylvania, United States of America
- Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania, United States of America
- * E-mail: (BET); (RML)

134
Zhao Y, Wang Y, Gao Y, Li G, Huang J. Integrated analysis of residue coevolution and protein structures capture key protein sectors in HIV-1 proteins. PLoS One 2015; 10:e0117506. [PMID: 25671429 PMCID: PMC4324911 DOI: 10.1371/journal.pone.0117506] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 10/10/2014] [Accepted: 12/24/2014] [Indexed: 02/07/2023] [Open Access]
Abstract
HIV type 1 (HIV-1) is characterized by its rapid genetic evolution, leading to challenges in anti-HIV therapy. However, the sequence variations in HIV-1 proteins are not randomly distributed due to a combination of functional constraints and genetic drift. In this study, we examined patterns of sequence variability for evidence of linked sequence changes (termed coevolution or covariation) in 15 HIV-1 proteins. We show that the percentage of charged residues in the coevolving residues is significantly higher than that in all the HIV-1 proteins. Most of the coevolving residues are spatially proximal in the protein structures and tend to form relatively compact and independent units in the tertiary structures, termed "protein sectors". These protein sectors are closely associated with anti-HIV drug resistance, T cell epitopes, and antibody binding sites. Finally, we explored candidate peptide inhibitors based on the protein sectors. Our results can establish an association between the coevolving residues and molecular functions of HIV-1 proteins, and then provide us with valuable knowledge of pathology of HIV-1 and therapeutics development.
Affiliation(s)
- Yuqi Zhao
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, No.32 Jiaochang Donglu Kunming, 650223 Yunnan, China
- Department of Integrative Biology and Physiology, University of California, Los Angeles, 610 Charles E. Young Drive East, Los Angeles, California, United States of America
- * E-mail: (YZ); (JH)
- Yanjie Wang
- Key Laboratory of Animal Models and Human Disease Mechanisms of Chinese Academy of Sciences and Yunnan Province, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming, Yunnan 650223, China
- Yuedong Gao
- Kunming Biological Diversity Regional Center of Instruments, Kunming Institute of Zoology, Chinese Academy of Sciences, Kunming 650223, China
- Gonghua Li
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, No.32 Jiaochang Donglu Kunming, 650223 Yunnan, China
- Jingfei Huang
- State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences, No.32 Jiaochang Donglu Kunming, 650223 Yunnan, China
- Collaborative Innovation Center for Natural Products and Biological Drugs of Yunnan, Kunming, Yunnan 650223, China
- * E-mail: (YZ); (JH)

135
Tamir S, Paddock ML, Darash-Yahana-Baram M, Holt SH, Sohn YS, Agranat L, Michaeli D, Stofleth JT, Lipper CH, Morcos F, Cabantchik IZ, Onuchic JN, Jennings PA, Mittler R, Nechushtai R. Structure-function analysis of NEET proteins uncovers their role as key regulators of iron and ROS homeostasis in health and disease. Biochimica et Biophysica Acta - Molecular Cell Research 2014; 1853:1294-315. [PMID: 25448035 DOI: 10.1016/j.bbamcr.2014.10.014] [Citation(s) in RCA: 116] [Impact Index Per Article: 11.6] [Received: 07/04/2014] [Revised: 10/01/2014] [Accepted: 10/16/2014] [Indexed: 12/31/2022]
Abstract
A novel family of 2Fe-2S proteins, the NEET family, was discovered during the last decade in numerous organisms, including archaea, bacteria, algae, plants and humans, suggesting an evolutionarily conserved function, potentially mediated by their CDGSH Iron-Sulfur Domain. In humans, three NEET members encoded by the CISD1-3 genes were identified. The structures of CISD1 (mitoNEET, mNT), CISD2 (NAF-1), and the plant At-NEET uncovered a homodimer with a unique "NEET fold", as well as two distinct domains: a beta-cap and a 2Fe-2S cluster-binding domain. The 2Fe-2S clusters of NEET proteins were found to be coordinated by a novel 3Cys:1His structure that is relatively labile compared to other 2Fe-2S proteins, which is why the NEETs' clusters can be transferred to apo-acceptor protein(s) or mitochondria. Positioned at the protein surface, the NEET's 2Fe-2S's coordinating His is exposed to protonation upon changes in its environment, potentially suggesting a sensing function for this residue. Studies in different model systems demonstrated a role for NAF-1 and mNT in the regulation of cellular iron, calcium and ROS homeostasis, and uncovered a key role for NEET proteins in critical processes, such as cancer cell proliferation and tumor growth, lipid and glucose homeostasis in obesity and diabetes, control of autophagy, longevity in mice, and senescence in plants. Abnormal regulation of NEET proteins was consequently found to result in multiple health conditions, and aberrant splicing of NAF-1 was found to be a cause of the neurological genetic disorder Wolfram Syndrome 2. Here we review the discovery of NEET proteins, their structural, biochemical and biophysical characterization, and their most recent structure-function analyses. We additionally highlight future avenues of research focused on NEET proteins and propose an essential role for NEETs in health and disease.
This article is part of a Special Issue entitled: Fe/S proteins: Analysis, structure, function, biogenesis and diseases.
Affiliation(s)
- Sagi Tamir
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
- Mark L Paddock
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA
- Merav Darash-Yahana-Baram
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
- Sarah H Holt
- Department of Biology, University of North Texas, Denton, TX 76203, USA
- Yang Sung Sohn
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
- Lily Agranat
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
- Dorit Michaeli
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
- Jason T Stofleth
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA
- Colin H Lipper
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA
- Faruck Morcos
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77050, USA; Department of Physics and Astronomy, Rice University, Houston, TX 77050, USA; Department of Chemistry, Rice University, Houston, TX 77050, USA; Department of Biochemistry and Cell Biology, Rice University, Houston, TX 77050, USA
- Ioav Z Cabantchik
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
- José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77050, USA; Department of Physics and Astronomy, Rice University, Houston, TX 77050, USA; Department of Chemistry, Rice University, Houston, TX 77050, USA; Department of Biochemistry and Cell Biology, Rice University, Houston, TX 77050, USA
- Patricia A Jennings
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA
- Ron Mittler
- Department of Biology, University of North Texas, Denton, TX 76203, USA
- Rachel Nechushtai
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel.

136
Feinauer C, Skwark MJ, Pagnani A, Aurell E. Improving contact prediction along three dimensions. PLoS Comput Biol 2014; 10:e1003847. [PMID: 25299132 PMCID: PMC4191875 DOI: 10.1371/journal.pcbi.1003847] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.6] [Received: 02/28/2014] [Accepted: 08/07/2014] [Indexed: 11/18/2022] [Open Access]
Abstract
Correlation patterns in multiple sequence alignments of homologous proteins can be exploited to infer information on the three-dimensional structure of their members. The typical pipeline to address this task, which we in this paper refer to as the three dimensions of contact prediction, is to (i) filter and align the raw sequence data representing the evolutionarily related proteins; (ii) choose a predictive model to describe a sequence alignment; (iii) infer the model parameters and interpret them in terms of structural properties, such as an accurate contact map. We show here that all three dimensions are important for overall prediction success. In particular, we show that it is possible to improve significantly along the second dimension by going beyond the pair-wise Potts models from statistical physics, which have hitherto been the focus of the field. These (simple) extensions are motivated by multiple sequence alignments often containing long stretches of gaps which, as a data feature, would be rather untypical for independent samples drawn from a Potts model. Using a large test set of proteins we show that the combined improvements along the three dimensions are as large as any reported to date. Proteins are large molecules that living cells make by stringing together building blocks called amino acids or peptides, following their blue-prints in the DNA. Freshly made proteins are typically long, structure-less chains of peptides, but shortly afterwards most of them fold into characteristic structures. Proteins execute many functions in the cell, for which they need to have the right structure, which is therefore very important in determining what the proteins can do. The structure of a protein can be determined by X-ray diffraction and other experimental approaches which are all, to this day, somewhat labor-intensive and difficult. 
On the other hand, the order of the peptides in a protein can be read off from the DNA blue-print, and such protein sequences are today routinely produced in large numbers. In this paper we show that many similar protein sequences can be used to find information about the structure. The basic approach is to construct a probabilistic model for sequence variability, and then to use the parameters of that model to predict structure in three-dimensional space. The main technical novelty compared to previous contributions in the same general direction is that we use models more directly matched to the data.
Affiliation(s)
- Christoph Feinauer
- DISAT and Center for Computational Sciences, Politecnico Torino, Torino, Italy
- Marcin J. Skwark
- Department of Information and Computer Science, Aalto University, Aalto, Finland
- Aalto Science Institute (AScI), Aalto University, Aalto, Finland
- Andrea Pagnani
- DISAT and Center for Computational Sciences, Politecnico Torino, Torino, Italy
- Human Genetics Foundation-Torino, Molecular Biotechnology Center, Torino, Italy
- Erik Aurell
- Department of Information and Computer Science, Aalto University, Aalto, Finland
- Aalto Science Institute (AScI), Aalto University, Aalto, Finland
- Department of Computational Biology, Royal Institute of Technology, AlbaNova University Centre, Stockholm, Sweden

137
Castellana M, Bialek W. Inverse spin glass and related maximum entropy problems. Physical Review Letters 2014; 113:117204. [PMID: 25260004 DOI: 10.1103/physrevlett.113.117204] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 12/30/2013] [Indexed: 06/03/2023]
Abstract
If we have a system of binary variables and we measure the pairwise correlations among these variables, then the least structured or maximum entropy model for their joint distribution is an Ising model with pairwise interactions among the spins. Here we consider inhomogeneous systems in which we constrain, for example, not the full matrix of correlations, but only the distribution from which these correlations are drawn. In this sense, what we have constructed is an inverse spin glass: rather than choosing coupling constants at random from a distribution and calculating correlations, we choose the correlations from a distribution and infer the coupling constants. We argue that such models generate a block structure in the space of couplings, which provides an explicit solution of the inverse problem. This allows us to generate a phase diagram in the space of (measurable) moments of the distribution of correlations. We expect that these ideas will be most useful in building models for systems that are nonequilibrium statistical mechanics problems, such as networks of real neurons.
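For orientation, the pairwise maximum entropy model that the first sentence of this abstract describes can be written in the standard textbook form below. The notation (spins \sigma_i, fields h_i, couplings J_{ij}, partition function Z) is assumed here for illustration and is not quoted from the paper.

```latex
P(\sigma_1,\dots,\sigma_N)
  = \frac{1}{Z}\exp\Big(\sum_{i} h_i \sigma_i + \sum_{i<j} J_{ij}\,\sigma_i \sigma_j\Big),
\qquad \sigma_i \in \{-1,+1\},
\qquad
Z = \sum_{\sigma}\exp\Big(\sum_{i} h_i \sigma_i + \sum_{i<j} J_{ij}\,\sigma_i \sigma_j\Big)
```

In the usual inverse problem, h_i and J_{ij} are chosen so that the model reproduces the measured means \langle\sigma_i\rangle and pairwise correlations \langle\sigma_i\sigma_j\rangle. The abstract's variant constrains only the distribution from which those correlations are drawn.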
Affiliation(s)
- Michele Castellana
- Joseph Henry Laboratories of Physics and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA
- William Bialek
- Joseph Henry Laboratories of Physics and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey 08544, USA and Initiative for the Theoretical Sciences, The Graduate Center, City University of New York, 365 Fifth Avenue, New York, New York 10016, USA

138
Michel M, Hayat S, Skwark MJ, Sander C, Marks DS, Elofsson A. PconsFold: improved contact predictions improve protein models. Bioinformatics 2014; 30:i482-8. [PMID: 25161237 PMCID: PMC4147911 DOI: 10.1093/bioinformatics/btu458] [Citation(s) in RCA: 85] [Impact Index Per Article: 8.5] [Indexed: 01/25/2023] [Open Access]
Abstract
MOTIVATION: Recently it has been shown that the quality of protein contact prediction from evolutionary information can be improved significantly if direct and indirect information is separated. Given sufficiently large protein families, the contact predictions contain sufficient information to predict the structure of many protein families. However, since the first studies, contact prediction methods have improved. Here, we ask how much the final models are improved if improved contact predictions are used. RESULTS: In a small benchmark of 15 proteins, we show that the TM-scores of top-ranked models are improved by on average 33% using PconsFold compared with the original version of EVfold. In a larger benchmark, we find that the quality is improved by 15-30% when using PconsC in comparison with earlier contact prediction methods. Further, using Rosetta instead of CNS does not significantly improve global model accuracy, but the chemistry of models generated with Rosetta is improved. AVAILABILITY: PconsFold is a fully automated pipeline for ab initio protein structure prediction based on evolutionary information. PconsFold is based on PconsC contact prediction and uses the Rosetta folding protocol. Due to its modularity, the contact prediction tool can be easily exchanged. The source code of PconsFold is available on GitHub at https://www.github.com/ElofssonLab/pcons-fold under the MIT license. PconsC is available from http://c.pcons.net/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mirco Michel
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland and Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Sikander Hayat
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland and Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Marcin J Skwark
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland and Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Chris Sander
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland and Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Debora S Marks
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland and Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
| | - Arne Elofsson
- Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland and Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA
|
139
|
Coevolutionary information, protein folding landscapes, and the thermodynamics of natural selection. Proc Natl Acad Sci U S A 2014; 111:12408-13. [PMID: 25114242 DOI: 10.1073/pnas.1413575111] [Citation(s) in RCA: 111] [Impact Index Per Article: 11.1] [Indexed: 11/18/2022]
Abstract
The energy landscape used by nature over evolutionary timescales to select protein sequences is essentially the same as the one that folds these sequences into functioning proteins, sometimes in microseconds. We show that genomic data, physical coarse-grained free energy functions, and family-specific information theoretic models can be combined to give consistent estimates of energy landscape characteristics of natural proteins. One such characteristic is the effective temperature T(sel) at which these foldable sequences have been selected in sequence space by evolution. T(sel) quantifies the importance of folded-state energetics and structural specificity for molecular evolution. Across all protein families studied, our estimates for T(sel) are well below the experimental folding temperatures, indicating that the energy landscapes of natural foldable proteins are strongly funneled toward the native state.
|
140
|
Andreani J, Guerois R. Evolution of protein interactions: From interactomes to interfaces. Arch Biochem Biophys 2014; 554:65-75. [DOI: 10.1016/j.abb.2014.05.010] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.3] [Received: 02/10/2014] [Revised: 04/28/2014] [Accepted: 05/12/2014] [Indexed: 12/16/2022]
|
141
|
Sinner C, Lutz B, John S, Reinartz I, Verma A, Schug A. Simulating Biomolecular Folding and Function by Native-Structure-Based/Go-Type Models. Isr J Chem 2014. [DOI: 10.1002/ijch.201400012] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Indexed: 12/11/2022]
|
142
|
Lua RC, Marciano DC, Katsonis P, Adikesavan AK, Wilkins AD, Lichtarge O. Prediction and redesign of protein-protein interactions. Prog Biophys Mol Biol 2014; 116:194-202. [PMID: 24878423 DOI: 10.1016/j.pbiomolbio.2014.05.004] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 02/25/2014] [Revised: 05/02/2014] [Accepted: 05/17/2014] [Indexed: 12/14/2022]
Abstract
Understanding the molecular basis of protein function remains a central goal of biology, with the hope to elucidate the role of human genes in health and in disease, and to rationally design therapies through targeted molecular perturbations. We review here some of the computational techniques and resources available for characterizing a critical aspect of protein function - those mediated by protein-protein interactions (PPI). We describe several applications and recent successes of the Evolutionary Trace (ET) in identifying molecular events and shapes that underlie protein function and specificity in both eukaryotes and prokaryotes. ET is a part of analytical approaches based on the successes and failures of evolution that enable the rational control of PPI.
Affiliation(s)
- Rhonald C Lua
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - David C Marciano
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Panagiotis Katsonis
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Anbu K Adikesavan
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA
| | - Angela D Wilkins
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA
| | - Olivier Lichtarge
- Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX 77030, USA; Verna and Marrs McLean Department of Biochemistry and Molecular Biology, Baylor College of Medicine, Houston, TX 77030, USA; Computational and Integrative Biomedical Research Center, Baylor College of Medicine, Houston, TX 77030, USA.
|
143
|
Ovchinnikov S, Kamisetty H, Baker D. Robust and accurate prediction of residue-residue interactions across protein interfaces using evolutionary information. eLife 2014; 3:e02030. [PMID: 24842992 PMCID: PMC4034769 DOI: 10.7554/elife.02030] [Citation(s) in RCA: 446] [Impact Index Per Article: 44.6] [Indexed: 12/24/2022]
Abstract
Do the amino acid sequence identities of residues that make contact across protein interfaces covary during evolution? If so, such covariance could be used to predict contacts across interfaces and assemble models of biological complexes. We find that residue pairs identified using a pseudo-likelihood-based method to covary across protein–protein interfaces in the 50S ribosomal unit and 28 additional bacterial protein complexes with known structure are almost always in contact in the complex, provided that the number of aligned sequences is greater than the average length of the two proteins. We use this method to make subunit contact predictions for an additional 36 protein complexes with unknown structures, and present models based on these predictions for the tripartite ATP-independent periplasmic (TRAP) transporter, the tripartite efflux system, the pyruvate formate lyase-activating enzyme complex, and the methionine ABC transporter. DOI:http://dx.doi.org/10.7554/eLife.02030.001
Proteins are considered the ‘workhorse molecules’ of life and they are involved in virtually everything that cells do. Proteins are strings of amino acids that have folded into a specific three-dimensional shape. Proteins must have the correct shape to function properly, as they often work by binding to other proteins or molecules, much like a key fitting into a lock. Working out the structure of a protein can, therefore, provide major insights into how the protein does its job. Two or more proteins can bind together and form a complex to perform various tasks; and solving the structures of these complexes can be challenging, even if the structures of the protein subunits are known. Now, Ovchinnikov, Kamisetty, and Baker have developed a method for predicting which parts of the proteins make contact with each other in a two-protein complex.
Different species can have copies of the same proteins; but a copy from one species might have different amino acids at certain positions when compared to a related copy from another species. As such, when pairs of interacting proteins from different species are compared, there will be many positions in the two proteins that vary. However, if the amino acid at a position in one protein (let's call it ‘X’) varies, and the amino acid at, say, position ‘Y’ in the other protein also varies such that for any given amino acid at position Y there is often a specific amino acid at position X; positions X and Y are said to ‘co-vary’. Ovchinnikov et al. noticed that when a pair of amino acids (one from each protein in a two-protein complex) co-varied, these two amino acids tended to make contact with each other at the protein–protein interface. Ovchinnikov et al. used the new method to make predictions about the protein–protein interfaces in 28 protein complexes found in bacteria, and also to make a prediction about the interface between protein subunits in the bacterial ribosome. When these predictions were checked against the actual structures, which were all known beforehand, they were found to be accurate if the number of copies of each protein being compared is greater than the average length of the two proteins. Ovchinnikov et al. went on to predict the amino acids on the protein–protein interfaces for another 36 bacterial protein complexes with unknown structures, and to present models for four larger complexes. The next challenge is to extend the method to protein complexes that are found only in eukaryotes (i.e., not in bacteria). Since the number of related copies for eukaryotic proteins tends to be smaller, there are fewer proteins to compare and it is therefore harder to detect ‘covariation’ when it occurs. DOI:http://dx.doi.org/10.7554/eLife.02030.002
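The idea of two positions "co-varying" can be illustrated with a small sketch. Note that this is not the pseudo-likelihood method used in the paper: it substitutes plain mutual information between two alignment columns, a much simpler measure that does not separate direct from indirect couplings. The function name and the toy columns are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns.

    col_x and col_y hold one residue per species, in the same species order,
    for position X in one protein and position Y in its partner.
    """
    if len(col_x) != len(col_y):
        raise ValueError("columns must cover the same set of sequences")
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        # p_ab / (p_a * p_b), written with raw counts
        mi += p_ab * log2(p_ab * n * n / (count_x[a] * count_y[b]))
    return mi


# Toy columns (assumed data): pos_y changes whenever pos_x changes,
# while pos_z varies without tracking pos_x.
pos_x = "DDDKKKEE"
pos_y = "KKKDDDRR"
pos_z = "ALALALAL"

coupled = mutual_information(pos_x, pos_y)
uncoupled = mutual_information(pos_x, pos_z)
print(f"X-Y: {coupled:.3f} bits, X-Z: {uncoupled:.3f} bits")
```

In this toy case the X-Y pair scores far higher than the X-Z pair, which is the kind of signal a covariation method looks for before ranking candidate interface contacts.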
Affiliation(s)
- Sergey Ovchinnikov
- Department of Biochemistry, Howard Hughes Medical Institute, University of Washington, Seattle, United States; Molecular and Cellular Biology Program, University of Washington, Seattle, United States
| | - Hetunandan Kamisetty
- Department of Biochemistry, Howard Hughes Medical Institute, University of Washington, Seattle, United States; Facebook Inc., Seattle, United States
| | - David Baker
- Department of Biochemistry, Howard Hughes Medical Institute, University of Washington, Seattle, United States
|
144
|
Abstract
By focusing on essential features, while averaging over less important details, coarse-grained (CG) models provide significant computational and conceptual advantages with respect to more detailed models. Consequently, despite dramatic advances in computational methodologies and resources, CG models enjoy surging popularity and are becoming increasingly equal partners to atomically detailed models. This perspective surveys the rapidly developing landscape of CG models for biomolecular systems. In particular, this review seeks to provide a balanced, coherent, and unified presentation of several distinct approaches for developing CG models, including top-down, network-based, native-centric, knowledge-based, and bottom-up modeling strategies. The review summarizes their basic philosophies, theoretical foundations, typical applications, and recent developments. Additionally, the review identifies fundamental inter-relationships among the diverse approaches and discusses outstanding challenges in the field. When carefully applied and assessed, current CG models provide highly efficient means for investigating the biological consequences of basic physicochemical principles. Moreover, rigorous bottom-up approaches hold great promise for further improving the accuracy and scope of CG models for biomolecular systems.
Affiliation(s)
- W G Noid
- Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, USA
|
145
|
Baldassi C, Zamparo M, Feinauer C, Procaccini A, Zecchina R, Weigt M, Pagnani A. Fast and accurate multivariate Gaussian modeling of protein families: predicting residue contacts and protein-interaction partners. PLoS One 2014; 9:e92721. [PMID: 24663061 PMCID: PMC3963956 DOI: 10.1371/journal.pone.0092721] [Citation(s) in RCA: 89] [Impact Index Per Article: 8.9] [Received: 12/05/2013] [Accepted: 02/24/2014] [Indexed: 11/18/2022]
Abstract
In the course of evolution, proteins show a remarkable conservation of their three-dimensional structure and their biological function, leading to strong evolutionary constraints on the sequence variability between homologous proteins. Our method aims at extracting such constraints from rapidly accumulating sequence data, and thereby at inferring protein structure and function from sequence information alone. Recently, global statistical inference methods (e.g. direct-coupling analysis, sparse inverse covariance estimation) have achieved a breakthrough towards this aim, and their predictions have been successfully implemented into tertiary and quaternary protein structure prediction methods. However, due to the discrete nature of the underlying variable (amino-acids), exact inference requires exponential time in the protein length, and efficient approximations are needed for practical applicability. Here we propose a very efficient multivariate Gaussian modeling approach as a variant of direct-coupling analysis: the discrete amino-acid variables are replaced by continuous Gaussian random variables. The resulting statistical inference problem is efficiently and exactly solvable. We show that the quality of inference is comparable or superior to the one achieved by mean-field approximations to inference with discrete variables, as done by direct-coupling analysis. This is true for (i) the prediction of residue-residue contacts in proteins, and (ii) the identification of protein-protein interaction partner in bacterial signal transduction. An implementation of our multivariate Gaussian approach is available at the website http://areeweb.polito.it/ricerca/cmp/code.
Affiliation(s)
- Carlo Baldassi
- Department of Applied Science and Technology and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Human Genetics Foundation-Torino, Torino, Italy
| | - Marco Zamparo
- Department of Applied Science and Technology and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Human Genetics Foundation-Torino, Torino, Italy
| | - Christoph Feinauer
- Department of Applied Science and Technology and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
| | | | - Riccardo Zecchina
- Department of Applied Science and Technology and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Human Genetics Foundation-Torino, Torino, Italy
| | - Martin Weigt
- Sorbonne Universités, Université Pierre et Marie Curie Paris 06, UMR 7238, Computational and Quantitative Biology, Paris, France
- Centre National de la Recherche Scientifique, UMR 7238, Computational and Quantitative Biology, Paris, France
| | - Andrea Pagnani
- Department of Applied Science and Technology and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Human Genetics Foundation-Torino, Torino, Italy
|
146
|
Kosciolek T, Jones DT. De novo structure prediction of globular proteins aided by sequence variation-derived contacts. PLoS One 2014; 9:e92197. [PMID: 24637808 PMCID: PMC3956894 DOI: 10.1371/journal.pone.0092197] [Citation(s) in RCA: 93] [Impact Index Per Article: 9.3] [Received: 11/06/2013] [Accepted: 02/19/2014] [Indexed: 12/21/2022]
Abstract
The advent of high accuracy residue-residue intra-protein contact prediction methods enabled a significant boost in the quality of de novo structure predictions. Here, we investigate the potential benefits of combining a well-established fragment-based folding algorithm--FRAGFOLD, with PSICOV, a contact prediction method which uses sparse inverse covariance estimation to identify co-varying sites in multiple sequence alignments. Using a comprehensive set of 150 diverse globular target proteins, up to 266 amino acids in length, we are able to address the effectiveness and some limitations of such approaches to globular proteins in practice. Overall we find that using fragment assembly with both statistical potentials and predicted contacts is significantly better than either statistical potentials or contacts alone. Results show up to nearly 80% of correct predictions (TM-score ≥0.5) within analysed dataset and a mean TM-score of 0.54. Unsuccessful modelling cases emerged either from conformational sampling problems, or insufficient contact prediction accuracy. Nevertheless, a strong dependency of the quality of final models on the fraction of satisfied predicted long-range contacts was observed. This not only highlights the importance of these contacts on determining the protein fold, but also (combined with other ensemble-derived qualities) provides a powerful guide as to the choice of correct models and the global quality of the selected model. A proposed quality assessment scoring function achieves 0.93 precision and 0.77 recall for the discrimination of correct folds on our dataset of decoys. These findings suggest the approach is well-suited for blind predictions on a variety of globular proteins of unknown 3D structure, provided that enough homologous sequences are available to construct a large and accurate multiple sequence alignment for the initial contact prediction step.
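The "fraction of satisfied predicted long-range contacts" mentioned in this abstract can be sketched as follows. This is an illustrative reading, not the authors' scoring function: the 8 Å distance cutoff, the minimum sequence separation of 24 residues, the use of one representative atom per residue, and the function name are all assumptions made for the example.

```python
from math import dist


def satisfied_long_range_fraction(coords, predicted_contacts,
                                  cutoff=8.0, min_separation=24):
    """Fraction of predicted long-range contacts that a model satisfies.

    coords: list of (x, y, z) tuples, one representative atom per residue.
    predicted_contacts: iterable of (i, j) residue index pairs (0-based).
    cutoff and min_separation are assumed values, not taken from the paper.
    """
    long_range = [(i, j) for i, j in predicted_contacts
                  if abs(i - j) >= min_separation]
    if not long_range:
        return 0.0
    hits = sum(1 for i, j in long_range
               if dist(coords[i], coords[j]) <= cutoff)
    return hits / len(long_range)


# Toy model (assumed data): 30 residues on a straight line, 3.8 Å apart,
# with the last residue folded back next to the first.
coords = [(3.8 * i, 0.0, 0.0) for i in range(30)]
coords[29] = (0.0, 5.0, 0.0)

# Two long-range predictions and one short-range prediction that is ignored.
predicted = [(0, 29), (2, 28), (3, 5)]
print(satisfied_long_range_fraction(coords, predicted))
```

A score like this can be computed for every decoy in an ensemble and used alongside other quality measures when ranking candidate models.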
Affiliation(s)
- Tomasz Kosciolek
- Bioinformatics Group, Department of Computer Science, University College London, London, United Kingdom
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
| | - David T. Jones
- Bioinformatics Group, Department of Computer Science, University College London, London, United Kingdom
- Institute of Structural and Molecular Biology, University College London, London, United Kingdom
|
147
|
Jana B, Morcos F, Onuchic JN. From structure to function: the convergence of structure based models and co-evolutionary information. Phys Chem Chem Phys 2014; 16:6496-507. [PMID: 24603809 DOI: 10.1039/c3cp55275f] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.4] [Indexed: 11/21/2022]
Abstract
Understanding protein folding and function is one of the most important problems in biological research. Energy landscape theory and the folding funnel concept have provided a framework to investigate the mechanisms associated with these processes. Since protein energy landscapes are in most cases minimally frustrated, structure based models (SBMs) have successfully determined the geometrical features associated with folding and functional transitions. However, structural information is limited, particularly with respect to different functional configurations. This is a major limitation for SBMs. Alternatively, statistical methods to study amino acid co-evolution provide information on residue-residue interactions useful for the study of structure and function. Here, we show how the combination of these two methods gives rise to a novel way to investigate the mechanisms associated with folding and function. We use this methodology to explore the mechanistic aspects of protein translocation in the integral membrane protease FtsH. Dual basin-SBM simulations using the open and closed state of this hexameric motor reveal a functionally important paddling motion in the catalytic cycle. We also find that Direct Coupling Analysis (DCA) predicts physical contacts between AAA and peptidase domains of the motor, which are crucial for the open to close transition. Our combined method, which uses structural information from the open state experimental structure and co-evolutionary couplings, suggests that this methodology can be used to explore the functional landscape of complex biological macromolecules previously inaccessible to methods dependent on experimental structural information. This efficient way to sample the conformational space of large systems creates a theoretical/computational framework capable of better characterizing the functional landscape in large biomolecular assemblies.
Affiliation(s)
- Biman Jana
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827, USA.
|
148
|
Sandler I, Zigdon N, Levy E, Aharoni A. The functional importance of co-evolving residues in proteins. Cell Mol Life Sci 2014; 71:673-82. [PMID: 23995987 PMCID: PMC11113390 DOI: 10.1007/s00018-013-1458-2] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 05/12/2013] [Revised: 07/26/2013] [Accepted: 08/13/2013] [Indexed: 10/26/2022]
Abstract
Computational approaches for detecting co-evolution in proteins allow for the identification of protein-protein interaction networks in different organisms and the assignment of function to under-explored proteins. The detection of co-variation of amino acids within or between proteins, moreover, allows for the discovery of residue-residue contacts and highlights functional residues that can affect the binding affinity, catalytic activity, or substrate specificity of a protein. To explore the functional impact of co-evolutionary changes in proteins, a combined experimental and computational approach must be recruited. Here, we review recent studies that apply computational and experimental tools to obtain novel insight into the structure, function, and evolution of proteins. Specifically, we describe the application of co-evolutionary analysis for predicting high-resolution three-dimensional structures of proteins. In addition, we describe computational approaches followed by experimental analysis for identifying specificity-determining residues in proteins. Finally, we discuss studies addressing the importance of such residues in terms of the functional divergence of proteins, allowing proteins to evolve new functions while avoiding crosstalk with existing cellular pathways or forming reproductive barriers and hence promoting speciation.
Affiliation(s)
- Inga Sandler
- Department of Life Sciences, Ben-Gurion University of the Negev, 84105 Be’er Sheva, Israel
| | - Nitzan Zigdon
- Department of Life Sciences, Ben-Gurion University of the Negev, 84105 Be’er Sheva, Israel
| | - Efrat Levy
- Department of Life Sciences, Ben-Gurion University of the Negev, 84105 Be’er Sheva, Israel
| | - Amir Aharoni
- Department of Life Sciences, Ben-Gurion University of the Negev, 84105 Be’er Sheva, Israel
- National Institute for Biotechnology in the Negev (NIBN), Ben-Gurion University of the Negev, 84105 Be’er Sheva, Israel
|
149
|
Toward rationally redesigning bacterial two-component signaling systems using coevolutionary information. Proc Natl Acad Sci U S A 2014; 111:E563-71. [PMID: 24449878 DOI: 10.1073/pnas.1323734111] [Citation(s) in RCA: 94] [Impact Index Per Article: 9.4] [Indexed: 01/09/2023]
Abstract
A challenge in molecular biology is to distinguish the key subset of residues that allow two-component signaling (TCS) proteins to recognize their correct signaling partner such that they can transiently bind and transfer signal, i.e., phosphoryl group. Detailed knowledge of this information would allow one to search sequence space for mutations that can be used to systematically tune the signal transmission between TCS partners as well as potentially encode a TCS protein to preferentially transfer signals to a nonpartner. Motivated by the notion that this detailed information is found in sequence data, we explore the sequence coevolution between signaling partners to better understand how mutations can positively or negatively alter their ability to transfer signal. Using direct coupling analysis for determining evolutionarily conserved protein-protein interactions, we apply a metric called the direct information score to quantify mutational changes in the interaction between TCS proteins and demonstrate that it accurately correlates with experimental mutagenesis studies probing the mutational change in measured in vitro phosphotransfer. Furthermore, by subtracting from our metric an appropriate null model corresponding to generic, conserved features in TCS signaling pairs, we can isolate the determinants that give rise to interaction specificity and recognition, which are variable among different TCS partners. Our methodology forms a potential framework for the rational design of TCS systems by allowing one to quickly search sequence space for mutations or even entirely new sequences that can increase or decrease our metric, as a proxy for increasing or decreasing phosphotransfer ability between TCS proteins.
|
150
|
Morcos F, Hwa T, Onuchic JN, Weigt M. Direct coupling analysis for protein contact prediction. Methods Mol Biol 2014; 1137:55-70. [PMID: 24573474 DOI: 10.1007/978-1-4939-0366-5_5] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Indexed: 12/12/2022]
Abstract
During evolution, the structure and function of proteins are remarkably conserved, whereas amino-acid sequences vary strongly between homologous proteins. Structural conservation constrains sequence variability and forces different residues to coevolve, i.e., to show correlated patterns of amino-acid occurrences. However, residue correlation may result from direct coupling, e.g., by a contact in the folded protein, or be induced indirectly via intermediate residues. To use empirically observed correlations for predicting residue-residue contacts, direct and indirect effects have to be disentangled. Here we present mechanistic details on how to achieve this using a methodology called Direct Coupling Analysis (DCA). DCA has been shown to produce highly accurate estimates of amino-acid pairs that have direct reciprocal constraints in evolution. Specifically, we provide instructions and protocols on how to use the algorithmic implementations of DCA starting from data extraction to predicted-contact visualization in contact maps or representative protein structures.
Affiliation(s)
- Faruck Morcos
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
|