51
|
Haldane A, Flynn WF, He P, Levy RM. Coevolutionary Landscape of Kinase Family Proteins: Sequence Probabilities and Functional Motifs. Biophys J 2019; 114:21-31. [PMID: 29320688 DOI: 10.1016/j.bpj.2017.10.028] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/20/2017] [Revised: 09/11/2017] [Accepted: 10/17/2017] [Indexed: 01/25/2023]
Abstract
The protein kinase catalytic domain is one of the most abundant domains across all branches of life. Although kinases share a common core function of phosphoryl-transfer, they also have wide functional diversity and play varied roles in cell signaling networks, and for this reason are implicated in a number of human diseases. This functional diversity is primarily achieved through sequence variation, and uncovering the sequence-function relationships for the kinase family is a major challenge. In this study we use a statistical inference technique inspired by statistical physics, which builds a coevolutionary "Potts" Hamiltonian model of sequence variation in a protein family. We show how this model has sufficient power to predict the probability of specific subsequences in the highly diverged kinase family, which we verify by comparing the model's predictions with experimental observations in the Uniprot database. We show that the pairwise (residue-residue) interaction terms of the statistical model are necessary and sufficient to capture higher-than-pairwise mutation patterns of natural kinase sequences. We observe that previously identified functional sets of residues have much stronger correlated interaction scores than are typical.
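As a minimal illustration of the "Potts" Hamiltonian named in the abstract above, the sketch below scores a sequence using single-site fields and pairwise (residue-residue) couplings. The function name `potts_energy`, the sign convention (lower energy means higher probability, with P(s) proportional to exp(-E(s))), and the toy parameter values are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations


def potts_energy(seq, fields, couplings):
    """Statistical energy E(s) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j).

    fields:    list of dicts, fields[i][residue] -> h_i(residue)
    couplings: dict keyed by (i, j) with i < j, mapping to a dict
               keyed by (residue_i, residue_j) -> J_ij
    Missing entries are treated as zero (an assumption of this sketch).
    """
    energy = 0.0
    # Single-site terms: preference for each residue at each position.
    for i, residue in enumerate(seq):
        energy -= fields[i].get(residue, 0.0)
    # Pairwise terms: coupling between the residues at positions i < j.
    for i, j in combinations(range(len(seq)), 2):
        energy -= couplings.get((i, j), {}).get((seq[i], seq[j]), 0.0)
    return energy


# Hypothetical toy parameters for a three-position alignment.
fields = [{"A": 0.5, "G": 0.1}, {"K": 0.3}, {"E": 0.2, "D": 0.4}]
couplings = {(1, 2): {("K", "E"): 1.0, ("K", "D"): -0.5}}

e_ake = potts_energy("AKE", fields, couplings)
e_akd = potts_energy("AKD", fields, couplings)
```

In this toy example the favorable K-E coupling gives "AKE" a lower energy than "AKD", even though the single-site field at the last position prefers D. This is the kind of effect that a model with site-independent terms alone could not represent.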
Affiliation(s)
- Allan Haldane
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania
| | - William F Flynn
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania; Department of Physics and Astronomy, Rutgers, The State University of New Jersey, Piscataway, New Jersey
| | - Peng He
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania
| | - Ronald M Levy
- Center for Biophysics and Computational Biology, Department of Chemistry, and Institute for Computational Molecular Science, Temple University, Philadelphia, Pennsylvania.
| |
|
52
|
The role of coevolutionary signatures in protein interaction dynamics, complex inference, molecular recognition, and mutational landscapes. Curr Opin Struct Biol 2019; 56:179-186. [PMID: 31029927 DOI: 10.1016/j.sbi.2019.03.024] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 03/10/2019] [Revised: 03/18/2019] [Accepted: 03/19/2019] [Indexed: 11/22/2022]
Abstract
Evolution imposes constraints at the interface of interacting biomolecules in order to preserve function or maintain fitness. This pressure may have a direct effect on the sequence composition of interacting biomolecules. As a result, statistical patterns of amino acid or nucleotide covariance that encode physical and functional interactions are observed in sequences of extant organisms. In recent years, global pairwise models of amino acid and nucleotide coevolution from multiple sequence alignments have been developed and utilized to study molecular interactions in structural biology. In proteins, for which the energy landscape is funneled and minimally frustrated, a direct connection between the physical and sequence space landscapes can be established. Estimating coevolutionary information from sequences of interacting molecules has a broad impact in molecular biology. Applications include the accurate determination of 3D structures of molecular complexes, inference of protein interaction partners, models of protein-protein interaction specificity, the elucidation and design of protein-nucleic acid recognition, and the discovery of genome-wide epistatic effects. The current state of the art of coevolutionary analysis includes biomedical applications ranging from mutational landscapes and drug design to vaccine development.
|
53
|
Wang SW, Bitbol AF, Wingreen NS. Revealing evolutionary constraints on proteins through sequence analysis. PLoS Comput Biol 2019; 15:e1007010. [PMID: 31017888 PMCID: PMC6502352 DOI: 10.1371/journal.pcbi.1007010] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/05/2019] [Revised: 05/06/2019] [Accepted: 04/06/2019] [Indexed: 02/03/2023]
Abstract
Statistical analysis of alignments of large numbers of protein sequences has revealed "sectors" of collectively coevolving amino acids in several protein families. Here, we show that selection acting on any functional property of a protein, represented by an additive trait, can give rise to such a sector. As an illustration of a selected trait, we consider the elastic energy of an important conformational change within an elastic network model, and we show that selection acting on this energy leads to correlations among residues. For this concrete example and more generally, we demonstrate that the main signature of functional sectors lies in the small-eigenvalue modes of the covariance matrix of the selected sequences. However, secondary signatures of these functional sectors also exist in the extensively-studied large-eigenvalue modes. Our simple, general model leads us to propose a principled method to identify functional sectors, along with the magnitudes of mutational effects, from sequence data. We further demonstrate the robustness of these functional sectors to various forms of selection, and the robustness of our approach to the identification of multiple selected traits.
Affiliation(s)
- Shou-Wen Wang
- Department of Engineering Physics, Tsinghua University, Beijing, China
- Beijing Computational Science Research Center, Beijing, China
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
| | - Anne-Florence Bitbol
- Sorbonne Université, CNRS, Laboratoire Jean Perrin (UMR 8237), F-75005 Paris, France
| | - Ned S. Wingreen
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- Department of Molecular Biology, Princeton University, Princeton, New Jersey, United States of America
| |
|
54
|
Jarmolinska AI, Zhou Q, Sulkowska JI, Morcos F. DCA-MOL: A PyMOL Plugin To Analyze Direct Evolutionary Couplings. J Chem Inf Model 2019; 59:625-629. [DOI: 10.1021/acs.jcim.8b00690] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/06/2023]
Affiliation(s)
- Aleksandra I. Jarmolinska
- Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland
- College of Inter-Faculty Individual Studies in Mathematics and Natural Sciences, Banacha 2c, 02-097 Warsaw, Poland
| | - Qin Zhou
- Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas 75080, United States
| | - Joanna I. Sulkowska
- Centre of New Technologies, University of Warsaw, Banacha 2c, 02-097, Warsaw, Poland
- Faculty of Chemistry, University of Warsaw, Pasteura 1, 02-093 Warsaw, Poland
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, Texas 75080, United States
- Center for Systems Biology, University of Texas at Dallas, Richardson, Texas 75080, United States
| |
|
55
|
Coevolutionary Signals and Structure-Based Models for the Prediction of Protein Native Conformations. Methods Mol Biol 2019; 1851:83-103. [PMID: 30298393 DOI: 10.1007/978-1-4939-8736-8_5] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 12/12/2022]
Abstract
The analysis of coevolutionary signals from families of evolutionarily related sequences is a recent conceptual framework that provides valuable information about unique intramolecular interactions and, therefore, can assist in the elucidation of biomolecular conformations. It is based on the idea that compensatory mutations at specific residue positions in a sequence help preserve stability of protein architecture and function and leave a statistical signature related to residue-residue interactions in the 3D structure of the protein. Consequently, statistical analysis of these correlated mutations in subsets of protein sequence alignments can be used to predict which residue pairs should be in spatial proximity in the native functional protein fold. These predicted signals can be then used to guide molecular dynamics (MD) simulations to predict the three-dimensional coordinates of a functional amino acid chain. In this chapter, we introduce a general and efficient methodology to perform coevolutionary analysis on protein sequences and to use this information in combination with computational physical models to predict the native 3D conformation of functional polypeptides. We present a step-by-step methodology that includes the description and application of software tools and databases required to infer tertiary structures of a protein fold. The general pipeline includes instructions on (1) how to obtain direct amino acid couplings from protein sequences using direct coupling analysis (DCA), (2) how to incorporate such signals as interaction potentials in Cα structure-based models (SBMs) to drive protein-folding MD simulations, (3) a procedure to estimate secondary structure and how to include such estimates in the topology files required in the MD simulations, and (4) how to build full atomic models based on the top Cα candidates selected in the pipeline. 
The information presented in this chapter is self-contained and sufficient to allow a computational scientist to predict structures of proteins using publicly available algorithms and databases.
|
56
|
Abstract
Protein assemblies consisting of structural maintenance of chromosomes (SMC) and kleisin subunits are essential for the process of chromosome segregation across all domains of life. Prokaryotic condensin belonging to this class of protein complexes is composed of a homodimer of SMC that associates with a kleisin protein subunit called ScpA. While limited structural data exist for the proteins that comprise the SMC-kleisin complex, the complete structure of the entire complex remains unknown. Using an integrative approach combining both crystallographic data and coevolutionary information, we predict an atomic-scale structure of the whole condensin complex, which our results indicate is composed of a single ring. Coupling coevolutionary information with molecular-dynamics simulations, we study the interaction surfaces between the subunits and examine the plausibility of alternative stoichiometries of the complex. Our analysis also reveals several additional configurational states of the condensin hinge domain and the SMC-kleisin interaction domains, which are likely involved in the functional opening and closing of the condensin ring. This study provides the foundation for future investigations of the structure-function relationship of the various SMC-kleisin protein complexes at atomic resolution.
|
57
|
Bitbol AF. Inferring interaction partners from protein sequences using mutual information. PLoS Comput Biol 2018; 14:e1006401. [PMID: 30422978 PMCID: PMC6258550 DOI: 10.1371/journal.pcbi.1006401] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.8] [Received: 07/23/2018] [Revised: 11/27/2018] [Accepted: 10/27/2018] [Indexed: 11/30/2022]
Abstract
Functional protein-protein interactions are crucial in most cellular processes. They enable multi-protein complexes to assemble and to remain stable, and they allow signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interacting partners, and thus in correlations between their sequences. Pairwise maximum-entropy based models have enabled successful inference of pairs of amino-acid residues that are in contact in the three-dimensional structure of multi-protein complexes, starting from the correlations in the sequence data of known interaction partners. Recently, algorithms inspired by these methods have been developed to identify which proteins are functional interaction partners among the paralogous proteins of two families, starting from sequence data alone. Here, we demonstrate that a slightly higher performance for partner identification can be reached by an approximate maximization of the mutual information between the sequence alignments of the two protein families. Our mutual information-based method also provides signatures of the existence of interactions between protein families. These results stand in contrast with structure prediction of proteins and of multi-protein complexes from sequence data, where pairwise maximum-entropy based global statistical models substantially improve performance compared to mutual information. Our findings entail that the statistical dependences allowing interaction partner prediction from sequence data are not restricted to the residue pairs that are in direct contact at the interface between the partner proteins.
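To make the quantity at the center of the abstract above concrete, the following sketch computes the mutual information between two alignment columns from their empirical residue frequencies. It illustrates only the basic definition. It is not the partner-pairing algorithm of the paper, and the function name `mutual_information`, the absence of pseudocounts or sequence reweighting, and the toy columns are all assumptions of this example.

```python
from collections import Counter
from math import log


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two aligned columns.

    col_a, col_b: equal-length sequences of residues, where position k
    in both columns comes from the same (paired) sequence.
    """
    n = len(col_a)
    if n == 0 or n != len(col_b):
        raise ValueError("columns must be non-empty and of equal length")
    count_a = Counter(col_a)                # single-column counts
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))   # joint counts
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), written with raw counts.
        mi += (c / n) * log(c * n / (count_a[a] * count_b[b]))
    return mi


# Toy columns: perfectly covarying versus statistically independent.
mi_coupled = mutual_information("AAGG", "KKEE")
mi_independent = mutual_information("AAGG", "KEKE")
```

Here the perfectly covarying columns give log 2 nats and the independent columns give zero. With real alignments, finite-sample effects and phylogenetic bias would usually need to be handled, which this sketch does not attempt.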
Affiliation(s)
- Anne-Florence Bitbol
- Sorbonne Université, CNRS, Laboratoire Jean Perrin (UMR 8237), F-75005 Paris, France
| |
|
58
|
Cheng RR, Haglund E, Tiee NS, Morcos F, Levine H, Adams JA, Jennings PA, Onuchic JN. Designing bacterial signaling interactions with coevolutionary landscapes. PLoS One 2018; 13:e0201734. [PMID: 30125296 PMCID: PMC6101370 DOI: 10.1371/journal.pone.0201734] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/16/2018] [Accepted: 07/21/2018] [Indexed: 11/19/2022]
Abstract
Selecting amino acids to design novel protein-protein interactions that facilitate catalysis is a daunting challenge. We propose that a computational coevolutionary landscape based on sequence analysis alone offers a major advantage over expensive, time-consuming brute-force approaches currently employed. Our coevolutionary landscape allows prediction of single amino acid substitutions that produce functional interactions between non-cognate, interspecies signaling partners. It can also predict mutations that maintain segregation of signaling pathways across species. Specifically, predictions of phosphotransfer activity from the Escherichia coli histidine kinase EnvZ to the non-cognate receiver Spo0F from Bacillus subtilis were compiled. Twelve mutations designed to enhance, suppress, or have a neutral effect on kinase phosphotransfer activity to a non-cognate partner were selected. We experimentally tested the ability of the kinase to relay phosphate to the respective designed Spo0F receiver proteins against the theoretical predictions. Our key finding is that the coevolutionary landscape theory, with limited structural data, can significantly reduce the search space for successful prediction of single amino acid substitutions that modulate phosphotransfer between the two-component His-Asp relay partners in a predicted fashion. This combined approach offers significant improvements over large-scale mutation studies currently used for protein engineering and design.
Affiliation(s)
- Ryan R. Cheng
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
| | - Ellinor Haglund
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
| | - Nicholas S. Tiee
- Department of Chemistry & Biochemistry, The University of California, San Diego, California, United States of America
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Dallas, Texas, United States of America
- Department of Bioengineering, University of Texas at Dallas, Dallas, Texas, United States of America
| | - Herbert Levine
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
- Department of Bioengineering, Rice University, Houston, Texas, United States of America
- Department of Biosciences, Rice University, Houston, Texas, United States of America
- Department of Physics & Astronomy, Rice University, Houston, Texas, United States of America
| | - Joseph A. Adams
- Department of Pharmacology, The University of California, San Diego, California, United States of America
| | - Patricia A. Jennings
- Department of Chemistry & Biochemistry, The University of California, San Diego, California, United States of America
| | - José N. Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, United States of America
- Department of Biosciences, Rice University, Houston, Texas, United States of America
- Department of Physics & Astronomy, Rice University, Houston, Texas, United States of America
- Department of Chemistry, Rice University, Houston, Texas, United States of America
| |
|
59
|
Szurmant H, Weigt M. Inter-residue, inter-protein and inter-family coevolution: bridging the scales. Curr Opin Struct Biol 2018; 50:26-32. [PMID: 29101847 PMCID: PMC5940578 DOI: 10.1016/j.sbi.2017.10.014] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 09/05/2017] [Revised: 10/12/2017] [Accepted: 10/13/2017] [Indexed: 10/18/2022]
Abstract
Interacting proteins coevolve at multiple but interconnected scales, from the residue-residue level through the protein-protein level up to the family-family level. The recent accumulation of enormous amounts of sequence data allows for the development of novel, data-driven computational approaches. Notably, these approaches can bridge scales within a single statistical framework. Although currently applied mostly to isolated problems on single scales, their immense potential for an evolutionarily informed, structural systems biology is steadily emerging.
Affiliation(s)
- Hendrik Szurmant
- Department of Basic Medical Sciences, College of Osteopathic Medicine of the Pacific, Western University of Health Sciences, Pomona, CA 91766, USA.
| | - Martin Weigt
- Sorbonne Universités, UPMC Université Paris 06, CNRS, Biologie Computationnelle et Quantitative - Institut de Biologie Paris Seine, 75005 Paris, France.
| |
|
60
|
dos Santos RN, Khan S, Morcos F. Characterization of C-ring component assembly in flagellar motors from amino acid coevolution. R Soc Open Sci 2018; 5:171854. [PMID: 29892378 PMCID: PMC5990795 DOI: 10.1098/rsos.171854] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 11/11/2017] [Accepted: 04/05/2018] [Indexed: 06/08/2023]
Abstract
Bacterial flagellar motility, an important virulence factor, is energized by a rotary motor localized within the flagellar basal body. The rotor module consists of a large framework (the C-ring), composed of the FliG, FliM and FliN proteins. FliN and FliM contact the FliG torque ring to control the direction of flagellar rotation. We report that structure-based models constrained only by residue coevolution can recover the binding interface of atomic X-ray dimer complexes with remarkable accuracy (approx. 1 Å RMSD). We propose a model for FliM-FliN heterodimerization, which agrees accurately with homologous interfaces as well as in situ cross-linking experiments, and hence supports a proposed architecture for the lower portion of the C-ring. Furthermore, this approach allowed the identification of two discrete and interchangeable homodimerization interfaces between FliM middle domains that agree with experimental measurements and might be associated with C-ring directional switching dynamics triggered upon binding of CheY signal protein. Our findings provide structural details of complex formation at the C-ring that have been difficult to obtain with previous methodologies and clarify the architectural principle that underpins the ultra-sensitive allostery exhibited by this ring assembly that controls the clockwise or counterclockwise rotation of flagella.
Affiliation(s)
- Ricardo Nascimento dos Santos
- Institute of Chemistry and Center for Computational Engineering and Science, University of Campinas, Campinas, SP, Brazil
| | - Shahid Khan
- Molecular Biology Consortium, Lawrence Berkeley National Laboratory, Berkeley, CA, USA
| | - Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, USA
- Department of Bioengineering, University of Texas at Dallas, Richardson, TX, USA
- Center for Systems Biology, University of Texas at Dallas, Richardson, TX, USA
| |
|
61
|
Nicoludis JM, Gaudet R. Applications of sequence coevolution in membrane protein biochemistry. Biochim Biophys Acta Biomembr 2018; 1860:895-908. [PMID: 28993150 PMCID: PMC5807202 DOI: 10.1016/j.bbamem.2017.10.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.5] [Received: 07/19/2017] [Revised: 09/28/2017] [Accepted: 10/02/2017] [Indexed: 12/22/2022]
Abstract
Recently, protein sequence coevolution analysis has matured into a predictive powerhouse for protein structure and function. Direct methods, which use global statistical models of sequence coevolution, have enabled the prediction of membrane and disordered protein structures, protein complex architectures, and the functional effects of mutations in proteins. The field of membrane protein biochemistry and structural biology has embraced these computational techniques, which provide functional and structural information in an otherwise experimentally-challenging field. Here we review recent applications of protein sequence coevolution analysis to membrane protein structure and function and highlight the promising directions and future obstacles in these fields. We provide insights and guidelines for membrane protein biochemists who wish to apply sequence coevolution analysis to a given experimental system.
Affiliation(s)
- John M Nicoludis
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138, United States
| | - Rachelle Gaudet
- Department of Molecular and Cellular Biology, Harvard University, Cambridge, MA, 02138, United States.
| |
|
62
|
Cocco S, Feinauer C, Figliuzzi M, Monasson R, Weigt M. Inverse statistical physics of protein sequences: a key issues review. Rep Prog Phys 2018; 81:032601. [PMID: 29120346 DOI: 10.1088/1361-6633/aa9965] [Citation(s) in RCA: 111] [Impact Index Per Article: 18.5] [Indexed: 06/07/2023]
Abstract
In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at an unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related, protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview of some biologically important questions, and how statistical-mechanics-inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.
Affiliation(s)
- Simona Cocco
- Laboratoire de Physique Statistique de l'Ecole Normale Supérieure-UMR 8549, CNRS and PSL Research, Sorbonne Universités UPMC, Paris, France
| |
|
63
|
Huang YJ, Brock KP, Sander C, Marks DS, Montelione GT. A Hybrid Approach for Protein Structure Determination Combining Sparse NMR with Evolutionary Coupling Sequence Data. Adv Exp Med Biol 2018; 1105:153-169. [PMID: 30617828 DOI: 10.1007/978-981-13-2200-6_10] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 12/04/2022]
Abstract
While 3D structure determination of small (<15 kDa) proteins by solution NMR is largely automated and routine, structural analysis of larger proteins is more challenging. An emerging hybrid strategy for modeling protein structures combines sparse NMR data that can be obtained for larger proteins with sequence co-variation data, called evolutionary couplings (ECs), obtained from multiple sequence alignments of protein families. This hybrid "EC-NMR" method can be used to accurately model larger (15-60 kDa) proteins, and more rapidly determine structures of smaller (5-15 kDa) proteins using only backbone NMR data. The resulting structures have accuracies relative to reference structures comparable to those obtained with full backbone and sidechain NMR resonance assignments. The requirement that evolutionary couplings (ECs) are consistent with NMR data recorded on a specific member of a protein family, under specific conditions, potentially also allows identification of ECs that reflect alternative allosteric or excited states of the protein structure.
Affiliation(s)
- Yuanpeng Janet Huang
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, USA
| | - Kelly P Brock
- cBio Center, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Chris Sander
- Department of Cell Biology, Harvard Medical School, Boston, MA, USA
- cBio Center, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Debora S Marks
- Department of Systems Biology, Harvard Medical School, Boston, MA, USA
| | - Gaetano T Montelione
- Center for Advanced Biotechnology and Medicine, Department of Molecular Biology and Biochemistry, Rutgers, The State University of New Jersey, Piscataway, NJ, USA.
| |
|
64
|
Carapia-Minero N, Castelán-Vega JA, Pérez NO, Rodríguez-Tovar AV. The phosphorelay signal transduction system in Candida glabrata: an in silico analysis. J Mol Model 2017; 24:13. [PMID: 29248994 DOI: 10.1007/s00894-017-3545-z] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/02/2017] [Accepted: 11/24/2017] [Indexed: 01/18/2023]
Abstract
Signaling systems allow microorganisms to sense and respond to different stimuli through the modification of gene expression. The phosphorelay signal transduction system in eukaryotes involves three proteins: a sensor protein, an intermediate protein and a response regulator, and requires the transfer of a phosphate group between histidine and aspartic acid residues. The SLN1-YPD1-SSK1 system enables yeast to adapt to hyperosmotic stress through the activation of the HOG1-MAPK pathway. The genetic sequences available from Saccharomyces cerevisiae were used to identify orthologous sequences in Candida glabrata, and putative genes were identified and characterized by in silico assays. An interactome analysis was carried out with the complete genome of C. glabrata and the putative proteins of the phosphorelay signal transduction system. Next, we modeled the complex formed between the sensor protein CgSln1p and the intermediate CgYpd1p. Finally, phosphate transfer was examined by a molecular dynamics assay. Our in silico analysis showed that the putative proteins of the C. glabrata phosphorelay signal transduction system present the functional domains of histidine kinase, a downstream response regulator protein, and an intermediate histidine phosphotransfer protein. All the sequences are phylogenetically more related to S. cerevisiae than to C. albicans. The interactome suggests that the C. glabrata phosphorelay signal transduction system interacts with different proteins that regulate cell wall biosynthesis and responds to oxidative and osmotic stress the same way as similar systems in S. cerevisiae and C. albicans. Molecular dynamics simulations showed complex formation between the response regulator domain of histidine kinase CgSln1 and intermediate protein CgYpd1 in the presence of a phosphate group and interactions between the aspartic residue and the histidine residue. Overall, our research showed that C. glabrata harbors a functional SLN1-YPD1-SSK1 phosphorelay system.
Affiliation(s)
- Natalee Carapia-Minero
- Laboratorio de Micología Médica, Depto. de Microbiología, Escuela Nacional de Ciencias Biológicas (ENCB), Instituto Politécnico Nacional, Prolongación de Carpio y Plan de Ayala s/n, Col. Casco de Santo Tomás, Del. Miguel Hidalgo, CP 11340, Ciudad de México, Mexico
| | - Juan Arturo Castelán-Vega
- Laboratorio de Producción y Control de Biológicos ENCB, Instituto Politécnico Nacional, Carpio y Plan de Ayala s/n, Col. Casco de Santo Tomás, Del. Miguel Hidalgo, CP 11340, Ciudad de México, Mexico
| | - Néstor Octavio Pérez
- Unidad de investigación y Desarrollo, Probiomed, SA de CV, Cruce de Carreteras Acatzingo-Zumpahuacan S/N, CP 52400, Tenancingo, Edo de México, Mexico.
| | - Aída Verónica Rodríguez-Tovar
- Laboratorio de Micología Médica, Depto. de Microbiología, Escuela Nacional de Ciencias Biológicas (ENCB), Instituto Politécnico Nacional, Prolongación de Carpio y Plan de Ayala s/n, Col. Casco de Santo Tomás, Del. Miguel Hidalgo, CP 11340, Ciudad de México, Mexico.
| |
|
65
|
Biomolecular coevolution and its applications: Going from structure prediction toward signaling, epistasis, and function. Biochem Soc Trans 2017; 45:1253-1261. [DOI: 10.1042/bst20170063] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 06/23/2017] [Revised: 08/30/2017] [Accepted: 09/04/2017] [Indexed: 01/01/2023]
Abstract
Evolution leads to considerable changes in the sequence of biomolecules, while their overall structure and function remain quite conserved. The wealth of genomic sequences, the ‘Biological Big Data’, modern sequencing techniques provide allows us to investigate biomolecular evolution with unprecedented detail. Sophisticated statistical models can infer residue pair mutations resulting from spatial proximity. The introduction of predicted spatial adjacencies as constraints in biomolecular structure prediction workflows has transformed the field of protein and RNA structure prediction toward accuracies approaching the experimental resolution limit. Going beyond structure prediction, the same mathematical framework allows mimicking evolutionary fitness landscapes to infer signaling interactions, epistasis, or mutational landscapes.
|
66
|
Abstract
Cyclic diguanylate (c-di-GMP) is a near universal signaling molecule produced by diguanylate cyclases that can direct a variety of bacterial behaviors. A major area of research over the last several years has been aimed at understanding how a cell with dozens of diguanylate cyclases can deploy a given subset of them to produce a desired phenotypic outcome without undesired cross talk between c-di-GMP-dependent systems. Several models have been put forward to address this question, including specificity of cyclase activation, tuned binding constants of effector proteins, and physical interaction between cyclases and effectors. Additionally, recent evidence has suggested that there may be a link between the catalytic state of a cyclase and its physical contact with an effector. This review highlights several key studies, examines the proposed global and local models of c-di-GMP signaling specificity in bacteria, and attempts to identify the most fruitful steps that can be taken to better understand how dynamic networks of sibling cyclases and effector proteins result in sensible outputs that govern cellular behavior.
Affiliation(s)
- Kurt M Dahlstrom
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire 03755
- George A O'Toole
- Department of Microbiology and Immunology, Geisel School of Medicine at Dartmouth, Hanover, New Hampshire 03755
|
67
|
Inferring repeat-protein energetics from evolutionary information. PLoS Comput Biol 2017; 13:e1005584. [PMID: 28617812 PMCID: PMC5491312 DOI: 10.1371/journal.pcbi.1005584] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 02/07/2017] [Revised: 06/29/2017] [Accepted: 05/21/2017] [Indexed: 11/19/2022] Open
Abstract
Natural protein sequences contain a record of their history. A common constraint in a given protein family is the ability to fold to specific structures, and it has been shown possible to infer the main native ensemble by analyzing covariations in extant sequences. Still, many natural proteins that fold into the same structural topology show different stabilization energies, and these are often related to their physiological behavior. We propose a description for the energetic variation given by sequence modifications in repeat proteins, systems for which the overall problem is simplified by their inherent symmetry. We explicitly account for single amino acid and pair-wise interactions and treat higher order correlations with a single term. We show that the resulting evolutionary field can be interpreted with structural detail. We trace the variations in the energetic scores of natural proteins and relate them to their experimental characterization. The resulting energetic evolutionary field allows the prediction of the folding free energy change for several mutants, and can be used to generate synthetic sequences that are statistically indistinguishable from the natural counterparts.
|
68
|
Conservation of coevolving protein interfaces bridges prokaryote-eukaryote homologies in the twilight zone. Proc Natl Acad Sci U S A 2016; 113:15018-15023. [PMID: 27965389 DOI: 10.1073/pnas.1611861114] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.0] [Indexed: 11/18/2022] Open
Abstract
Protein-protein interactions are fundamental for the proper functioning of the cell. As a result, protein interaction surfaces are subject to strong evolutionary constraints. Recent developments have shown that residue coevolution provides accurate predictions of heterodimeric protein interfaces from sequence information. So far these approaches have been limited to the analysis of families of prokaryotic complexes for which large multiple sequence alignments of homologous sequences can be compiled. We explore the hypothesis that coevolution points to structurally conserved contacts at protein-protein interfaces, which can be reliably projected to homologous complexes with distantly related sequences. We introduce a domain-centered protocol to study the interplay between residue coevolution and structural conservation of protein-protein interfaces. We show that sequence-based coevolutionary analysis systematically identifies residue contacts at prokaryotic interfaces that are structurally conserved at the interface of their eukaryotic counterparts. In turn, this allows the prediction of conserved contacts at eukaryotic protein-protein interfaces with high confidence using solely mutational patterns extracted from prokaryotic genomes. Even in the context of high divergence in sequence (the twilight zone), where standard homology modeling of protein complexes is unreliable, our approach provides sequence-based accurate information about specific details of protein interactions at the residue level. Selected examples of the application of prokaryotic coevolutionary analysis to the prediction of eukaryotic interfaces further illustrate the potential of this approach.
|
69
|
Bai F, Morcos F, Cheng RR, Jiang H, Onuchic JN. Elucidating the druggable interface of protein-protein interactions using fragment docking and coevolutionary analysis. Proc Natl Acad Sci U S A 2016; 113:E8051-E8058. [PMID: 27911825 PMCID: PMC5167203 DOI: 10.1073/pnas.1615932113] [Citation(s) in RCA: 57] [Impact Index Per Article: 7.1] [Indexed: 12/28/2022] Open
Abstract
Protein-protein interactions play a central role in cellular function. Improving the understanding of complex formation has many practical applications, including the rational design of new therapeutic agents and the mechanisms governing signal transduction networks. The generally large, flat, and relatively featureless binding sites of protein complexes pose many challenges for drug design. Fragment docking and direct coupling analysis are used in an integrated computational method to estimate druggable protein-protein interfaces. (i) This method explores the binding of fragment-sized molecular probes on the protein surface using a molecular docking-based screen. (ii) The energetically favorable binding sites of the probes, called hot spots, are spatially clustered to map out candidate binding sites on the protein surface. (iii) A coevolution-based interface interaction score is used to discriminate between different candidate binding sites, yielding potential interfacial targets for therapeutic drug design. This approach is validated for important, well-studied disease-related proteins with known pharmaceutical targets, and also identifies targets that have yet to be studied. Moreover, therapeutic agents are proposed by chemically connecting the fragments that are strongly bound to the hot spots.
Affiliation(s)
- Fang Bai
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Dallas, TX 75080
- Department of Bioengineering, University of Texas at Dallas, Dallas, TX 75080
- Center for Systems Biology, University of Texas at Dallas, Dallas, TX 75080
- Ryan R Cheng
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Hualiang Jiang
- Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, Shanghai 201203, China
- José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005
- Department of Physics and Astronomy, Rice University, Houston, TX 77005
- Department of Chemistry, Rice University, Houston, TX 77005
- Department of Biosciences, Rice University, Houston, TX 77005
|
70
|
Simultaneous identification of specifically interacting paralogs and interprotein contacts by direct coupling analysis. Proc Natl Acad Sci U S A 2016; 113:12186-12191. [PMID: 27729520 DOI: 10.1073/pnas.1607570113] [Citation(s) in RCA: 69] [Impact Index Per Article: 8.6] [Indexed: 11/18/2022] Open
Abstract
Understanding protein-protein interactions is central to our understanding of almost all complex biological processes. Computational tools exploiting rapidly growing genomic databases to characterize protein-protein interactions are urgently needed. Such methods should connect multiple scales, from evolutionarily conserved interactions between families of homologous proteins, over the identification of specifically interacting proteins in the case of multiple paralogs inside a species, down to the prediction of residues being in physical contact across interaction interfaces. Statistical inference methods detecting residue-residue coevolution have recently triggered considerable progress in using sequence data for quaternary protein structure prediction; they require, however, large joint alignments of homologous protein pairs known to interact. The generation of such alignments is a complex computational task on its own; application of coevolutionary modeling has, in turn, been restricted to proteins without paralogs, or to bacterial systems with the corresponding coding genes being colocalized in operons. Here we show that the direct coupling analysis of residue coevolution can be extended to connect the different scales, and simultaneously to match interacting paralogs, to identify interprotein residue-residue contacts and to discriminate interacting from noninteracting families in a multiprotein system. Our results extend the potential applications of coevolutionary analysis far beyond cases treatable so far.
|
71
|
Abstract
Specific protein-protein interactions are crucial in the cell, both to ensure the formation and stability of multiprotein complexes and to enable signal transduction in various pathways. Functional interactions between proteins result in coevolution between the interaction partners, causing their sequences to be correlated. Here we exploit these correlations to accurately identify, from sequence data alone, which proteins are specific interaction partners. Our general approach, which employs a pairwise maximum entropy model to infer couplings between residues, has been successfully used to predict the 3D structures of proteins from sequences. Thus inspired, we introduce an iterative algorithm to predict specific interaction partners from two protein families whose members are known to interact. We first assess the algorithm's performance on histidine kinases and response regulators from bacterial two-component signaling systems. We obtain a striking 0.93 true positive fraction on our complete dataset without any a priori knowledge of interaction partners, and we uncover the origin of this success. We then apply the algorithm to proteins from ATP-binding cassette (ABC) transporter complexes, and obtain accurate predictions in these systems as well. Finally, we present two metrics that accurately distinguish interacting protein families from noninteracting ones, using only sequence data.
|
72
|
Cheng RR, Nordesjö O, Hayes RL, Levine H, Flores SC, Onuchic JN, Morcos F. Connecting the Sequence-Space of Bacterial Signaling Proteins to Phenotypes Using Coevolutionary Landscapes. Mol Biol Evol 2016; 33:3054-3064. [PMID: 27604223 PMCID: PMC5100047 DOI: 10.1093/molbev/msw188] [Citation(s) in RCA: 48] [Impact Index Per Article: 6.0] [Indexed: 01/08/2023] Open
Abstract
Two-component signaling (TCS) is the primary means by which bacteria sense and respond to the environment. TCS involves two partner proteins working in tandem, which interact to perform cellular functions while limiting interactions with non-partners (i.e., cross-talk). We construct a Potts model for TCS that can quantitatively predict how mutating amino acid identities affects the interaction between TCS partners and non-partners. The parameters of this model are inferred directly from protein sequence data. This approach drastically reduces the computational complexity of exploring the sequence-space of TCS proteins. As a stringent test, we compare its predictions to a recent comprehensive mutational study, which characterized the functionality of 204 mutational variants of the PhoQ kinase in Escherichia coli. We find that our best predictions accurately reproduce the amino acid combinations found in experiment, which enable functional signaling with its partner PhoP. These predictions demonstrate the evolutionary pressure to preserve the interaction between TCS partners as well as prevent unwanted cross-talk. Further, we calculate the mutational change in the binding affinity between PhoQ and PhoP, providing an estimate of the amount of destabilization needed to disrupt TCS.
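As a rough illustration of how a Potts model can score a mutation, the sketch below evaluates the energy of a sequence and the change caused by a single substitution. It is a minimal sketch, not the authors' implementation: the function names, the dictionary layout, the sign convention (lower energy is more favorable) and the toy parameter values are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical parameter containers (not taken from the paper):
Fields = Dict[Tuple[int, str], float]               # (position, residue) -> h_i(a)
Couplings = Dict[Tuple[int, int, str, str], float]  # (i, j, a, b), i < j -> J_ij(a, b)


def potts_energy(seq: Sequence[str], h: Fields, J: Couplings) -> float:
    """H(seq) = -sum_i h_i(s_i) - sum_{i<j} J_ij(s_i, s_j); missing terms count as 0."""
    energy = 0.0
    for i, a in enumerate(seq):
        energy -= h.get((i, a), 0.0)
        for j in range(i + 1, len(seq)):
            energy -= J.get((i, j, a, seq[j]), 0.0)
    return energy


def mutation_delta(seq: Sequence[str], pos: int, new_residue: str,
                   h: Fields, J: Couplings) -> float:
    """Energy of the single-site mutant minus energy of the original sequence."""
    mutant = list(seq)
    mutant[pos] = new_residue
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)


# Toy two-site model with made-up numbers, for illustration only.
h = {(0, "A"): 0.5, (0, "G"): 0.1, (1, "L"): 0.3}
J = {(0, 1, "A", "L"): 1.0, (0, 1, "G", "L"): -0.2}
delta = mutation_delta("AL", 0, "G", h, J)  # positive: mutant scores as less favorable
```

Under this convention a positive `delta` marks a substitution that the model scores as less favorable than the original residue; ranking all candidate substitutions by `delta` is one way such a model could be used to screen sequence space.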
Affiliation(s)
- R R Cheng
- Center for Theoretical Biological Physics, Rice University, Houston, TX
- O Nordesjö
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- R L Hayes
- Department of Biophysics, University of Michigan, Ann Arbor, MI
- H Levine
- Center for Theoretical Biological Physics, Rice University, Houston, TX; Department of Bioengineering, Rice University, Houston, TX
- S C Flores
- Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden
- J N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX; Department of Physics and Astronomy, Rice University, Houston, TX; Department of Chemistry, and Biosciences, Rice University, Houston, TX
- F Morcos
- Department of Biological Sciences and Center for Systems Biology, University of Texas at Dallas, Dallas, TX
|
73
|
A Combined Computational and Genetic Approach Uncovers Network Interactions of the Cyanobacterial Circadian Clock. J Bacteriol 2016; 198:2439-47. [PMID: 27381914 DOI: 10.1128/jb.00235-16] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 03/18/2016] [Accepted: 06/27/2016] [Indexed: 01/19/2023] Open
Abstract
Two-component systems (TCS) that employ histidine kinases (HK) and response regulators (RR) are critical mediators of cellular signaling in bacteria. In the model cyanobacterium Synechococcus elongatus PCC 7942, TCSs control global rhythms of transcription that reflect an integration of time information from the circadian clock with a variety of cellular and environmental inputs. The HK CikA and the SasA/RpaA TCS transduce time information from the circadian oscillator to modulate downstream cellular processes. Despite immense progress in understanding of the circadian clock itself, many of the connections between the clock and other cellular signaling systems have remained enigmatic. To narrow the search for additional TCS components that connect to the clock, we utilized direct-coupling analysis (DCA), a statistical analysis of covariant residues among related amino acid sequences, to infer coevolution of new and known clock TCS components. DCA revealed a high degree of interaction specificity between SasA and CikA with RpaA, as expected, but also with the phosphate-responsive response regulator SphR. Coevolutionary analysis also predicted strong specificity between RpaA and a previously undescribed kinase, HK0480 (herein CikB). A knockout of the gene for CikB (cikB) in a sasA cikA null background eliminated the RpaA phosphorylation and RpaA-controlled transcription that is otherwise present in that background and suppressed cell elongation, supporting the notion that CikB is an interactor with RpaA and the clock network. This study demonstrates the power of DCA to identify subnetworks and key interactions in signaling pathways and of combinatorial mutagenesis to explore the phenotypic consequences. Such a combined strategy is broadly applicable to other prokaryotic systems.
IMPORTANCE: Signaling networks are complex and extensive, comprising multiple integrated pathways that respond to cellular and environmental cues. A TCS interaction model, based on DCA, independently confirmed known interactions and revealed a core set of subnetworks within the larger HK-RR set. We validated high-scoring candidate proteins via combinatorial genetics, demonstrating that DCA can be utilized to reduce the search space of complex protein networks and to infer undiscovered specific interactions for signaling proteins in vivo. Significantly, new interactions that link circadian response to cell division and fitness in a light/dark cycle were uncovered. The combined analysis also uncovered a more basic core clock, illustrating the synergy and applicability of a combined computational and genetic approach for investigating prokaryotic signaling networks.
|
74
|
Abstract
Structural domains are believed to be modules within proteins that can fold and function independently. Some proteins show tandem repetitions of apparent modular structure that do not fold independently, but rather co-operate in stabilizing structural forms that comprise several repeat-units. For many natural repeat-proteins, it has been shown that weak energetic links between repeats lead to the breakdown of co-operativity and the appearance of folding sub-domains within an apparently regular repeat array. The quasi-1D architecture of repeat-proteins is crucial in detailing how the local energetic balances can modulate the folding dynamics of these proteins, which can be related to the physiological behaviour of these ubiquitous biological systems.
|
75
|
Zhang B, Wolynes PG. Shape Transitions and Chiral Symmetry Breaking in the Energy Landscape of the Mitotic Chromosome. Phys Rev Lett 2016; 116:248101. [PMID: 27367409 DOI: 10.1103/physrevlett.116.248101] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.5] [Received: 11/10/2015] [Indexed: 05/18/2023]
Abstract
We derive an unbiased information theoretic energy landscape for chromosomes at metaphase using a maximum entropy approach that accurately reproduces the details of the experimentally measured pairwise contact probabilities between genomic loci. Dynamical simulations using this landscape lead to cylindrical, helically twisted structures reflecting liquid crystalline order. These structures are similar to those arising from a generic ideal homogenized chromosome energy landscape. The helical twist can be either right or left handed so chiral symmetry is broken spontaneously. The ideal chromosome landscape when augmented by interactions like those leading to topologically associating domain formation in the interphase chromosome reproduces these behaviors. The phase diagram of this landscape shows that the helical fiber order and the cylindrical shape persist at temperatures above the onset of chiral symmetry breaking, which is limited by the topologically associating domain interaction strength.
Affiliation(s)
- Bin Zhang
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Peter G Wolynes
- Department of Chemistry, Rice University, Houston, Texas 77005, USA
- Center for Theoretical Biological Physics, Rice University, Houston, Texas 77005, USA
- Department of Physics and Astronomy, Rice University, Houston, Texas 77005, USA
|
76
|
Neuwald AF. Gleaning structural and functional information from correlations in protein multiple sequence alignments. Curr Opin Struct Biol 2016; 38:1-8. [PMID: 27179293 DOI: 10.1016/j.sbi.2016.04.006] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 12/18/2015] [Revised: 04/28/2016] [Accepted: 04/29/2016] [Indexed: 10/24/2022]
Abstract
The availability of vast amounts of protein sequence data facilitates detection of subtle statistical correlations due to imposed structural and functional constraints. Recent breakthroughs using Direct Coupling Analysis (DCA) and related approaches have tapped into correlations believed to be due to compensatory mutations. This has yielded some remarkable results, including substantially improved prediction of protein intra- and inter-domain 3D contacts, of membrane and globular protein structures, of substrate binding sites, and of protein conformational heterogeneity. A complementary approach is Bayesian Partitioning with Pattern Selection (BPPS), which partitions related proteins into hierarchically-arranged subgroups based on correlated residue patterns. These correlated patterns are presumably due to structural and functional constraints associated with evolutionary divergence rather than to compensatory mutations. Hence joint application of DCA- and BPPS-based approaches should help sort out the structural and functional constraints contributing to sequence correlations.
Affiliation(s)
- Andrew F Neuwald
- Institute for Genome Sciences and Department of Biochemistry & Molecular Biology, University of Maryland School of Medicine, 801 West Baltimore St., BioPark II, Room 617, Baltimore, MD 21201, United States.
|
77
|
SMOG 2: A Versatile Software Package for Generating Structure-Based Models. PLoS Comput Biol 2016; 12:e1004794. [PMID: 26963394 PMCID: PMC4786265 DOI: 10.1371/journal.pcbi.1004794] [Citation(s) in RCA: 191] [Impact Index Per Article: 23.9] [Received: 08/31/2015] [Accepted: 02/07/2016] [Indexed: 12/01/2022] Open
Abstract
Molecular dynamics simulations with coarse-grained or simplified Hamiltonians have proven to be an effective means of capturing the functionally important long-time and large-length scale motions of proteins and RNAs. Originally developed in the context of protein folding, structure-based models (SBMs) have since been extended to probe a diverse range of biomolecular processes, spanning from protein and RNA folding to functional transitions in molecular machines. The hallmark feature of a structure-based model is that part, or all, of the potential energy function is defined by a known structure. Within this general class of models, there exist many possible variations in resolution and energetic composition. SMOG 2 is a downloadable software package that reads user-designated structural information and user-defined energy definitions, in order to produce the files necessary to use SBMs with high performance molecular dynamics packages: GROMACS and NAMD. SMOG 2 is bundled with XML-formatted template files that define commonly used SBMs, and it can process template files that are altered according to the needs of each user. This computational infrastructure also allows for experimental or bioinformatics-derived restraints or novel structural features to be included, e.g. novel ligands, prosthetic groups and post-translational/transcriptional modifications. The code and user guide can be downloaded at http://smog-server.org/smog2.
|
78
|
Feinauer C, Szurmant H, Weigt M, Pagnani A. Inter-Protein Sequence Co-Evolution Predicts Known Physical Interactions in Bacterial Ribosomes and the Trp Operon. PLoS One 2016; 11:e0149166. [PMID: 26882169 PMCID: PMC4755613 DOI: 10.1371/journal.pone.0149166] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Received: 03/11/2015] [Accepted: 01/28/2016] [Indexed: 11/29/2022] Open
Abstract
Interaction between proteins is a fundamental mechanism that underlies virtually all biological processes. Many important interactions are conserved across a large variety of species. The need to maintain interaction leads to a high degree of co-evolution between residues in the interface between partner proteins. The inference of protein-protein interaction networks from the rapidly growing sequence databases is one of the most formidable tasks in systems biology today. We propose here a novel approach based on the Direct-Coupling Analysis of the co-evolution between inter-protein residue pairs. We use ribosomal and trp operon proteins as test cases: for the small and large ribosomal subunits, our approach predicts protein-interaction partners at true-positive rates of 70% and 90%, respectively, within the first 10 predictions, with areas of 0.69 and 0.81, respectively, under the ROC curves for all predictions. In the trp operon, it assigns the two largest interaction scores to the only two interactions experimentally known. On the level of residue interactions, we show that for the small and the large ribosomal subunit our approach predicts interacting residues in the system with true-positive rates of 60% and 85%, respectively, in the first 20 predictions. We use artificial data to show that the performance of our approach depends crucially on the size of the joint multiple sequence alignments and analyze how many sequences would be necessary for a perfect prediction if the sequences were sampled from the same model that we use for prediction. Given the performance of our approach on the test data, we speculate that it can be used to detect new interactions, especially in the light of the rapid growth of available sequence data.
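The "true-positive rate within the first N predictions" quoted above can be read as the fraction of the N highest-scoring predicted pairs that are known interactions. The sketch below shows that reading only; the function name, the data layout and the toy scores are assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, FrozenSet, Set

Pair = FrozenSet[str]  # unordered pair of protein names


def top_n_true_positive_rate(scores: Dict[Pair, float],
                             known: Set[Pair], n: int) -> float:
    """Fraction of the n highest-scoring predicted pairs that are known interactions."""
    ranked = sorted(scores, key=lambda pair: scores[pair], reverse=True)[:n]
    if not ranked:
        return 0.0
    hits = sum(1 for pair in ranked if pair in known)
    return hits / len(ranked)


# Toy interaction scores with made-up values and placeholder protein names.
scores = {
    frozenset({"S2", "S3"}): 0.91,
    frozenset({"S4", "S5"}): 0.75,
    frozenset({"S2", "S7"}): 0.40,
    frozenset({"S3", "S9"}): 0.12,
}
known = {frozenset({"S2", "S3"}), frozenset({"S2", "S7"})}
rate = top_n_true_positive_rate(scores, known, n=2)  # one hit among the top two
```

Using unordered `frozenset` pairs avoids counting (A, B) and (B, A) as different predictions.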
Affiliation(s)
- Christoph Feinauer
- Department of Applied Science and Technology, and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Hendrik Szurmant
- Department of Molecular and Experimental Medicine, The Scripps Research Institute, La Jolla, CA, United States of America
- Martin Weigt
- Sorbonne Universités, UPMC, UMR 7238, Computational and Quantitative Biology, Paris, France
- CNRS, UMR 7238, Computational and Quantitative Biology, Paris, France
- Andrea Pagnani
- Department of Applied Science and Technology, and Center for Computational Sciences, Politecnico di Torino, Torino, Italy
- Human Genetics Foundation, Molecular Biotechnology Center (MBC), Torino, Italy
|
79
|
Noel JK, Morcos F, Onuchic JN. Sequence co-evolutionary information is a natural partner to minimally-frustrated models of biomolecular dynamics. F1000Res 2016; 5. [PMID: 26918164 PMCID: PMC4755392 DOI: 10.12688/f1000research.7186.1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Accepted: 01/21/2016] [Indexed: 11/25/2022] Open
Abstract
Experimentally derived structural constraints have been crucial to the implementation of computational models of biomolecular dynamics. For example, not only does crystallography provide essential starting points for molecular simulations, but high-resolution structures also permit parameterization of simplified models. Since the energy landscapes for proteins and other biomolecules have been shown to be minimally frustrated and therefore funneled, these structure-based models have played a major role in understanding the mechanisms governing folding and many functions of these systems. Structural information, however, may be limited in many interesting cases. Recently, the statistical analysis of residue co-evolution in families of protein sequences has provided a complementary method of discovering residue-residue contact interactions involved in functional configurations. These functional configurations are often transient and difficult to capture experimentally. Thus, co-evolutionary information can be merged with that available for experimentally characterized low free-energy structures, in order to more fully capture the true underlying biomolecular energy landscape.
Affiliation(s)
- Jeffrey K Noel
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA; Kristallographie, Max-Delbrück-Centrum für Molekulare Medizin, Berlin, Germany
- Faruck Morcos
- Department of Biological Sciences, University of Texas at Dallas, Richardson, TX, USA
- Jose N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX, USA
|
80
|
Echave J, Spielman SJ, Wilke CO. Causes of evolutionary rate variation among protein sites. Nat Rev Genet 2016; 17:109-21. [PMID: 26781812 DOI: 10.1038/nrg.2015.18] [Citation(s) in RCA: 176] [Impact Index Per Article: 22.0] [Indexed: 12/13/2022]
Abstract
It has long been recognized that certain sites within a protein, such as sites in the protein core or catalytic residues in enzymes, are evolutionarily more conserved than other sites. However, our understanding of rate variation among sites remains surprisingly limited. Recent progress to address this includes the development of a wide array of reliable methods to estimate site-specific substitution rates from sequence alignments. In addition, several molecular traits have been identified that correlate with site-specific mutation rates, and novel mechanistic biophysical models have been proposed to explain the observed correlations. Nonetheless, current models explain, at best, approximately 60% of the observed variance, highlighting the limitations of current methods and models and the need for new research directions.
Affiliation(s)
- Julian Echave
- Escuela de Ciencia y Tecnología, Universidad Nacional de San Martín, 1650 San Martín, Buenos Aires, Argentina
- Stephanie J Spielman
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
- Claus O Wilke
- Department of Integrative Biology, Center for Computational Biology and Bioinformatics, and Institute for Cellular and Molecular Biology, The University of Texas at Austin, Austin, Texas 78712, USA
|
81
|
Prabhakaran P, Ashraf MA, Aqma WS. Microbial stress response to heavy metals in the environment. RSC Adv 2016. [DOI: 10.1039/c6ra10966g] [Citation(s) in RCA: 87] [Impact Index Per Article: 10.9] [Indexed: 11/21/2022] Open
Abstract
Heavy metal contamination is a global environmental issue as it poses a significant threat to public health, and exposure to metals above a certain threshold level can cause deleterious effects in all living organisms including microbes.
Collapse
Affiliation(s)
- Pranesha Prabhakaran
- School of Biosciences and Biotechnology
- Faculty of Science and Technology
- Universiti Kebangsaan Malaysia
- 43600 Bangi
- Malaysia
| | - Muhammad Aqeel Ashraf
- Faculty of Science & Natural Resources
- Universiti Malaysia Sabah
- 88400 Kota Kinabalu
- Malaysia
- Department of Environmental Science and Engineering
| | - Wan Syaidatul Aqma
- School of Biosciences and Biotechnology
- Faculty of Science and Technology
- Universiti Kebangsaan Malaysia
- 43600 Bangi
- Malaysia
| |
|
82
|
Cheng RR, Raghunathan M, Noel JK, Onuchic JN. Constructing sequence-dependent protein models using coevolutionary information. Protein Sci 2016; 25:111-22. [PMID: 26223372 PMCID: PMC4815312 DOI: 10.1002/pro.2758] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 05/20/2015] [Accepted: 07/27/2015] [Indexed: 11/08/2022]
Abstract
Recent developments in global statistical methodologies have advanced the analysis of large collections of protein sequences for coevolutionary information. Coevolution between amino acids in a protein arises from compensatory mutations that are needed to maintain the stability or function of a protein over the course of evolution. This gives rise to quantifiable correlations between amino acid sites within the multiple sequence alignment of a protein family. Here, we use the maximum entropy-based approach called mean field Direct Coupling Analysis (mfDCA) to infer a Potts model Hamiltonian governing the correlated mutations in a protein family. We use the inferred pairwise statistical couplings to generate the sequence-dependent heterogeneous interaction energies of a structure-based model (SBM) where only native contacts are considered. Considering the ribosomal S6 protein and its circular permutants as well as the SH3 protein, we demonstrate that these models quantitatively agree with experimental data on folding mechanisms. This work serves as a new framework for generating coevolutionary data-enriched models that can potentially be used to engineer key functional motions and novel interactions in protein systems.
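The mean-field step named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an integer-coded alignment, a uniform pseudocount of 0.5, no sequence reweighting, and the common gauge in which the last state is dropped. The names `mean_field_dca` and `coupling_scores` are hypothetical.

```python
import numpy as np


def mean_field_dca(msa, q, pseudocount=0.5):
    """Return mean-field couplings J[i, j, a, b] for an integer-coded alignment.

    msa: (n_seq, length) integer array with states 0..q-1 (assumed encoding).
    The last state is dropped as a gauge choice so the covariance is invertible.
    """
    n_seq, length = msa.shape
    r = q - 1
    # One-hot encode each site, keep the first q-1 states, flatten per sequence.
    x = np.eye(q)[msa][:, :, :r].reshape(n_seq, length * r)

    # Single-site and pair frequencies mixed with a uniform pseudocount
    # (an assumed regularizer, not taken from the paper).
    fi = (1 - pseudocount) * x.mean(axis=0) + pseudocount / q
    fij = (1 - pseudocount) * (x.T @ x) / n_seq + pseudocount / q**2
    for i in range(length):
        block = slice(i * r, (i + 1) * r)
        fij[block, block] = np.diag(fi[block])  # a site holds exactly one state

    cov = fij - np.outer(fi, fi)
    couplings = -np.linalg.inv(cov)  # mean-field approximation: J = -C^{-1}
    return couplings.reshape(length, r, length, r).transpose(0, 2, 1, 3)


def coupling_scores(couplings):
    """Frobenius norm of each (i, j) coupling block, with the diagonal zeroed."""
    scores = np.sqrt((couplings ** 2).sum(axis=(2, 3)))
    np.fill_diagonal(scores, 0.0)
    return scores
```

In a setting like the one described, the highest-scoring site pairs would be the candidates for weighting native contacts; how the couplings are mapped to contact energies in the structure-based model is not specified here and is left open.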
Affiliation(s)
- Ryan R Cheng
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, 77005
| | - Mohit Raghunathan
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, 77005
- Department of Physics & Astronomy, Rice University, Houston, Texas, 77005
| | - Jeffrey K Noel
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, 77005
- Department of Physics & Astronomy, Rice University, Houston, Texas, 77005
| | - José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, Texas, 77005
- Department of Physics & Astronomy, Rice University, Houston, Texas, 77005
| |
|
83
|
Mallik S, Das S, Kundu S. Predicting protein folding rate change upon point mutation using residue-level coevolutionary information. Proteins 2015; 84:3-8. [DOI: 10.1002/prot.24960] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 07/15/2015] [Revised: 11/11/2015] [Accepted: 11/11/2015] [Indexed: 11/10/2022]
Affiliation(s)
- Saurav Mallik
- Department of Biophysics; Molecular Biology and Bioinformatics, University of Calcutta; Kolkata 700009 India
- Center of Excellence in Systems Biology and Biomedical Engineering (TEQIP Phase-II), University of Calcutta; Kolkata 700009 India
| | - Smita Das
- Department of Biophysics; Molecular Biology and Bioinformatics, University of Calcutta; Kolkata 700009 India
| | - Sudip Kundu
- Department of Biophysics; Molecular Biology and Bioinformatics, University of Calcutta; Kolkata 700009 India
- Center of Excellence in Systems Biology and Biomedical Engineering (TEQIP Phase-II), University of Calcutta; Kolkata 700009 India
| |
|
84
|
From residue coevolution to protein conformational ensembles and functional dynamics. Proc Natl Acad Sci U S A 2015; 112:13567-72. [PMID: 26487681 DOI: 10.1073/pnas.1508584112] [Citation(s) in RCA: 101] [Impact Index Per Article: 11.2] [Indexed: 12/21/2022] Open
Abstract
The analysis of evolutionary amino acid correlations has recently attracted a surge of renewed interest, also due to their successful use in de novo protein native structure prediction. However, many aspects of protein function, such as substrate binding and product release in enzymatic activity, can be fully understood only in terms of an equilibrium ensemble of alternative structures, rather than a single static structure. In this paper we combine coevolutionary data and molecular dynamics simulations to study protein conformational heterogeneity. To that end, we adapt the Boltzmann-learning algorithm to the analysis of homologous protein sequences and develop a coarse-grained protein model specifically tailored to convert the resulting contact predictions to a protein structural ensemble. By means of exhaustive sampling simulations, we analyze the set of conformations that are consistent with the observed residue correlations for a set of representative protein domains, showing that (i) the most representative structure is consistent with the experimental fold and (ii) the various regions of the sequence display different stability, related to multiple biologically relevant conformations and to the cooperativity of the coevolving pairs. Moreover, we show that the proposed protocol is able to reproduce the essential features of a protein folding mechanism as well as to account for regions involved in conformational transitions through the correct sampling of the involved conformers.
|
85
|
Figliuzzi M, Jacquier H, Schug A, Tenaillon O, Weigt M. Coevolutionary Landscape Inference and the Context-Dependence of Mutations in Beta-Lactamase TEM-1. Mol Biol Evol 2015; 33:268-80. [PMID: 26446903 PMCID: PMC4693977 DOI: 10.1093/molbev/msv211] [Citation(s) in RCA: 167] [Impact Index Per Article: 18.6] [Indexed: 11/21/2022] Open
Abstract
The quantitative characterization of mutational landscapes is a task of outstanding importance in evolutionary and medical biology: It is, for example, of central importance for our understanding of the phenotypic effect of mutations related to disease and antibiotic drug resistance. Here we develop a novel inference scheme for mutational landscapes, which is based on the statistical analysis of large alignments of homologs of the protein of interest. Our method is able to capture epistatic couplings between residues, and therefore to assess the dependence of mutational effects on the sequence context where they appear. Compared with recent large-scale mutagenesis data of the beta-lactamase TEM-1, a protein providing resistance against beta-lactam antibiotics, our method leads to an increase of about 40% in explicative power as compared with approaches neglecting epistasis. We find that the informative sequence context extends to residues at native distances of about 20 Å from the mutated site, reaching thus far beyond residues in direct physical contact.
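The context dependence described in this abstract can be made concrete with a small sketch that scores a point mutation as an energy difference under a pairwise (Potts-type) model. The sign convention, the array layout of `h` and `J`, and the function names are assumptions for illustration; the scoring used in the paper may differ.

```python
import numpy as np


def potts_energy(seq, h, J):
    """E(s) = -sum_i h[i, s_i] - sum_{i<j} J[i, j, s_i, s_j] (assumed convention)."""
    length = len(seq)
    energy = -sum(h[i, seq[i]] for i in range(length))
    energy -= sum(
        J[i, j, seq[i], seq[j]]
        for i in range(length)
        for j in range(i + 1, length)
    )
    return float(energy)


def mutation_effect(seq, pos, new_state, h, J):
    """Energy change caused by setting seq[pos] to new_state in this background."""
    mutant = list(seq)
    mutant[pos] = new_state
    return potts_energy(mutant, h, J) - potts_energy(seq, h, J)
```

With a single nonzero coupling between two sites, the same substitution at one site yields a different score depending on the state of the other site, whereas a model with fields `h` only would give every background the same score.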
Affiliation(s)
- Matteo Figliuzzi
- UPMC, Institut de Calcul et de la Simulation, Sorbonne Universités, Paris, France; Computational and Quantitative Biology, UPMC, UMR 7238, Sorbonne Universités, Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, Paris, France
| | - Hervé Jacquier
- Infection, Antimicrobials, Modelling, Evolution, INSERM, Université Denis Diderot Paris 7, UMR 1137, Sorbonne Paris Cité, Paris, France; Service de Bactériologie-Virologie, Groupe Hospitalier Lariboisière-Fernand Widal, Assistance Publique-Hôpitaux de Paris (AP-HP), Paris, France
| | - Alexander Schug
- Steinbuch Centre for Computing, Karlsruhe Institute for Technology, Eggenstein-Leopoldshafen, Germany
| | - Oliver Tenaillon
- Infection, Antimicrobials, Modelling, Evolution, INSERM, Université Denis Diderot Paris 7, UMR 1137, Sorbonne Paris Cité, Paris, France
| | - Martin Weigt
- Computational and Quantitative Biology, UPMC, UMR 7238, Sorbonne Universités, Paris, France; Computational and Quantitative Biology, CNRS, UMR 7238, Paris, France
| |
|
86
|
dos Santos RN, Morcos F, Jana B, Andricopulo AD, Onuchic JN. Dimeric interactions and complex formation using direct coevolutionary couplings. Sci Rep 2015; 5:13652. [PMID: 26338201 PMCID: PMC4559900 DOI: 10.1038/srep13652] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.9] [Received: 01/05/2015] [Accepted: 07/13/2015] [Indexed: 11/09/2022] Open
Abstract
We develop a procedure to characterize the association of protein structures into homodimers using coevolutionary couplings extracted from Direct Coupling Analysis (DCA) in combination with Structure Based Models (SBM). Identification of dimerization contacts using DCA is more challenging than intradomain contacts since direct couplings are mixed with monomeric contacts. Therefore a systematic way to extract dimerization signals has been elusive. We provide evidence that the prediction of homodimeric complexes is possible with high accuracy for all the cases we studied which have rich sequence information. For the most accurate conformations of the structurally diverse dimeric complexes studied the mean and interfacial RMSDs are 1.95 Å and 1.44 Å, respectively. This methodology is also able to identify distinct dimerization conformations as for the case of the family of response regulators, which dimerize upon activation. The identification of dimeric complexes can provide interesting molecular insights in the construction of large oligomeric complexes and be useful in the study of aggregation related diseases like Alzheimer's or Parkinson's.
Affiliation(s)
- Ricardo N. dos Santos
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827
- Laboratório de Química Medicinal e Computacional, Instituto de Física de São Carlos, Universidade de São Paulo, São Paulo, São Carlos, 13563-120, Brazil
| | - Faruck Morcos
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827
| | - Biman Jana
- Department of Physical Chemistry, Indian Association for the Cultivation of Science, Jadavpur, Kolkata-700032, India
| | - Adriano D. Andricopulo
- Laboratório de Química Medicinal e Computacional, Instituto de Física de São Carlos, Universidade de São Paulo, São Paulo, São Carlos, 13563-120, Brazil
| | - José N. Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77005-1827
| |
|
87
|
Shimizu T, Huang D, Yan F, Stranava M, Bartosova M, Fojtíková V, Martínková M. Gaseous O2, NO, and CO in signal transduction: structure and function relationships of heme-based gas sensors and heme-redox sensors. Chem Rev 2015; 115:6491-533. [PMID: 26021768 DOI: 10.1021/acs.chemrev.5b00018] [Citation(s) in RCA: 131] [Impact Index Per Article: 14.6] [Indexed: 02/05/2023]
Affiliation(s)
- Toru Shimizu
- †Department of Cell Biology and Genetics and Key Laboratory of Molecular Biology in High Cancer Incidence Coastal Chaoshan Area of Guangdong Higher Education Institutes, Shantou University Medical College, Shantou, Guangdong 515041, China
- ‡Department of Biochemistry, Faculty of Science, Charles University in Prague, Prague 2 128 43, Czech Republic
- §Research Center for Compact Chemical System, National Institute of Advanced Industrial Science and Technology (AIST), Sendai 983-8551, Japan
| | - Dongyang Huang
- †Department of Cell Biology and Genetics and Key Laboratory of Molecular Biology in High Cancer Incidence Coastal Chaoshan Area of Guangdong Higher Education Institutes, Shantou University Medical College, Shantou, Guangdong 515041, China
| | - Fang Yan
- †Department of Cell Biology and Genetics and Key Laboratory of Molecular Biology in High Cancer Incidence Coastal Chaoshan Area of Guangdong Higher Education Institutes, Shantou University Medical College, Shantou, Guangdong 515041, China
| | - Martin Stranava
- ‡Department of Biochemistry, Faculty of Science, Charles University in Prague, Prague 2 128 43, Czech Republic
| | - Martina Bartosova
- ‡Department of Biochemistry, Faculty of Science, Charles University in Prague, Prague 2 128 43, Czech Republic
| | - Veronika Fojtíková
- ‡Department of Biochemistry, Faculty of Science, Charles University in Prague, Prague 2 128 43, Czech Republic
| | - Markéta Martínková
- ‡Department of Biochemistry, Faculty of Science, Charles University in Prague, Prague 2 128 43, Czech Republic
| |
|
88
|
Espada R, Parra RG, Mora T, Walczak AM, Ferreiro DU. Capturing coevolutionary signals in repeat proteins. BMC Bioinformatics 2015; 16:207. [PMID: 26134293 PMCID: PMC4489039 DOI: 10.1186/s12859-015-0648-3] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 03/17/2015] [Accepted: 06/16/2015] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The analysis of correlations of amino acid occurrences in globular domains has led to the development of statistical tools that can identify native contacts - portions of the chains that come to close distance in folded structural ensembles. Here we introduce a direct coupling analysis for repeat proteins - natural systems for which the identification of folding domains remains challenging. RESULTS We show that the inherent translational symmetry of repeat protein sequences introduces a strong bias in the pair correlations at precisely the length scale of the repeat-unit. Equalizing for this bias in an objective way reveals true co-evolutionary signals from which local native contacts can be identified. Importantly, parameter values obtained for all other interactions are not significantly affected by the equalization. We quantify the robustness of the procedure and assign confidence levels to the interactions, identifying the minimum number of sequences needed to extract evolutionary information in several repeat protein families. CONCLUSIONS The overall procedure can be used to reconstruct the interactions at distances larger than repeat-pairs, identifying the characteristics of the strongest couplings in each family, and can be applied to any system that appears translationally symmetric.
Affiliation(s)
- Rocío Espada
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina; Departamento de Física, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina
| | - R Gonzalo Parra
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
| | - Thierry Mora
- Laboratoire de physique statistique, CNRS, UPMC and École normale supérieure, 24 rue Lhomond, Paris, 75005, France
| | | | - Diego U Ferreiro
- Protein Physiology Lab, Dep de Química Biológica, Facultad de Ciencias Exactas y Naturales, UBA-CONICET-IQUIBICEN, Buenos Aires, Argentina
| |
|
89
|
The two-component signalling networks of Mycobacterium tuberculosis display extensive cross-talk in vitro. Biochem J 2015; 469:121-34. [DOI: 10.1042/bj20150268] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 03/04/2015] [Accepted: 05/01/2015] [Indexed: 02/06/2023]
Abstract
Bacteria use two-component signalling systems (TCSs) to sense and respond to environmental changes. Currently, they are thought to be highly specific, with each TCS functioning independently. Here, unlike the prevalent paradigm, we show that the TCSs of M. tuberculosis cross-talk extensively, thereby proposing an alternative signalling scenario.
|
90
|
Malinverni D, Marsili S, Barducci A, De Los Rios P. Large-Scale Conformational Transitions and Dimerization Are Encoded in the Amino-Acid Sequences of Hsp70 Chaperones. PLoS Comput Biol 2015; 11:e1004262. [PMID: 26046683 PMCID: PMC4457872 DOI: 10.1371/journal.pcbi.1004262] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 12/11/2014] [Accepted: 04/01/2015] [Indexed: 12/30/2022] Open
Abstract
Hsp70s are a class of ubiquitous and highly conserved molecular chaperones playing a central role in the regulation of proteostasis in the cell. Hsp70s assist a myriad of cellular processes by binding unfolded or misfolded substrates during a complex biochemical cycle involving large-scale structural rearrangements. Here we show that an analysis of coevolution at the residue level fully captures the characteristic large-scale conformational transitions of this protein family, and predicts an evolutionarily conserved, and thus functional, homo-dimeric arrangement. Furthermore, we highlight that the features encoding the Hsp70 dimer are more conserved in bacterial than in eukaryotic sequences, suggesting that the known Hsp70/Hsp110 hetero-dimer is a eukaryotic specialization built on a pre-existing template.
Molecular chaperones are a class of proteins that are crucial for the correct functioning of cells. They play central housekeeping roles in the normal cell cycle, and are major actors of the protection system of the cell against cell stress conditions. In this study, we apply statistical inference methods to analyse the structure and function of the Hsp70 molecular chaperone, one of the main members of the chaperone family. We use correlated amino acid coevolution in protein sequences to identify directly interacting amino acids. Our results show that coevolution captures an appreciable fraction of native contacts throughout the protein. Furthermore, amino acid coevolution predicts previously hypothesized functional dimer interactions between Hsp70s, thus giving a theoretical contribution to this debate.
Affiliation(s)
- Duccio Malinverni
- Laboratoire de Biophysique Statistique, École Polytechnique Fédérale de Lausanne, Faculté de Sciences de Base, Lausanne, Switzerland
| | - Simone Marsili
- Structural Computational Biology Group, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | - Alessandro Barducci
- Laboratoire de Biophysique Statistique, École Polytechnique Fédérale de Lausanne, Faculté de Sciences de Base, Lausanne, Switzerland
| | - Paolo De Los Rios
- Laboratoire de Biophysique Statistique, École Polytechnique Fédérale de Lausanne, Faculté de Sciences de Base, Lausanne, Switzerland
| |
|
91
|
Yang S. Methods for SAXS-based structure determination of biomolecular complexes. Advanced Materials (Deerfield Beach, Fla.) 2014; 26:7902-10. [PMID: 24888261 PMCID: PMC4285438 DOI: 10.1002/adma.201304475] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Received: 09/05/2013] [Revised: 03/10/2014] [Indexed: 05/20/2023]
Abstract
Measurements from small-angle X-ray scattering (SAXS) are highly informative for determining the structures of biomolecular complexes in solution. Here, current and recent SAXS-driven developments are described, with an emphasis on computational modeling. In particular, accurate methods for computing a theoretical scattering profile from a given structure model are discussed, with a key focus on structure factor coarse-graining and hydration contribution. Methods for reconstructing topological structures from an experimental SAXS profile are currently under active development. We report on several modeling tools designed for conformation generation that make use of either atomic-level or coarse-grained representations. Furthermore, since large, flexible biomolecules can adopt multiple well-defined conformations, a traditional single-conformation SAXS analysis is inappropriate, so we also discuss recent methods that utilize the concept of ensemble optimization, weighing in on the SAXS contributions of a heterogeneous mixture of conformations. These tools will ultimately posit the usefulness of SAXS data beyond a simple space-filling approach by providing a reliable structure characterization of biomolecular complexes under physiological conditions.
Affiliation(s)
- Sichun Yang
- Center for Proteomics and Department of Pharmacology, Department of Physiology and Biophysics, Case Western Reserve University, 10900 Euclid Ave, Cleveland, OH 44106-4988, USA
| |
|
92
|
Schmidl SR, Sheth RU, Wu A, Tabor JJ. Refactoring and optimization of light-switchable Escherichia coli two-component systems. ACS Synth Biol 2014; 3:820-31. [PMID: 25250630 DOI: 10.1021/sb500273n] [Citation(s) in RCA: 105] [Impact Index Per Article: 10.5] [Indexed: 12/19/2022]
Abstract
Light-switchable proteins enable unparalleled control of molecular biological processes in live organisms. Previously, we have engineered red/far-red and green/red photoreversible two-component signal transduction systems (TCSs) with transcriptional outputs in E. coli and used them to characterize and control synthetic gene circuits with exceptional quantitative, temporal, and spatial precision. However, the broad utility of these light sensors is limited by bulky DNA encoding, incompatibility with commonly used ligand-responsive transcription factors, leaky output in deactivating light, and less than 10-fold dynamic range. Here, we compress the four genes required for each TCS onto two streamlined plasmids and replace all chemically inducible and evolved promoters with constitutive, engineered versions. Additionally, we systematically optimize the expression of each sensor histidine kinase and response regulator, and redesign both pathway output promoters, resulting in low leakiness and 72- and 117-fold dynamic range, respectively. These second-generation light sensors can be used to program the expression of more genes over a wider range and can be more easily combined with additional plasmids or moved to different host strains. This work demonstrates that bacterial TCSs can be optimized to function as high-performance sensors for scientific and engineering applications.
Affiliation(s)
- Sebastian R. Schmidl
- Department of Bioengineering and Department of Biochemistry and Cell Biology, Rice University, 6100 Main Street, Houston, Texas 77005, United States
| | - Ravi U. Sheth
- Department of Bioengineering and Department of Biochemistry and Cell Biology, Rice University, 6100 Main Street, Houston, Texas 77005, United States
| | - Andrew Wu
- Department of Bioengineering and Department of Biochemistry and Cell Biology, Rice University, 6100 Main Street, Houston, Texas 77005, United States
| | - Jeffrey J. Tabor
- Department of Bioengineering and Department of Biochemistry and Cell Biology, Rice University, 6100 Main Street, Houston, Texas 77005, United States
| |
|
93
|
Tamir S, Paddock ML, Darash-Yahana-Baram M, Holt SH, Sohn YS, Agranat L, Michaeli D, Stofleth JT, Lipper CH, Morcos F, Cabantchik IZ, Onuchic JN, Jennings PA, Mittler R, Nechushtai R. Structure-function analysis of NEET proteins uncovers their role as key regulators of iron and ROS homeostasis in health and disease. Biochimica et Biophysica Acta - Molecular Cell Research 2014; 1853:1294-315. [PMID: 25448035 DOI: 10.1016/j.bbamcr.2014.10.014] [Citation(s) in RCA: 116] [Impact Index Per Article: 11.6] [Received: 07/04/2014] [Revised: 10/01/2014] [Accepted: 10/16/2014] [Indexed: 12/31/2022]
Abstract
A novel family of 2Fe-2S proteins, the NEET family, was discovered during the last decade in numerous organisms, including archaea, bacteria, algae, plants and humans, suggesting an evolutionarily conserved function, potentially mediated by their CDGSH Iron-Sulfur Domain. In humans, three NEET members encoded by the CISD1-3 genes were identified. The structures of CISD1 (mitoNEET, mNT), CISD2 (NAF-1), and the plant At-NEET uncovered a homodimer with a unique "NEET fold", as well as two distinct domains: a beta-cap and a 2Fe-2S cluster-binding domain. The 2Fe-2S clusters of NEET proteins were found to be coordinated by a novel 3Cys:1His structure that is relatively labile compared to other 2Fe-2S proteins, which is why the NEETs' clusters can be transferred to apo-acceptor protein(s) or mitochondria. Positioned at the protein surface, the His that coordinates the NEET 2Fe-2S cluster is exposed to protonation upon changes in its environment, potentially suggesting a sensing function for this residue. Studies in different model systems demonstrated a role for NAF-1 and mNT in the regulation of cellular iron, calcium and ROS homeostasis, and uncovered a key role for NEET proteins in critical processes, such as cancer cell proliferation and tumor growth, lipid and glucose homeostasis in obesity and diabetes, control of autophagy, longevity in mice, and senescence in plants. Abnormal regulation of NEET proteins was consequently found to result in multiple health conditions, and aberrant splicing of NAF-1 was found to be a cause of the neurological genetic disorder Wolfram Syndrome 2. Here we review the discovery of NEET proteins, their structural, biochemical and biophysical characterization, and their most recent structure-function analyses. We additionally highlight future avenues of research focused on NEET proteins and propose an essential role for NEETs in health and disease.
This article is part of a Special Issue entitled: Fe/S proteins: Analysis, structure, function, biogenesis and diseases.
Affiliation(s)
- Sagi Tamir
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
| | - Mark L Paddock
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA
| | - Merav Darash-Yahana-Baram
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
| | - Sarah H Holt
- Department of Biology, University of North Texas, Denton, TX 76203, USA
| | - Yang Sung Sohn
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
| | - Lily Agranat
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
| | - Dorit Michaeli
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
| | - Jason T Stofleth
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA
| | - Colin H Lipper
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA
| | - Faruck Morcos
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77050, USA; Department of Physics and Astronomy, Rice University, Houston, TX 77050, USA; Department of Chemistry, Rice University, Houston, TX 77050, USA; Department of Biochemistry and Cell Biology, Rice University, Houston, TX 77050, USA
| | - Ioav Z Cabantchik
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel
| | - José N Onuchic
- Center for Theoretical Biological Physics, Rice University, Houston, TX 77050, USA; Department of Physics and Astronomy, Rice University, Houston, TX 77050, USA; Department of Chemistry, Rice University, Houston, TX 77050, USA; Department of Biochemistry and Cell Biology, Rice University, Houston, TX 77050, USA
| | - Patricia A Jennings
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA
| | - Ron Mittler
- Department of Biology, University of North Texas, Denton, TX 76203, USA
| | - Rachel Nechushtai
- The Alexander Silberman Life Science Institute and the Wolfson Centre for Applied Structural Biology, Hebrew University of Jerusalem, Edmond J. Safra Campus at Givat Ram, Jerusalem 91904, Israel.
| |
|
94
|
Ortet P, Whitworth DE, Santaella C, Achouak W, Barakat M. P2CS: updates of the prokaryotic two-component systems database. Nucleic Acids Res 2014; 43:D536-41. [PMID: 25324303 PMCID: PMC4384028 DOI: 10.1093/nar/gku968] [Citation(s) in RCA: 74] [Impact Index Per Article: 7.4] [Indexed: 01/17/2023] Open
Abstract
The P2CS database (http://www.p2cs.org/) is a comprehensive resource for the analysis of Prokaryotic Two-Component Systems (TCSs). TCSs are composed of a receptor histidine kinase (HK) and a partner response regulator (RR) and control important prokaryotic behaviors. The latest incarnation of P2CS includes 164 651 TCS proteins from 2758 sequenced prokaryotic genomes. Several important new features have been added to P2CS since it was last described. Users can search P2CS via BLAST, adding hits to their cart, and homologous proteins can be aligned using MUSCLE and viewed using Jalview within P2CS. P2CS also provides phylogenetic trees based on the conserved signaling domains of the RRs and HKs from entire genomes. HK and RR trees are annotated with gene organization and domain architecture, providing insights into the evolutionary origin of the contemporary gene set. The majority of TCSs are encoded by adjacent HK and RR genes; however, ‘orphan’ unpaired TCS genes are also abundant, and identifying their partner proteins is challenging. P2CS now provides paired HK and RR trees with proteins from the same genetic locus indicated. This allows the appraisal of evolutionary relationships across entire TCSs and in some cases the identification of candidate partners for orphan TCS proteins.
Affiliation(s)
- Philippe Ortet
- CEA, IBEB, Lab Ecol Microb Rhizosphere & Environ Extrem, Saint-Paul-lez-Durance F-13108, France; CNRS, UMR 7265 Biol Veget & Microbiol Environ, Saint-Paul-lez-Durance F-13108, France; Aix Marseille Université, BVME UMR7265, Marseille F-13284, France
| | - David E Whitworth
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Ceredigion, SY23 3DD, UK
| | - Catherine Santaella
- CEA, IBEB, Lab Ecol Microb Rhizosphere & Environ Extrem, Saint-Paul-lez-Durance F-13108, France; CNRS, UMR 7265 Biol Veget & Microbiol Environ, Saint-Paul-lez-Durance F-13108, France; Aix Marseille Université, BVME UMR7265, Marseille F-13284, France
| | - Wafa Achouak
- CEA, IBEB, Lab Ecol Microb Rhizosphere & Environ Extrem, Saint-Paul-lez-Durance F-13108, France; CNRS, UMR 7265 Biol Veget & Microbiol Environ, Saint-Paul-lez-Durance F-13108, France; Aix Marseille Université, BVME UMR7265, Marseille F-13284, France
| | - Mohamed Barakat
- CEA, IBEB, Lab Ecol Microb Rhizosphere & Environ Extrem, Saint-Paul-lez-Durance F-13108, France; CNRS, UMR 7265 Biol Veget & Microbiol Environ, Saint-Paul-lez-Durance F-13108, France; Aix Marseille Université, BVME UMR7265, Marseille F-13284, France
|
95
|
Sudha G, Nussinov R, Srinivasan N. An overview of recent advances in structural bioinformatics of protein-protein interactions and a guide to their principles. Prog Biophys Mol Biol 2014; 116:141-50. [PMID: 25077409 DOI: 10.1016/j.pbiomolbio.2014.07.004] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 05/16/2014] [Accepted: 07/13/2014] [Indexed: 12/20/2022]
Abstract
Rich data bearing on the structural and evolutionary principles of protein-protein interactions are paving the way to a better understanding of the regulation of function in the cell. This is particularly the case when these interactions are considered in the framework of key pathways. Knowledge of the interactions may provide insights into the mechanisms of crucial 'driver' mutations in oncogenesis. They also provide the foundation toward the design of protein-protein interfaces and inhibitors that can abrogate their formation or enhance them. The main features to learn from known 3-D structures of protein-protein complexes and the extensive literature which analyzes them computationally and experimentally include the interaction details which permit undertaking structure-based drug discovery, the evolution of complexes and their interactions, the consequences of alterations such as post-translational modifications, ligand binding, disease causing mutations, host pathogen interactions, oligomerization, aggregation and the roles of disorder, dynamics, allostery and more to the protein and the cell. This review highlights some of the recent advances in these areas, including design, inhibition and prediction of protein-protein complexes. The field is broad, and much work has been carried out in these areas, making it challenging to cover it in its entirety. Much of this is due to the fast increase in the number of molecules whose structures have been determined experimentally and the vast increase in computational power. Here we provide a concise overview.
Affiliation(s)
- Govindarajan Sudha
- Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India.
| | - Ruth Nussinov
- Cancer and Inflammation Program, Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc., National Cancer Institute, Frederick, MD 21702, USA; Sackler Institute of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel.
|
96
|
Huang W, Ravikumar KM, Yang S. A Newfound Cancer-Activating Mutation Reshapes the Energy Landscape of Estrogen-Binding Domain. J Chem Theory Comput 2014; 10:2897-900. [DOI: 10.1021/ct500313e] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Indexed: 11/28/2022]
Affiliation(s)
- Wei Huang
- Center for Proteomics and Department of Pharmacology, Case Western Reserve University, Cleveland, Ohio 44106-4988, United States
| | - Krishnakumar M. Ravikumar
- Center for Proteomics and Department of Pharmacology, Case Western Reserve University, Cleveland, Ohio 44106-4988, United States
| | - Sichun Yang
- Center for Proteomics and Department of Pharmacology, Case Western Reserve University, Cleveland, Ohio 44106-4988, United States
|
97
|
Gültas M, Düzgün G, Herzog S, Jäger SJ, Meckbach C, Wingender E, Waack S. Quantum coupled mutation finder: predicting functionally or structurally important sites in proteins using quantum Jensen-Shannon divergence and CUDA programming. BMC Bioinformatics 2014; 15:96. [PMID: 24694117 PMCID: PMC4098773 DOI: 10.1186/1471-2105-15-96] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 06/10/2013] [Accepted: 03/26/2014] [Indexed: 11/29/2022] Open
Abstract
Background: The identification of functionally or structurally important non-conserved residue sites in protein MSAs is an important challenge for understanding the structural basis and molecular mechanism of protein functions. Despite the rich literature on compensatory mutations as well as sequence conservation analysis for the detection of those important residues, previous methods often rely on classical information-theoretic measures. However, these measures usually do not take into account dis/similarities of amino acids which are likely to be crucial for those residues. In this study, we present a new method, the Quantum Coupled Mutation Finder (QCMF) that incorporates significant dis/similar amino acid pair signals in the prediction of functionally or structurally important sites.
Results: The result of this study is twofold. First, using the essential sites of two human proteins, namely epidermal growth factor receptor (EGFR) and glucokinase (GCK), we tested the QCMF-method. The QCMF includes two metrics based on quantum Jensen-Shannon divergence to measure both sequence conservation and compensatory mutations. We found that the QCMF reaches an improved performance in identifying essential sites from MSAs of both proteins with a significantly higher Matthews correlation coefficient (MCC) value in comparison to previous methods. Second, using a data set of 153 proteins, we made a pairwise comparison between QCMF and three conventional methods. This comparison study strongly suggests that QCMF complements the conventional methods for the identification of correlated mutations in MSAs.
Conclusions: QCMF utilizes the notion of entanglement, which is a major resource of quantum information, to model significant dissimilar and similar amino acid pair signals in the detection of functionally or structurally important sites. Our results suggest that on the one hand QCMF significantly outperforms the previous method, which mainly focuses on dissimilar amino acid signals, to detect essential sites in proteins. On the other hand, it is complementary to the existing methods for the identification of correlated mutations. The method of QCMF is computationally intensive. To ensure a feasible computation time of the QCMF’s algorithm, we leveraged Compute Unified Device Architecture (CUDA). The QCMF server is freely accessible at http://qcmf.informatik.uni-goettingen.de/.
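The Matthews correlation coefficient used in this abstract to score predicted essential sites can be illustrated with a minimal sketch. It is not the QCMF implementation: the function name `matthews_corrcoef_sites`, its arguments, and the example site indices are hypothetical, and the sketch assumes each alignment position is labelled simply as essential or not essential.

```python
import math

def matthews_corrcoef_sites(true_sites, predicted_sites, n_positions):
    """MCC for predicted vs. known essential sites among n_positions alignment columns.

    Hypothetical helper for illustration; sites are given as collections of column indices.
    """
    true_sites, predicted_sites = set(true_sites), set(predicted_sites)
    tp = len(true_sites & predicted_sites)   # essential and predicted
    fp = len(predicted_sites - true_sites)   # predicted but not essential
    fn = len(true_sites - predicted_sites)   # essential but missed
    tn = n_positions - tp - fp - fn          # correctly left unpredicted
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when a marginal count is zero; return 0.0 by convention.
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Made-up example: 10 columns, 3 known essential sites, 3 predictions with 2 hits.
score = matthews_corrcoef_sites({1, 4, 7}, {1, 4, 9}, 10)
```

MCC ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect agreement), and it stays informative when essential sites are a small minority of positions, which is the usual situation in an MSA.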
Affiliation(s)
- Mehmet Gültas
- Institute of Computer Science, University of Göttingen, Goldschmidtstr. 7, 37077 Göttingen, Germany.
|
98
|
Integrated strategy reveals the protein interface between cancer targets Bcl-2 and NAF-1. Proc Natl Acad Sci U S A 2014; 111:5177-82. [PMID: 24706857 DOI: 10.1073/pnas.1403770111] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.3] [Indexed: 12/21/2022] Open
Abstract
Life requires orchestrated control of cell proliferation, cell maintenance, and cell death. Involved in these decisions are protein complexes that assimilate a variety of inputs that report on the status of the cell and lead to an output response. Among the proteins involved in this response are nutrient-deprivation autophagy factor-1 (NAF-1) and Bcl-2. NAF-1 is a homodimeric member of the novel Fe-S protein NEET family, which binds two 2Fe-2S clusters. NAF-1 is an important partner for Bcl-2 at the endoplasmic reticulum to functionally antagonize Beclin 1-dependent autophagy [Chang NC, Nguyen M, Germain M, Shore GC (2010) EMBO J 29(3):606-618]. We used an integrated approach involving peptide array, deuterium exchange mass spectrometry (DXMS), and functional studies aided by the power of sufficient constraints from direct coupling analysis (DCA) to determine the dominant docked conformation of the NAF-1-Bcl-2 complex. NAF-1 binds to both the pro- and antiapoptotic regions (BH3 and BH4) of Bcl-2, as demonstrated by a nested protein fragment analysis in a peptide array and DXMS analysis. A combination of the solution studies together with a new application of DCA to the eukaryotic proteins NAF-1 and Bcl-2 provided sufficient constraints at amino acid resolution to predict the interaction surfaces and orientation of the protein-protein interactions involved in the docked structure. The specific integrated approach described in this paper provides the first structural information, to our knowledge, for future targeting of the NAF-1-Bcl-2 complex in the regulation of apoptosis/autophagy in cancer biology.
|