1
Fleishman SJ, Horovitz A. Extending the New Generation of Structure Predictors to Account for Dynamics and Allostery. J Mol Biol 2021; 433:167007. [PMID: 33901536 DOI: 10.1016/j.jmb.2021.167007] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Received: 03/14/2021] [Revised: 04/18/2021] [Accepted: 04/19/2021] [Indexed: 10/21/2022]
Abstract
Recent progress in structure-prediction methods that rely on deep learning suggests that the atomic structure of almost any protein may soon be predictable directly from its amino acid sequence. This much-awaited revolution was driven by substantial improvements in the reliability of methods for inferring the spatial distances between amino acid pairs from an analysis of homologous sequences. Improved reliability has been accompanied, however, by a reduced ability to detect amino acid relationships that are not due to direct spatial contacts, such as those that arise from protein dynamics or allostery. Given the central importance of dynamics and allostery to protein activity, we argue that an important future advance would extend modeling beyond predicting a single static structure. Here, we briefly review some of the developments that have led to the remarkable recent achievement in structure prediction and speculate what methods and sources of information may be leveraged in the future to develop a modeling framework that addresses protein dynamics and allostery.
Affiliation(s)
- Sarel J Fleishman
- Department of Biomolecular Sciences, Weizmann Institute of Science, Rehovot 7600001, Israel.
- Amnon Horovitz
- Department of Chemical and Structural Biology, Weizmann Institute of Science, Rehovot 7600001, Israel.
2
Jing X, Dong Q, Lu R, Dong Q. Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications. Curr Bioinform 2019. [DOI: 10.2174/1574893613666181109130430] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 11/22/2022]
Abstract
Background: Protein inter-residue contact prediction plays an important role in the field of protein structure and function research. As a low-dimensional representation of protein tertiary structure, protein inter-residue contacts could greatly help de novo protein structure prediction methods to reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contact prediction. Objective: We provide a comprehensive and systematic review of protein inter-residue contact prediction methods. Results: Protein inter-residue contact prediction methods are roughly classified into five categories: correlated mutations methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show typical applications of protein inter-residue contacts. Then, we present a comprehensive review of the three main categories for protein inter-residue contact prediction: correlated mutations methods, machine-learning methods and fusion methods. We also analyze the constraints of each category. Furthermore, we compare several representative methods on the CASP11 dataset and discuss their performance in detail. Conclusion: Correlated mutations methods achieve better performance for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods can take advantage of both machine-learning and correlated mutations methods. Employing a more effective fusion strategy could further improve the performance of fusion methods.
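The abstract refers to a "common definition" of contacts and to short- and long-range contacts without spelling either out. The sketch below assumes the widely used CASP-style convention: two residues are in contact when their C-beta atoms are closer than 8 angstroms, and sequence separations of 6-11, 12-23 and 24 or more are labelled short, medium and long range. These thresholds and all function names are assumptions for illustration, not taken from the review.

```python
import math

# Assumed CASP-style conventions (not stated in the abstract above).
CONTACT_CUTOFF = 8.0  # angstroms between C-beta atoms (C-alpha for glycine)


def range_class(i, j):
    """Label a residue pair by sequence separation, or None if too close."""
    sep = abs(i - j)
    if sep >= 24:
        return "long"
    if sep >= 12:
        return "medium"
    if sep >= 6:
        return "short"
    return None


def contacts_by_range(coords, cutoff=CONTACT_CUTOFF):
    """Group contacting residue pairs (i < j) by range class.

    coords: list of (x, y, z) tuples, one per residue.
    """
    grouped = {"short": [], "medium": [], "long": []}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            label = range_class(i, j)
            if label is None:
                continue  # pairs closer than 6 positions are ignored
            if math.dist(coords[i], coords[j]) < cutoff:
                grouped[label].append((i, j))
    return grouped
```

Contact maps built this way are the low-dimensional representation of tertiary structure that the abstract describes: a set of index pairs rather than full coordinates.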
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, China
- Qimin Dong
- Vocational and Technical Education Center of Linxi County, Chifeng, Inner Mongolia, China
- Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, China
- Qiwen Dong
- Faculty of Education, East China Normal University, Shanghai, China
3
Vorberg S, Seemayer S, Söding J. Synthetic protein alignments by CCMgen quantify noise in residue-residue contact prediction. PLoS Comput Biol 2018; 14:e1006526. [PMID: 30395601 PMCID: PMC6237422 DOI: 10.1371/journal.pcbi.1006526] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 06/27/2018] [Revised: 11/15/2018] [Accepted: 09/24/2018] [Indexed: 12/01/2022] Open
Abstract
Compensatory mutations between protein residues in physical contact can manifest themselves as statistical couplings between the corresponding columns in a multiple sequence alignment (MSA) of the protein family. Conversely, large coupling coefficients predict residue contacts. Methods for de-novo protein structure prediction based on this approach are becoming increasingly reliable. Their main limitation is the strong systematic and statistical noise in the estimation of coupling coefficients, which has so far limited their application to very large protein families. While most research has focused on improving predictions by adding external information, little progress has been made to improve the statistical procedure at the core, because our lack of understanding of the sources of noise poses a major obstacle. First, we show theoretically that the expectation value of the coupling score assuming no coupling is proportional to the product of the square roots of the column entropies, and we propose a simple entropy bias correction (EntC) that subtracts out this expectation value. Second, we show that the average product correction (APC) includes the correction of the entropy bias, partly explaining its success. Third, we have developed CCMgen, the first method for simulating protein evolution and generating realistic synthetic MSAs with pairwise statistical residue couplings. Fourth, to learn exact statistical models that reliably reproduce observed alignment statistics, we developed CCMpredPy, an implementation of the persistent contrastive divergence (PCD) method for exact inference. Fifth, we demonstrate how CCMgen and CCMpredPy can facilitate the development of contact prediction methods by analysing the systematic noise contributions from phylogeny and entropy. Using the entropy bias correction, we can disentangle both sources of noise and find that entropy contributes roughly twice as much noise as phylogeny. 
Knowledge about the three-dimensional structure of proteins is key to understanding their function and role in biological processes and diseases. The experimental structure determination techniques, such as X-ray crystallography or electron cryo-microscopy, are labour intensive, time-consuming and expensive. Therefore, complementary computational methods to predict a protein’s structure have become indispensable. Over the last years, immense progress has been made in predicting protein structures from their amino acid sequence by utilizing highly accurate predictions of spatial contacts between amino acid residues as constraints in folding simulations. However, contact prediction methods require large numbers of homologous protein sequences in order to discriminate between signal and noise. A major obstacle preventing progress on the statistical methodology is our limited understanding of the different components of noise that are known to affect the predictions. We provide two tools, CCMpredPy and CCMgen, that can be used to learn highly accurate statistical models for contact prediction and to simulate protein evolution according to the statistical constraints between positions of residues as specified by these models, respectively. We showcase their usefulness by quantifying the relative contribution of noise arising from entropy and phylogeny on the predicted contacts, which will facilitate the improvement of the statistical methodology.
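The two corrections named in the abstract can be sketched in a few lines. The block below is a minimal illustration, not the CCMpredPy implementation: `apc` applies the standard average product correction, S_ij - (mean_i * mean_j) / mean_all, and `entropy_correction` subtracts alpha * sqrt(H_i) * sqrt(H_j), following the abstract's statement that the expected score without coupling is proportional to the product of the square roots of the column entropies. Fitting alpha by least squares is an assumption made here, as are all function names.

```python
import math
from collections import Counter


def column_entropies(msa):
    """Shannon entropy (natural log) of each column of an alignment."""
    n_seq = len(msa)
    entropies = []
    for column in zip(*msa):
        counts = Counter(column)
        entropies.append(
            -sum((c / n_seq) * math.log(c / n_seq) for c in counts.values())
        )
    return entropies


def apc(scores):
    """Average product correction on a square score matrix (diagonal ignored)."""
    n = len(scores)
    row_mean = [
        sum(scores[i][j] for j in range(n) if j != i) / (n - 1) for i in range(n)
    ]
    overall = sum(row_mean) / n  # mean over all off-diagonal entries
    if overall == 0.0:
        return [row[:] for row in scores]
    return [
        [0.0 if i == j else scores[i][j] - row_mean[i] * row_mean[j] / overall
         for j in range(n)]
        for i in range(n)
    ]


def entropy_correction(scores, entropies):
    """Subtract alpha * sqrt(H_i) * sqrt(H_j); alpha fitted by least squares (assumed)."""
    n = len(scores)
    root = [math.sqrt(h) for h in entropies]
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    denom = sum((root[i] * root[j]) ** 2 for i, j in pairs)
    alpha = (
        sum(scores[i][j] * root[i] * root[j] for i, j in pairs) / denom
        if denom > 0.0
        else 0.0
    )
    return [
        [0.0 if i == j else scores[i][j] - alpha * root[i] * root[j]
         for j in range(n)]
        for i in range(n)
    ]
```

A score matrix that is pure entropy background, that is, exactly proportional to sqrt(H_i) * sqrt(H_j), is reduced to zero by `entropy_correction`, which is the behaviour the bias correction is meant to have.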
Affiliation(s)
- Susann Vorberg
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Stefan Seemayer
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
- Johannes Söding
- Quantitative and Computational Biology Group, Max-Planck Institute for Biophysical Chemistry, Göttingen, Germany
4
Jing X, Dong Q, Lu R. RRCRank: a fusion method using rank strategy for residue-residue contact prediction. BMC Bioinformatics 2017; 18:390. [PMID: 28865433 PMCID: PMC5581475 DOI: 10.1186/s12859-017-1811-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/04/2017] [Accepted: 08/28/2017] [Indexed: 11/10/2022] Open
Abstract
Background: In structural biology, protein residue-residue contacts play a crucial role in protein structure prediction. Some researchers have found that predicted residue-residue contacts can effectively constrain the conformational search space, which is significant for de novo protein structure prediction. Over the last few decades, various methods have been developed to predict residue-residue contacts, and in recent years fusion methods have achieved particularly strong performance. In this work, a novel fusion method based on a rank strategy is proposed to predict contacts. Unlike traditional regression or classification strategies, the contact prediction task is regarded as a ranking task. First, two kinds of features are extracted from correlated mutations methods and ensemble machine-learning classifiers, and then the proposed method uses a learning-to-rank algorithm to predict the contact probability of each residue pair. Results: First, we perform two benchmark tests for the proposed fusion method (RRCRank) on the CASP11 and CASP12 datasets. The test results show that RRCRank outperforms other well-developed methods, especially for medium- and short-range contacts. Second, to verify the superiority of the ranking strategy, we predict contacts using traditional regression and classification strategies based on the same features as the ranking strategy. Compared with these two traditional strategies, the proposed ranking strategy shows better performance for all three contact types, in particular for long-range contacts. Third, RRCRank has been compared with several state-of-the-art methods in CASP11 and CASP12. The results show that RRCRank achieves comparable prediction precision and is better than three methods in most assessment metrics.
Conclusions: The learning-to-rank algorithm is introduced to develop a novel rank-based method for residue-residue contact prediction, which achieves state-of-the-art performance in an extensive assessment. Electronic supplementary material: The online version of this article (10.1186/s12859-017-1811-9) contains supplementary material, which is available to authorized users.
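Treating contact prediction as a ranking task fits the way predictions are usually scored: residue pairs are sorted by score and only the top-ranked ones are checked against the true contacts. The sketch below shows such a precision measure over the top L/k pairs. The function names, the default k = 10 and the minimum sequence separation of 6 are assumptions chosen for illustration; the abstract does not state which cutoffs RRCRank was evaluated with.

```python
def top_pairs(scores, seq_len, k=10, min_sep=6):
    """Return the seq_len // k highest-scoring pairs with |i - j| >= min_sep.

    scores: dict mapping (i, j) residue index pairs to a predicted score.
    """
    eligible = [
        (score, pair)
        for pair, score in scores.items()
        if abs(pair[0] - pair[1]) >= min_sep
    ]
    eligible.sort(key=lambda item: item[0], reverse=True)
    n_top = max(1, seq_len // k)
    return [pair for _, pair in eligible[:n_top]]


def precision(predicted_pairs, true_contacts):
    """Fraction of predicted pairs that are real contacts."""
    if not predicted_pairs:
        return 0.0
    hits = sum(1 for pair in predicted_pairs if pair in true_contacts)
    return hits / len(predicted_pairs)
```

Only the order of the scores matters for this measure, which is one reason a ranking objective is a natural fit for the task.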
Affiliation(s)
- Xiaoyang Jing
- School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
- Qiwen Dong
- School of Data Science and Engineering, East China Normal University, Shanghai, 200062, People's Republic of China.
- Ruqian Lu
- School of Computer Science, Fudan University, Shanghai, 200433, People's Republic of China
5
A New Strategy to Reduce Influenza Escape: Detecting Therapeutic Targets Constituted of Invariance Groups. Viruses 2017; 9:v9030038. [PMID: 28257108 PMCID: PMC5371793 DOI: 10.3390/v9030038] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 09/15/2016] [Revised: 02/03/2017] [Accepted: 02/23/2017] [Indexed: 12/26/2022] Open
Abstract
The pathogenicity of the different flu species is a real public health problem worldwide. To combat this scourge, we established a method to detect drug targets, reducing the possibility of escape. Besides being able to attach a drug candidate, these targets should have the main characteristic of being part of an essential viral function. The invariance groups that are sets of residues bearing an essential function can be detected genetically. They consist of invariant and synthetic lethal residues (interdependent residues not varying or slightly varying when together). We analyzed an alignment of more than 10,000 hemagglutinin sequences of influenza to detect six invariance groups, close in space, and on the protein surface. In parallel we identified five potential pockets on the surface of hemagglutinin. By combining these results, three potential binding sites were determined that are composed of invariance groups located respectively in the vestigial esterase domain, in the bottom of the stem and in the fusion area. The latter target is constituted of residues involved in the spring-loaded mechanism, an essential step in the fusion process. We propose a model describing how this potential target could block the reorganization of the hemagglutinin HA2 secondary structure and prevent viral entry into the host cell.
6
Zhang H, Gao Y, Deng M, Wang C, Zhu J, Li SC, Zheng WM, Bu D. Improving residue-residue contact prediction via low-rank and sparse decomposition of residue correlation matrix. Biochem Biophys Res Commun 2016; 472:217-22. [PMID: 26920058 DOI: 10.1016/j.bbrc.2016.01.188] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 01/14/2016] [Accepted: 01/30/2016] [Indexed: 10/22/2022]
Abstract
Strategies for correlation analysis in protein contact prediction often encounter two challenges, namely, the indirect coupling among residues, and the background correlations mainly caused by phylogenetic biases. While various studies have been conducted on how to disentangle indirect coupling, the removal of background correlations still remains unresolved. Here, we present an approach for removing background correlations via low-rank and sparse decomposition (LRS) of a residue correlation matrix. The correlation matrix can be constructed using either local inference strategies (e.g., mutual information, or MI) or global inference strategies (e.g., direct coupling analysis, or DCA). In our approach, a correlation matrix was decomposed into two components, i.e., a low-rank component representing background correlations, and a sparse component representing true correlations. Finally the residue contacts were inferred from the sparse component of correlation matrix. We trained our LRS-based method on the PSICOV dataset, and tested it on both GREMLIN and CASP11 datasets. Our experimental results suggested that LRS significantly improves the contact prediction precision. For example, when equipped with the LRS technique, the prediction precision of MI and mfDCA increased from 0.25 to 0.67 and from 0.58 to 0.70, respectively (Top L/10 predicted contacts, sequence separation: 5 AA, dataset: GREMLIN). In addition, our LRS technique also consistently outperforms the popular denoising technique APC (average product correction), on both local (MI_LRS: 0.67 vs MI_APC: 0.34) and global measures (mfDCA_LRS: 0.70 vs mfDCA_APC: 0.67). Interestingly, we found out that when equipped with our LRS technique, local inference strategies performed in a comparable manner to that of global inference strategies, implying that the application of LRS technique narrowed down the performance gap between local and global inference strategies. 
Overall, our LRS technique greatly facilitates protein contact prediction by removing background correlations. An implementation of the approach called COLORS (improving COntact prediction using LOw-Rank and Sparse matrix decomposition) is available from http://protein.ict.ac.cn/COLORS/.
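The idea of splitting a correlation matrix into a low-rank background and a sparse set of true couplings can be illustrated with a small toy decomposition. The sketch below is not the COLORS algorithm, whose optimisation is not described in the abstract. It is a simplified stand-in that assumes a rank-1 background, found by power iteration, and alternates it with soft-thresholding of the residual. The function names and the threshold `lam` are hypothetical.

```python
import math


def rank1_approx(matrix, iters=100):
    """Best rank-1 approximation of a square matrix via power iteration."""
    n = len(matrix)
    v = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        u = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        w = [sum(matrix[i][j] * u[i] for i in range(n)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [[0.0] * n for _ in range(n)]
        v = [x / norm for x in w]
    # u = matrix @ v equals (singular value) * (left singular vector)
    u = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return [[u[i] * v[j] for j in range(n)] for i in range(n)]


def soft_threshold(x, lam):
    """Shrink x toward zero by lam; values within lam of zero become zero."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def low_rank_sparse(matrix, lam, rounds=20):
    """Alternate a rank-1 fit (background) with a sparse residual (couplings)."""
    n = len(matrix)
    sparse = [[0.0] * n for _ in range(n)]
    low_rank = [[0.0] * n for _ in range(n)]
    for _ in range(rounds):
        remainder = [
            [matrix[i][j] - sparse[i][j] for j in range(n)] for i in range(n)
        ]
        low_rank = rank1_approx(remainder)
        sparse = [
            [soft_threshold(matrix[i][j] - low_rank[i][j], lam) for j in range(n)]
            for i in range(n)
        ]
    return low_rank, sparse
```

Contacts would then be read off the largest entries of the sparse component, mirroring the abstract's final step of inferring contacts from the sparse part of the correlation matrix.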
Affiliation(s)
- Haicang Zhang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Yujuan Gao
- Center for Quantitative Biology, Peking University, Beijing, China
- Minghua Deng
- Center for Quantitative Biology, Peking University, Beijing, China; School of Mathematical Sciences, Peking University, Beijing, China; Center for Statistical Sciences, Peking University, Beijing, China
- Chao Wang
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Jianwei Zhu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China; University of Chinese Academy of Sciences, Beijing, China
- Shuai Cheng Li
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Wei-Mou Zheng
- Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing, China.
- Dongbo Bu
- Key Lab of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.
7
Abstract
Allosteric transition, defined as conformational changes induced by ligand binding, is one of the fundamental properties of proteins. Allostery has been observed and characterized in many proteins, and has been recently utilized to control protein function via regulation of protein activity. Here, we review the physical and evolutionary origin of protein allostery, as well as its importance to protein regulation, drug discovery, and biological processes in living systems. We describe recently developed approaches to identify allosteric pathways, connected sets of pairwise interactions that are responsible for propagation of conformational change from the ligand-binding site to a distal functional site. We then present experimental and computational protein engineering approaches for control of protein function by modulation of allosteric sites. As an example of application of these approaches, we describe a synergistic computational and experimental approach to rescue the cystic-fibrosis-associated protein cystic fibrosis transmembrane conductance regulator, which upon deletion of a single residue misfolds and causes disease. This example demonstrates the power of allosteric manipulation in proteins to both elucidate mechanisms of molecular function and to develop therapeutic strategies that rescue those functions. Allosteric control of proteins provides a tool to shine a light on the complex cascades of cellular processes and facilitate unprecedented interrogation of biological systems.
Affiliation(s)
- Nikolay V Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina, Chapel Hill, North Carolina 27599, United States
8
Jacob E, Unger R, Horovitz A. Codon-level information improves predictions of inter-residue contacts in proteins by correlated mutation analysis. eLife 2015; 4:e08932. [PMID: 26371555 PMCID: PMC4602084 DOI: 10.7554/elife.08932] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Received: 05/22/2015] [Accepted: 09/13/2015] [Indexed: 12/11/2022] Open
Abstract
Methods for analysing correlated mutations in proteins are becoming an increasingly powerful tool for predicting contacts within and between proteins. Nevertheless, limitations remain due to the requirement for large multiple sequence alignments (MSA) and the fact that, in general, only the relatively small number of top-ranking predictions are reliable. To date, methods for analysing correlated mutations have relied exclusively on amino acid MSAs as inputs. Here, we describe a new approach for analysing correlated mutations that is based on combined analysis of amino acid and codon MSAs. We show that a direct contact is more likely to be present when the correlation between the positions is strong at the amino acid level but weak at the codon level. The performance of different methods for analysing correlated mutations in predicting contacts is shown to be enhanced significantly when amino acid and codon data are combined.
Affiliation(s)
- Etai Jacob
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Ron Unger
- The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
- Amnon Horovitz
- Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
9
Petitjean M, Badel A, Veitia RA, Vanet A. Synthetic lethals in HIV: ways to avoid drug resistance: Running title: Preventing HIV resistance. Biol Direct 2015; 10:17. [PMID: 25888435 PMCID: PMC4399722 DOI: 10.1186/s13062-015-0044-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/19/2014] [Accepted: 02/23/2015] [Indexed: 12/19/2022] Open
Abstract
Background: RNA viruses rapidly accumulate genetic variation, which can give rise to synthetic lethal (SL) and deleterious (SD) mutations. Synthetic lethal mutations (non-lethal when alone but lethal when combined in one genome) have been studied to develop cancer therapies. This principle can also be used against fast-evolving RNA viruses. Indeed, targeting protein sites involved in SD + SL interactions with a drug would render any mutation of such sites lethal. Results: Here, we set up a strategy to detect intragenic pairs of SL and SD at the surface of the protein to predict less escapable drug target sites. For this, we detected SD + SL, studying HIV protease (PR) and reverse transcriptase (RT) sequence alignments from two groups of HIV+ individuals: treated with drugs (T) or not (NT). Using a series of statistical approaches, we were able to propose bona fide SD + SL couples. When focusing on spatially close co-variant SD + SL couples at the surface of the protein, we found 5 SD + SL groups (2 in the protease and 3 in the reverse transcriptase), which could be good candidates to form pockets to accommodate potential drugs. Conclusions: Thus, designing drugs targeting these specific SD + SL groups would not allow the virus to mutate any residue involved in such groups without losing an essential function. Moreover, we also show that the selection pressure induced by the treatment leads to the appearance of new mutations, which change the mutational landscape of the protein. This drives the existence of differential SD + SL couples between the drug-treated and non-treated groups. Thus, new anti-viral drugs should be designed differently to target such groups. Reviewers: This article was reviewed by Neil Greenspan, Csaba Pal and István Simon. Electronic supplementary material: The online version of this article (doi:10.1186/s13062-015-0044-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Michel Petitjean
- Univ Paris Diderot, Sorbonne Paris Cité, F-75013, Paris, France; MTI, INSERM UMR-S 973, F-75013, Paris, France.
- Anne Badel
- Univ Paris Diderot, Sorbonne Paris Cité, F-75013, Paris, France; MTI, INSERM UMR-S 973, F-75013, Paris, France.
- Reiner A Veitia
- Univ Paris Diderot, Sorbonne Paris Cité, F-75013, Paris, France; CNRS, UMR7592, Institut Jacques Monod, F-75013, Paris, France.
- Anne Vanet
- Univ Paris Diderot, Sorbonne Paris Cité, F-75013, Paris, France; CNRS, UMR7592, Institut Jacques Monod, F-75013, Paris, France; Atelier de Bio Informatique, F-75005, Paris, France.
10
Mao W, Kaya C, Dutta A, Horovitz A, Bahar I. Comparative study of the effectiveness and limitations of current methods for detecting sequence coevolution. Bioinformatics 2015; 31:1929-37. [PMID: 25697822 PMCID: PMC4481699 DOI: 10.1093/bioinformatics/btv103] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 08/24/2014] [Accepted: 02/02/2015] [Indexed: 01/02/2023] Open
Abstract
Motivation: With rapid accumulation of sequence data on several species, extracting rational and systematic information from multiple sequence alignments (MSAs) is becoming increasingly important. Currently, there is a plethora of computational methods for investigating coupled evolutionary changes in pairs of positions along the amino acid sequence, and making inferences on structure and function. Yet, the significance of coevolution signals remains to be established. Also, a large number of false positives (FPs) arise from insufficient MSA size, phylogenetic background and indirect couplings. Results: Here, a set of 16 pairs of non-interacting proteins is thoroughly examined to assess the effectiveness and limitations of different methods. The analysis shows that recent computationally expensive methods designed to remove biases from indirect couplings outperform others in detecting tertiary structural contacts as well as eliminating intermolecular FPs, whereas traditional methods such as mutual information benefit from refinements such as shuffling, while being highly efficient. Computations repeated with 2,330 pairs of protein families from the Negatome database corroborated these results. Finally, using a training dataset of 162 families of proteins, we propose a combined method that outperforms existing individual methods. Overall, the study provides simple guidelines towards the choice of suitable methods and strategies based on available MSA size and computing resources. Availability and implementation: Software is freely available through the Evol component of ProDy API. Contact: bahar@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online.
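Mutual information between two alignment columns, and a shuffling-based refinement of it, can be sketched as follows. This is one possible reading of "shuffling" (permute one column to build a null distribution and report a z-score) and is an assumption. It does not use or reproduce the ProDy Evol API, and the function names are hypothetical.

```python
import math
import random
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (natural log) between two equal-length MSA columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    # p_ab * log(p_ab / (p_a * p_b)), written in terms of raw counts
    return sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


def shuffled_z_score(col_a, col_b, n_shuffles=200, seed=0):
    """Z-score of the observed MI against MI values from shuffled col_b."""
    rng = random.Random(seed)
    observed = mutual_information(col_a, col_b)
    shuffled = list(col_b)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks pairing, keeps column composition
        null.append(mutual_information(col_a, shuffled))
    mean = sum(null) / len(null)
    variance = sum((x - mean) ** 2 for x in null) / len(null)
    if variance == 0.0:
        return 0.0
    return (observed - mean) / math.sqrt(variance)
```

Because shuffling preserves each column's composition, the z-score discounts the part of the MI that comes from column variability alone. Phylogenetic background and indirect couplings, the other false-positive sources the abstract lists, are not addressed by this refinement.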
Affiliation(s)
- Wenzhi Mao
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Cihan Kaya
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Anindita Dutta
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Amnon Horovitz
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
- Ivet Bahar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15260, USA, Department of Pharmacology, School of Medicine, Tsinghua University, Beijing 100084, China and Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel
11
Proctor EA, Kota P, Demarest SJ, Caravella JA, Dokholyan NV. Highly covarying residues have a functional role in antibody constant domains. Proteins 2013; 81:884-95. [PMID: 23280585 DOI: 10.1002/prot.24247] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/07/2012] [Revised: 12/05/2012] [Accepted: 12/14/2012] [Indexed: 01/25/2023]
Abstract
The ability to generate and design antibodies recognizing specific targets has revolutionized the pharmaceutical industry and medical imaging. Engineering antibody therapeutics in some cases requires modifying their constant domains to enable new and altered interactions. Engineering novel specificities into antibody constant domains has proved challenging due to the complexity of inter-domain interactions. Covarying networks of residues that tend to cluster on the protein surface and near binding sites have been identified in some proteins. However, the underlying role these networks play in the protein resulting in their conservation remains unclear in most cases. Resolving their role is crucial, because residues in these networks are not viable design targets if their role is to maintain the fold of the protein. Conversely, these networks of residues are ideal candidates for manipulating specificity if they are primarily involved in binding, such as the myriad interdomain interactions maintained within antibodies. Here, we identify networks of evolutionarily-related residues in C-class antibody domains by evaluating covariation, a measure of propensity with which residue pairs vary dependently during evolution. We computationally test whether mutation of residues in these networks affects stability of the folded antibody domain, determining their viability as design candidates. We find that members of covarying networks cluster at domain-domain interfaces, and that mutations to these residues are diverse and frequent during evolution, precluding their importance to domain stability. These results indicate that networks of covarying residues exist in antibody domains for functional reasons unrelated to thermodynamic stability, making them ideal targets for antibody design.
Affiliation(s)
- Elizabeth A Proctor
- Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, North Carolina 27599-7260, USA
12
Gültas M, Haubrock M, Tüysüz N, Waack S. Coupled mutation finder: a new entropy-based method quantifying phylogenetic noise for the detection of compensatory mutations. BMC Bioinformatics 2012; 13:225. [PMID: 22963049 PMCID: PMC3577461 DOI: 10.1186/1471-2105-13-225] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/18/2012] [Accepted: 08/23/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: The detection of significant compensatory mutation signals in multiple sequence alignments (MSAs) is often complicated by noise. A challenging problem in bioinformatics remains the separation of significant signals between two or more non-conserved residue sites from the phylogenetic noise and unrelated pair signals. Determination of these non-conserved residue sites is as important as the recognition of strictly conserved positions for understanding the structural basis of protein functions and identifying functionally important residue regions. In this study, we developed a new method, the Coupled Mutation Finder (CMF), which quantifies the phylogenetic noise for the detection of compensatory mutations. RESULTS: To demonstrate the effectiveness of this method, we analyzed essential sites of two human proteins: epidermal growth factor receptor (EGFR) and glucokinase (GCK). Our results suggest that the CMF is able to separate significant compensatory mutation signals from the phylogenetic noise and unrelated pair signals. The vast majority of compensatory mutation sites found by the CMF are related to essential sites of both proteins and they are likely to affect protein stability or functionality. CONCLUSIONS: The CMF is a new method, which includes an MSA-specific statistical model based on multiple testing procedures that quantify the error made in terms of the false discovery rate and a novel entropy-based metric to upscale BLOSUM62 dissimilar compensatory mutations. Therefore, it is a helpful tool to predict and investigate compensatory mutation sites of structural or functional importance in proteins. We suggest that the CMF could be used as a novel automated function prediction tool that is required for a better understanding of the structural basis of proteins. The CMF server is freely accessible at http://cmf.bioinf.med.uni-goettingen.de.
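The abstract says the CMF controls error in terms of the false discovery rate but does not name the procedure. As a generic illustration only, the sketch below applies the standard Benjamini-Hochberg step-up procedure to a list of per-pair p-values. Whether the CMF uses this exact procedure is not stated in the abstract, and the function name and the level q = 0.05 are assumptions.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans: True where the hypothesis is rejected.

    Standard Benjamini-Hochberg step-up procedure at FDR level q
    (a generic stand-in, not necessarily the procedure used by CMF).
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # largest rank whose p-value lies under the line rank * q / m
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

In a coupled-mutation setting each p-value would belong to one residue pair, and the rejected pairs would be reported as significant couplings.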
Affiliation(s)
- Mehmet Gültas
- Institute of Computer Science, University of Göttingen, Goldschmidtstr. 7, Göttingen, 37077, Germany.
|
13
|
Biniashvili T, Schreiber E, Kliger Y. Improving Classical Substructure-Based Virtual Screening to Handle Extrapolation Challenges. J Chem Inf Model 2012; 52:678-85. [DOI: 10.1021/ci200472s] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/30/2022]
Affiliation(s)
- Tammy Biniashvili
- Compugen LTD, Tel Aviv 69512, Israel
- The Mina and Everard Goodman Faculty of Life Sciences, Bar Ilan University, Ramat-Gan 52900, Israel
|
14
|
Livesay DR, Kreth KE, Fodor AA. A critical evaluation of correlated mutation algorithms and coevolution within allosteric mechanisms. Methods Mol Biol 2012; 796:385-398. [PMID: 22052502 DOI: 10.1007/978-1-61779-334-9_21] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/31/2023]
Abstract
The notion of using the evolutionary history encoded within multiple sequence alignments to predict allosteric mechanisms is appealing. In this approach, correlated mutations are expected to reflect coordinated changes that maintain intramolecular coupling between residue pairs. Despite much early fanfare, the general suitability of correlated mutations to predict allosteric couplings has not yet been established. Progress along these lines has been hindered by several algorithmic limitations, including phylogenetic artifacts within alignments that mask true covariance and the computational intractability of considering more than two correlated residues at a time. Recent progress in algorithm development, however, has been substantial, with a new generation of correlated mutation algorithms that have made fundamental progress toward solving these difficult problems. Despite these encouraging results, there remains little evidence to suggest that the evolutionary constraints acting on allosteric couplings are sufficient to be recovered from multiple sequence alignments. In this review, we argue that due to the exquisite sensitivity of protein dynamics, and hence that of allosteric mechanisms, the latter vary widely within protein families. If it turns out to be generally true that even very similar homologs display a wide divergence of allosteric mechanisms, then even a perfect correlated mutation algorithm could not be reliably used as a general mechanism for discovery of allosteric pathways.
Affiliation(s)
- Dennis R Livesay
- Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA
|
15
|
Sreekumar J, ter Braak CJF, van Ham RCHJ, van Dijk ADJ. Correlated mutations via regularized multinomial regression. BMC Bioinformatics 2011; 12:444. [PMID: 22082126 PMCID: PMC3247924 DOI: 10.1186/1471-2105-12-444] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2011] [Accepted: 11/14/2011] [Indexed: 11/13/2022] Open
Abstract
Background In addition to sequence conservation, protein multiple sequence alignments contain evolutionary signal in the form of correlated variation among amino acid positions. This signal indicates positions in the sequence that influence each other, and can be applied for the prediction of intra- or intermolecular contacts. Although various approaches exist for the detection of such correlated mutations, in general these methods utilize only pairwise correlations. Hence, they tend to conflate direct and indirect dependencies. Results We propose RMRCM, a method for Regularized Multinomial Regression in order to obtain Correlated Mutations from protein multiple sequence alignments. Importantly, our method is not restricted to pairwise (column-column) comparisons only, but takes into account the network nature of relationships between protein residues in order to predict residue-residue contacts. The use of regularization ensures that the number of predicted links between columns in the multiple sequence alignment remains limited, preventing overprediction. Using simulated datasets we analyzed the performance of our approach in predicting residue-residue contacts, and studied how it is influenced by various types of noise. For various biological datasets, validation with protein structure data indicates a good performance of the proposed algorithm for the prediction of residue-residue contacts, in comparison to previous results. RMRCM can also be applied to predict interactions (in addition to only predicting interaction sites or contact sites), as demonstrated by predicting PDZ-peptide interactions. Conclusions A novel method is presented, which uses regularized multinomial regression in order to obtain correlated mutations from protein multiple sequence alignments. Availability R-code of our implementation is available via http://www.ab.wur.nl/rmrcm
Affiliation(s)
- Janardanan Sreekumar
- Central Tuber Crops Research Institute, Thiruvananthapuram-695017, Kerala, India
|
16
|
DuBay KH, Bothma JP, Geissler PL. Long-range intra-protein communication can be transmitted by correlated side-chain fluctuations alone. PLoS Comput Biol 2011; 7:e1002168. [PMID: 21980271 PMCID: PMC3182858 DOI: 10.1371/journal.pcbi.1002168] [Citation(s) in RCA: 49] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/01/2010] [Accepted: 07/05/2011] [Indexed: 11/30/2022] Open
Abstract
Allosteric regulation is a key component of cellular communication, but the way in which information is passed from one site to another within a folded protein is not often clear. While backbone motions have long been considered essential for long-range information conveyance, side-chain motions have rarely been considered. In this work, we demonstrate their potential utility using Monte Carlo sampling of side-chain torsional angles on a fixed backbone to quantify correlations amongst side-chain inter-rotameric motions. Results indicate that long-range correlations of side-chain fluctuations can arise independently from several different types of interactions: steric repulsions, implicit solvent interactions, or hydrogen bonding and salt-bridge interactions. These robust correlations persist across the entire protein (up to 60 Å in the case of calmodulin) and can propagate long-range changes in side-chain variability in response to single residue perturbations. Allosteric regulation occurs when the function of one part of a protein changes in response to a signal recognized by another part of the protein. Such intra-protein communication is essential for many biochemical processes, allowing the cell to adapt its behavior to a dynamic environment. Most studies of the information conveyance underlying allostery have to date focused on the role of backbone motions in mediating large structural changes. Here we focus instead on more subtle contributions, arising from fluctuations of side-chain torsions. Using a model for side-chain bond rotations in the tightly packed environment imposed by native backbone conformations, we observed significant sensitivity of side-chain organization to small, localized perturbations. This susceptibility arises from correlations among side-chain motions that can propagate information within a protein in complex, heterogeneous ways. 
Specifically, we found appreciable correlations even between side-chains distant from one another, so that the effect of a minor perturbation at one site on the protein could be observed in the altered fluctuations of side-chains throughout the protein. In conclusion, we have demonstrated that the statistical mechanics of correlated side-chain fluctuations within a model of the folded protein provides the basis for an unconventional but potentially important means of allostery.
Affiliation(s)
- Kateri H. DuBay
- Department of Chemistry, University of California at Berkeley, Berkeley, California, United States of America
- Chemical Sciences, Physical Biosciences, and Materials Sciences Divisions, Lawrence Berkeley National Lab, Berkeley, California, United States of America
| | - Jacques P. Bothma
- Biophysical Graduate Group, University of California at Berkeley, Berkeley, California, United States of America
| | - Phillip L. Geissler
- Department of Chemistry, University of California at Berkeley, Berkeley, California, United States of America
- Chemical Sciences, Physical Biosciences, and Materials Sciences Divisions, Lawrence Berkeley National Lab, Berkeley, California, United States of America
- Biophysical Graduate Group, University of California at Berkeley, Berkeley, California, United States of America
- * E-mail:
|
17
|
Abstract
The development of peptides with therapeutic activities can be based on naturally occurring peptides or alternatively on de novo design. The discovery of natural peptides is often a matter of serendipity. In part, this is because natural peptides are typically proteolytically cleaved out from precursor proteins, a feature that averts the direct benefits of the genomic revolution. The first part of this review describes attempts to create a more systematic identification of natural peptides relying on a two step process. In the initial step, an in silico peptidome is predicted through the use of machine learning. Then, various computational biology tools are tailored to focus on peptides predicted to have the desired biological activity; for example, activating a GPCR or modulating the cellular arm of the immune system. The second part of the review is devoted to de novo peptide design and focuses on arguably the simplest scenario in which the designed peptide corresponds to a contiguous protein subsequence. Amongst these peptides, those corresponding to helical segments are prominent, mainly due to their relative ability to fold independently. Inspired by the clinical success of viral entry inhibitors, which are peptides corresponding to helical segments in viral envelope proteins, a computational tool for the identification of intramolecular helix-helix interactions was developed. Using this approach, peptides having anti-cancer, anti-angiogenic, and anti-inflammatory activities have been recently rationally designed and biologically characterized.
Affiliation(s)
- Yossef Kliger
- Compugen LTD, 72 Pinchas Rosen, Tel Aviv 69512, Israel.
|
18
|
Ramachandran S, Vogel L, Strahl BD, Dokholyan NV. Thermodynamic stability of histone H3 is a necessary but not sufficient driving force for its evolutionary conservation. PLoS Comput Biol 2011; 7:e1001042. [PMID: 21253558 PMCID: PMC3017104 DOI: 10.1371/journal.pcbi.1001042] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2010] [Accepted: 11/29/2010] [Indexed: 11/30/2022] Open
Abstract
Determining the forces that conserve amino acid positions in proteins across species is a fundamental pursuit of molecular evolution. Evolutionary conservation is driven by either a protein's function or its thermodynamic stability. Highly conserved histone proteins offer a platform to evaluate these driving forces. While the conservation of histone H3 and H4 “tail” domains and surface residues is driven by functional importance, the driving force behind the conservation of buried histone residues has not been examined. Using a computational approach, we determined the thermodynamically preferred amino acids at each buried position in H3 and H4. In agreement with what is normally observed in proteins, we find a significant correlation between thermodynamic stability and evolutionary conservation in the buried residues in H4. In striking contrast, we find that thermodynamic stability of buried H3 residues does not correlate with evolutionary conservation. Given that these H3 residues are not post-translationally modified and only regulate H3-H3 and H3-H4 stabilizing interactions, our data imply an unknown function responsible for driving conservation of these buried H3 residues. Most proteins fold to a well-defined, three-dimensional structure, which can be delineated into the protein surface and its buried core. When comparing amino acid sequences of the same protein from different organisms, we would expect to find certain residue positions conserved due to the importance of that position in maintaining either the protein's function or its three-dimensional structure. In this study, we looked at residues in the buried core domains of histone proteins H3 and H4, which have no known function other than maintaining the three-dimensional structure of the protein. We find that perturbing protein stability (which is a measure of maintenance of the protein's structure) by mutating these residues compromises survival fitness in yeast. 
However, the stability conferred by buried amino acids of H3 alone cannot account for their evolutionary conservation, which is in striking contrast to other proteins where stability has been shown to be the driving force for sequence conservation. This conservation of H3 thus points to either new additional functions of H3 that have not been uncovered or a unique conservation mechanism that goes beyond survival pressure. These data therefore reveal a highly conserved domain that is distinct in its evolutionary conservation.
Affiliation(s)
- Srinivas Ramachandran
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Program in Molecular and Cellular Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Lisa Vogel
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
| | - Brian D. Strahl
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- * E-mail: (NVD); (BDS)
| | - Nikolay V. Dokholyan
- Department of Biochemistry and Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Program in Molecular and Cellular Biophysics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- * E-mail: (NVD); (BDS)
|
19
|
Csanády L, Vergani P, Gulyás-Kovács A, Gadsby DC. Electrophysiological, biochemical, and bioinformatic methods for studying CFTR channel gating and its regulation. Methods Mol Biol 2011; 741:443-469. [PMID: 21594801 DOI: 10.1007/978-1-61779-117-8_28] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 05/30/2023]
Abstract
CFTR is the only member of the ABC (ATP-binding cassette) protein superfamily known to function as an ion channel. Most other ABC proteins are ATP-driven transporters, in which a cycle of ATP binding and hydrolysis, at intracellular nucleotide binding domains (NBDs), powers uphill substrate translocation across the membrane. In CFTR, this same ATP-driven cycle opens and closes a transmembrane pore through which chloride ions flow rapidly down their electrochemical gradient. Detailed analysis of the pattern of gating of CFTR channels thus offers the opportunity to learn about mechanisms of function not only of CFTR channels but also of their ABC transporter ancestors. In addition, CFTR channel gating is subject to complex regulation by kinase-mediated phosphorylation at multiple consensus sites in a cytoplasmic regulatory domain that is unique to CFTR. Here we offer a practical guide to extract useful information about the mechanisms that control opening and closing of CFTR channels: on how to plan (including information obtained from analysis of multiple sequence alignments), carry out, and analyze electrophysiological and biochemical experiments, as well as on how to circumvent potential pitfalls.
Affiliation(s)
- László Csanády
- Department of Medical Biochemistry, Semmelweis University, Budapest, Hungary.
|
20
|
van Dijk ADJ, van Ham RCHJ. Conserved and variable correlated mutations in the plant MADS protein network. BMC Genomics 2010; 11:607. [PMID: 20979667 PMCID: PMC3017862 DOI: 10.1186/1471-2164-11-607] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/18/2010] [Accepted: 10/28/2010] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND Plant MADS domain proteins are involved in a variety of developmental processes for which their ability to form various interactions is a key requisite. However, not much is known about the structure of these proteins or their complexes, whereas such knowledge would be valuable for a better understanding of their function. Here, we analyze those proteins and the complexes they form using a correlated mutation approach in combination with available structural, bioinformatics and experimental data. RESULTS Correlated mutations are affected by several types of noise, which is difficult to disentangle from the real signal. In our analysis of the MADS domain proteins, we apply for the first time a correlated mutation analysis to a family of interacting proteins. This provides a unique way to investigate the amount of signal that is present in correlated mutations because it allows direct comparison of mutations in various family members and assessing their conservation. We show that correlated mutations in general are conserved within the various family members, and if not, the variability at the respective positions is less in the proteins in which the correlated mutation does not occur. Also, intermolecular correlated mutation signals for interacting pairs of proteins display clear overlap with other bioinformatics data, which is not the case for non-interacting protein pairs, an observation which validates the intermolecular correlated mutations. Having validated the correlated mutation results, we apply them to infer the structural organization of the MADS domain proteins. CONCLUSION Our analysis enables understanding of the structural organization of the MADS domain proteins, including support for predicted helices based on correlated mutation patterns, and evidence for a specific interaction site in those proteins.
Affiliation(s)
- Aalt DJ van Dijk
- Applied Bioinformatics, PRI, Wageningen UR, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
| | - Roeland CHJ van Ham
- Applied Bioinformatics, PRI, Wageningen UR, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands
|
21
|
Kowarsch A, Fuchs A, Frishman D, Pagel P. Correlated mutations: a hallmark of phenotypic amino acid substitutions. PLoS Comput Biol 2010; 6. [PMID: 20862353 PMCID: PMC2940720 DOI: 10.1371/journal.pcbi.1000923] [Citation(s) in RCA: 53] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/25/2009] [Accepted: 08/09/2010] [Indexed: 11/18/2022] Open
Abstract
Point mutations resulting in the substitution of a single amino acid can cause severe functional consequences, but can also be completely harmless. Understanding what determines the phenotypical impact is important both for planning targeted mutation experiments in the laboratory and for analyzing naturally occurring mutations found in patients. Common wisdom suggests using the extent of evolutionary conservation of a residue or a sequence motif as an indicator of its functional importance and thus vulnerability in case of mutation. In this work, we put forward the hypothesis that in addition to conservation, co-evolution of residues in a protein influences the likelihood of a residue to be functionally important and thus associated with disease. While the basic idea of a relation between co-evolution and functional sites has been explored before, we have conducted the first systematic and comprehensive analysis of point mutations causing disease in humans with respect to correlated mutations. We included 14,211 distinct positions with known disease-causing point mutations in 1,153 human proteins in our analysis. Our data show that (1) correlated positions are significantly more likely to be disease-associated than expected by chance, and that (2) this signal cannot be explained by conservation patterns of individual sequence positions. Although correlated residues have primarily been used to predict contact sites, our data are in agreement with previous observations that (3) many such correlations do not relate to physical contacts between amino acid residues. Access to our analysis results is provided at http://webclu.bio.wzw.tum.de/~pagel/supplements/correlated-positions/. Point mutations (i.e., changes of a single sequence element) can have a severe impact on protein function. Many diseases are caused by such minute defects. On the other hand, the majority of such mutations do not lead to noticeable effects. 
Although previous research has revealed important aspects that influence or predict the chance of a mutation to cause disease, much remains to be learned before we fully understand this complex problem. In our work, we use the observation that sometimes certain positions in a protein mutate in an apparently correlated fashion and analyze this correlation with respect to mutation vulnerability. Our results show that positions exhibiting evolutionary correlation are significantly more likely to be vulnerable to mutation than average positions. On one hand, our data further support the concept of correlated positions to not only be associated with protein contacts but also functional sites and/or disease positions (as introduced by others). On the other hand, this could be useful to further improve the understanding and prediction of the consequences of mutations. Our work is the first to attempt a large-scale quantitation of this relationship.
Affiliation(s)
- Andreas Kowarsch
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
| | - Angelika Fuchs
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
| | - Dmitrij Frishman
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
| | - Philipp Pagel
- Lehrstuhl für Genomorientierte Bioinformatik, Technische Universität München, Wissenschaftszentrum Weihenstephan, Freising, Germany
- Institut für Bioinformatik und Systembiologie/MIPS, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt, Neuherberg, Germany
- * E-mail:
|
22
|
Liu Y, Gierasch LM, Bahar I. Role of Hsp70 ATPase domain intrinsic dynamics and sequence evolution in enabling its functional interactions with NEFs. PLoS Comput Biol 2010; 6. [PMID: 20862304 PMCID: PMC2940730 DOI: 10.1371/journal.pcbi.1000931] [Citation(s) in RCA: 51] [Impact Index Per Article: 3.6] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/23/2010] [Accepted: 08/16/2010] [Indexed: 12/16/2022] Open
Abstract
Catalysis of ADP-ATP exchange by nucleotide exchange factors (NEFs) is central to the activity of Hsp70 molecular chaperones. Yet, the mechanism of interaction of this family of chaperones with NEFs is not well understood in the context of the sequence evolution and structural dynamics of Hsp70 ATPase domains. We studied the interactions of Hsp70 ATPase domains with four different NEFs on the basis of the evolutionary trace and co-evolution of the ATPase domain sequence, combined with elastic network modeling of the collective dynamics of the complexes. Our study reveals a subtle balance between the intrinsic (to the ATPase domain) and specific (to interactions with NEFs) mechanisms shared by the four complexes. Two classes of key residues are distinguished in the Hsp70 ATPase domain: (i) highly conserved residues, involved in nucleotide binding, which mediate, via a global hinge-bending, the ATPase domain opening irrespective of NEF binding, and (ii) not-conserved but co-evolved and highly mobile residues, engaged in specific interactions with NEFs (e.g., N57, R258, R262, E283, D285). The observed interplay between these respective intrinsic (pre-existing, structure-encoded) and specific (co-evolved, sequence-dependent) interactions provides us with insights into the allosteric dynamics and functional evolution of the modular Hsp70 ATPase domain. The heat shock protein 70 (Hsp70) serves as a housekeeper in the cell, assisting in the correct folding, trafficking, and degradation of many proteins. The ATPase domain is the control unit of this molecular machine and its efficient functioning requires interactions with co-chaperones, including, in particular, the nucleotide exchange factors (NEFs). We examined the molecular motions of the ATPase domain in both NEF-bound and -unbound forms. We found that the NEF-binding surface enjoys large global movements prior to NEF binding, which presumably facilitates NEF recognition and binding. 
NEF binding stabilizes the ATPase domain in an open form and thereby facilitates the nucleotide exchange step of the chaperone cycle. A series of highly correlated amino acids were distinguished at the NEF-binding sites of the Hsp70 ATPase domain, which highlights the adaptability of the ATPase domain, both structurally and sequentially, to recognize NEFs. In contrast, the nucleotide-binding residues are tightly held near a global hinge center and are highly conserved. The contrasting properties of these two groups of residues point to an evolutionarily optimized balance between conserved/constrained and co-evolved/mobile amino acids, which enables the functional interactions of the modular Hsp70 ATPase domains with NEFs.
Affiliation(s)
- Ying Liu
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
| | - Lila M. Gierasch
- Department of Biochemistry and Molecular Biology, and Department of Chemistry, University of Massachusetts Amherst, Amherst, Massachusetts, United States of America
| | - Ivet Bahar
- Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- * E-mail:
|
23
|
Barash D, Churkin A. Mutational analysis in RNAs: comparing programs for RNA deleterious mutation prediction. Brief Bioinform 2010; 12:104-14. [DOI: 10.1093/bib/bbq059] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
|
24
|
Xu Y, Tillier ERM. Regional covariation and its application for predicting protein contact patches. Proteins 2010; 78:548-58. [PMID: 19768681 DOI: 10.1002/prot.22576] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/11/2022]
Abstract
Correlated mutation analysis (CMA) is an effective approach for predicting functional and structural residue interactions from multiple sequence alignments (MSAs) of proteins. As nearby residues may also play a role in a given functional interaction, we were interested in seeing whether covarying sites were clustered, and whether this could be used to enhance the predictive power of CMA. A large-scale search for coevolving regions within protein domains revealed that if two sites in an MSA covary, then neighboring sites in the alignment also typically covary, resulting in clusters of covarying residues. The program PatchD (http://www.uhnres.utoronto.ca/labs/tillier/) was developed to measure the covariation between disconnected sequence clusters to reveal patch covariation. Patches that exhibit strong covariation identify multiple residues that are generally nearby in the protein structure, suggesting that the detection of covarying patches can be used in conjunction with traditional CMA approaches to reveal functional interaction partners.
Affiliation(s)
- Yongbai Xu
- Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
|
25
|
Ashkenazy H, Kliger Y. Reducing phylogenetic bias in correlated mutation analysis. Protein Eng Des Sel 2010; 23:321-6. [PMID: 20067922 DOI: 10.1093/protein/gzp078] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
Correlated mutation analysis (CMA) is a sequence-based approach for ab initio protein contact map prediction. The basis of this approach is the observed correlation between mutations in interacting amino acid residues. These correlations are often estimated by either calculating the Pearson's correlation coefficient (PCC) or the mutual information (MI) between columns in a multiple sequence alignment (MSA) of the protein of interest and its homologs. A major challenge of CMA is to filter out the background noise originating from phylogenetic relatedness between sequences included in the MSA. Recently, a procedure to reduce this background noise was demonstrated to improve an MI-based predictor. Herein, we tested whether a similar approach can also improve the performance of the classical PCC-based method. Indeed, performance improvements were achieved for all four major SCOP classes. Furthermore, the results reveal that the improved PCC-based method is superior to MI-based methods for proteins having MSAs of up to 100 sequences.
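The abstract above names mutual information (MI) between alignment columns as one of the two classical CMA scores. A minimal sketch of the uncorrected MI computation follows. The toy alignment and the helper names are hypothetical, and no phylogenetic-noise reduction of the kind the paper discusses is applied here.

```python
from collections import Counter
from math import log2


def column(msa, i):
    """Extract column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in msa]


def mutual_information(col_a, col_b):
    """Plain (uncorrected) mutual information, in bits, between two columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), n_ab in count_ab.items():
        p_ab = n_ab / n
        # Compare the observed joint frequency with the product of marginals.
        mi += p_ab * log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


# Hypothetical four-sequence alignment: columns 0 and 1 covary perfectly,
# column 2 is fully conserved, column 3 varies independently of column 0.
toy_msa = ["AKLE", "AKLD", "GRLE", "GRLD"]

mi_covarying = mutual_information(column(toy_msa, 0), column(toy_msa, 1))
mi_conserved = mutual_information(column(toy_msa, 0), column(toy_msa, 2))
mi_independent = mutual_information(column(toy_msa, 0), column(toy_msa, 3))
print(mi_covarying, mi_conserved, mi_independent)
```

In this toy case the covarying pair scores one bit, while the conserved and the independent columns both score zero, which shows why raw MI cannot flag fully conserved positions. Real alignments additionally contain shared-ancestry signal that inflates raw MI, which is the background noise the paper aims to filter out.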
|
26
|
Noivirt-Brik O, Horovitz A, Unger R. Trade-off between positive and negative design of protein stability: from lattice models to real proteins. PLoS Comput Biol 2009; 5:e1000592. [PMID: 20011105 PMCID: PMC2781108 DOI: 10.1371/journal.pcbi.1000592] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2009] [Accepted: 11/03/2009] [Indexed: 11/18/2022] Open
Abstract
Two different strategies for stabilizing proteins are (i) positive design in which the native state is stabilized and (ii) negative design in which competing non-native conformations are destabilized. Here, the circumstances under which one strategy might be favored over the other are explored in the case of lattice models of proteins and then generalized and discussed with regard to real proteins. The balance between positive and negative design of proteins is found to be determined by their average "contact-frequency", a property that corresponds to the fraction of states in the conformational ensemble of the sequence in which a pair of residues is in contact. Lattice model proteins with a high average contact-frequency are found to use negative design more than model proteins with a low average contact-frequency. A mathematical derivation of this result indicates that it is general and likely to hold also for real proteins. Comparison of the results of correlated mutation analysis for real proteins with typical contact-frequencies to those of proteins likely to have high contact-frequencies (such as disordered proteins and proteins that are dependent on chaperonins for their folding) indicates that the latter tend to have stronger interactions between residues that are not in contact in their native conformation. Hence, our work indicates that negative design is employed when insufficient stabilization is achieved via positive design owing to high contact-frequencies.
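The abstract above defines contact-frequency as the fraction of states in the conformational ensemble in which a pair of residues is in contact. The sketch below computes that quantity for a toy ensemble. It assumes every state carries equal weight and that the average is taken over all residue pairs; the paper may weight states (for example by Boltzmann factors) or average differently, and the ensemble shown is hypothetical.

```python
from itertools import combinations


def contact_frequencies(ensemble, n_residues):
    """Fraction of conformations in which each residue pair (i, j), i < j, is in contact.

    `ensemble` is a list of conformations, each given as a set of (i, j)
    contact pairs. Equal weighting of states is an assumption of this sketch.
    """
    freqs = {}
    for pair in combinations(range(n_residues), 2):
        freqs[pair] = sum(pair in conf for conf in ensemble) / len(ensemble)
    return freqs


def average_contact_frequency(ensemble, n_residues):
    """Mean contact frequency over all residue pairs (assumed averaging scheme)."""
    freqs = contact_frequencies(ensemble, n_residues)
    return sum(freqs.values()) / len(freqs)


# Hypothetical ensemble of four conformations of a four-residue chain.
toy_ensemble = [
    {(0, 3)},
    {(0, 3), (1, 3)},
    {(0, 2)},
    set(),
]

freqs = contact_frequencies(toy_ensemble, 4)
avg = average_contact_frequency(toy_ensemble, 4)
print(freqs[(0, 3)], avg)
```

Under the paper's argument, a sequence whose pairs are in contact in many competing states (a high average value) gains little from strengthening native contacts alone, which is where negative design becomes relevant.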
Affiliation(s)
- Orly Noivirt-Brik
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Amnon Horovitz
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot, Israel
- Ron Unger
  - The Mina and Everard Goodman Faculty of Life Sciences, Bar-Ilan University, Ramat-Gan, Israel
|
27
|
Lee BC, Kim D. A new method for revealing correlated mutations under the structural and functional constraints in proteins. Bioinformatics 2009; 25:2506-13. [PMID: 19628501 DOI: 10.1093/bioinformatics/btp455] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Diverse studies have shown that correlated mutation (CM) is an important molecular evolutionary process alongside conservation. However, attempts to find the residue pairs that co-evolve under structural and/or functional constraints are complicated by the fact that a large portion of the covariance signals found in multiple sequence alignments arise from correlations due to common ancestry and stochastic noise. RESULTS: Assuming that the background noise can be estimated from the coevolutionary relationships among residues, we propose a new measure for background noise called the normalized coevolutionary pattern similarity (NCPS) score. By subtracting NCPS scores from raw CM scores and combining the results with an entropy factor, we show that these new scores effectively reduce the background noise. To test the effectiveness of this method in detecting residue pairs coevolving under structural constraints, two independent test sets were used, showing that this new method performs better than the most accurate method currently available. In addition, we also applied our method to double mutant cycle experiments and protein-protein interactions. Although more rigorous tests are required, we obtained promising results indicating that our method tended to explain those data better than other methods. These results suggest that the new noise-reduced CM scores developed in this study can be a valuable tool for the study of correlated mutations under structural and/or functional constraints in proteins. AVAILABILITY: http://pbil.kaist.ac.kr
Affiliation(s)
- Byung-Chul Lee
  - Department of Bio and Brain Engineering, KAIST, Daejeon 305-701, Korea
|
28
|
Frenkel-Morgenstern M, Tworowski D, Klipcan L, Safro M. Intra-protein compensatory mutations analysis highlights the tRNA recognition regions in aminoacyl-tRNA synthetases. J Biomol Struct Dyn 2009; 27:115-26. [PMID: 19583438 DOI: 10.1080/07391102.2009.10507302] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 10/28/2022]
Abstract
The aminoacyl-tRNA synthetases (aaRSs) covalently attach amino acids to their corresponding nucleic acid adapter molecules, tRNAs. The interactions in the tRNA-aaRS complexes are mostly non-specific and largely electrostatic. Tracing the mutual adaptation of aaRSs and tRNAs throughout evolution offers a clearer view of how aaRS-tRNA systems preserve patterns of tRNA recognition and binding. In this study, we used compensatory mutation analysis to explore the adaptation of aaRSs in response to random mutations that can occur in the tRNA-recognition area. We showed that the frequency of compensatory mutations among residues that belong to the recognition region is 1.75-fold higher than that of the exposed residues. The highest frequencies of compensatory mutations are observed for pairs of charged residues, wherein one residue is located within the tRNA-recognition area, while the second is placed outside of the area and contributes to the formation of the aaRS electrostatic landscape. Given charged residues are compensated by buried charged residues in more than 60% of the analyzed mutations. The cytoplasmic and mitochondrial aaRSs preserve similar patterns of compensatory mutations in the tRNA recognition areas. Moreover, we found that mitochondrial aaRSs demonstrate a significant increase in the frequency of compensatory mutations in the area. Our findings shed light on the physical nature of compensatory mutations in aaRSs, which keep tRNA-recognition patterns unchanged.
|
29
|
Xu F, Du P, Shen H, Hu H, Wu Q, Xie J, Yu L. Correlated mutation analysis on the catalytic domains of serine/threonine protein kinases. PLoS One 2009; 4:e5913. [PMID: 19526051 PMCID: PMC2690836 DOI: 10.1371/journal.pone.0005913] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 02/01/2009] [Accepted: 05/11/2009] [Indexed: 01/15/2023]
Abstract
Background: Protein kinases (PKs) have emerged as the largest family of signaling proteins in eukaryotic cells and are involved in every aspect of cellular regulation. Great progress has been made in understanding the mechanisms by which PKs phosphorylate their substrates, but the detailed mechanisms by which PKs ensure their substrate specificity with their structurally conserved catalytic domains are still not adequately understood. Correlated mutation analysis based on large sets of diverse sequence data may provide new insights into this question. Methodology/Principal Findings: Statistical coupling, residue correlation and mutual information analyses along with clustering were applied to analyze the structure-based multiple sequence alignment of the catalytic domains of the Ser/Thr PK family. Two clusters of highly coupled sites were identified. Mapping these positions onto the 3D structure of the PK catalytic domain showed that these two groups of positions form two physically close networks. We named these two networks the θ-shaped and γ-shaped networks, respectively. Conclusions/Significance: The θ-shaped network links the active site cleft and the substrate binding regions, and might participate in PKs recognizing and interacting with their substrates. The γ-shaped network is mainly situated on one side of the substrate binding regions, linking the activation loop and the substrate binding regions. It might play a role in supporting the activation loop and substrate binding regions before catalysis, and participate in product release after phosphoryl transfer. Our results exhibit significant correlations with experimental observations, and can be used as a guide to further experimental and theoretical studies on the mechanisms of PKs interacting with their substrates.
Affiliation(s)
- Feng Xu
  - State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- Pan Du
  - Biomedical Informatics Center, Northwestern University, Chicago, Illinois, United States of America
- Hongbo Shen
  - State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- Hairong Hu
  - State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- Qi Wu
  - State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- Jun Xie
  - State Key Laboratory of Genetic Engineering, Institute of Genetics, School of Life Sciences, Fudan University, Shanghai, China
- Long Yu
  - Institute of Biomedical Sciences, Fudan University, Shanghai, China
|
30
|
Qiu P, Sanfiorenzo V, Curry S, Guo Z, Liu S, Skelton A, Xia E, Cullen C, Ralston R, Greene J, Tong X. Identification of HCV protease inhibitor resistance mutations by selection pressure-based method. Nucleic Acids Res 2009; 37:e74. [PMID: 19395595 PMCID: PMC2691846 DOI: 10.1093/nar/gkp251] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Indexed: 12/11/2022]
Abstract
A major challenge to successful antiviral therapy is the emergence of drug-resistant viruses. Recent studies have developed several automated analyses of HIV sequence polymorphism based on calculations of selection pressure (Ka/Ks) to predict drug resistance mutations. Similar resistance analysis programs for HCV inhibitors are not currently available. Taking advantage of the recently available sequence data of patient HCV samples from a Phase II clinical study of the protease inhibitor boceprevir, we calculated the selection pressure for all codons in the HCV protease region (amino acids 1–181) to identify potential resistance mutations. The correlation between mutations was also calculated to evaluate linkage between any two mutations. Using this approach, we identified previously known major resistance mutations, including a recently reported mutation, V55A. In addition, a novel mutation, V158I, was identified, and we further confirmed its resistance to boceprevir in protease enzyme and replicon assays. We also extended the approach to analyze potential interactions between individual mutations and identified three pairs of correlated changes. Our data suggest that selection pressure-based analysis and correlation mapping could provide useful tools to analyze large amounts of sequencing data from clinical samples and to identify new drug resistance mutations as well as their linkage and correlations.
Affiliation(s)
- Ping Qiu
  - Molecular Design and Informatics, Schering-Plough Research Institute, 2015 Galloping Hill Road, Kenilworth, NJ 07033, USA.
|
31
|
Fatakia SN, Costanzi S, Chow CC. Computing highly correlated positions using mutual information and graph theory for G protein-coupled receptors. PLoS One 2009; 4:e4681. [PMID: 19262747 PMCID: PMC2650788 DOI: 10.1371/journal.pone.0004681] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 10/08/2008] [Accepted: 01/07/2009] [Indexed: 01/06/2023]
Abstract
G protein-coupled receptors (GPCRs) are a superfamily of seven transmembrane-spanning proteins involved in a wide array of physiological functions and are the most common targets of pharmaceuticals. This study aims to identify a cohort or clique of positions that share high mutual information. Using a multiple sequence alignment of the transmembrane (TM) domains, we calculated the mutual information between all inter-TM pairs of aligned positions and ranked the pairs by mutual information. A mutual information graph was constructed with vertices that corresponded to TM positions, and edges between vertices were drawn if the mutual information exceeded a threshold of statistical significance. Positions with high degree (i.e. those that had significant mutual information with a large number of other positions) were found to line a well-defined inter-TM ligand binding cavity for class A as well as class C GPCRs. Although the natural ligands of class C receptors bind to their extracellular N-terminal domains, the possibility of modulating their activity through ligands that bind to their helical bundle has been reported. Such positions were not found for class B GPCRs, in agreement with the observation that there are no known ligands that bind within their TM helical bundle. All identified key positions formed a clique within the MI graph of interest. For a subset of class A receptors we also considered the alignment of a portion of the second extracellular loop, and found that the two positions adjacent to the conserved Cys that bridges the loop with TM3 qualified as key positions. Our algorithm may be useful for localizing topologically conserved regions in other protein families.
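As a rough illustration of the mutual-information graph described in this abstract, the sketch below computes MI between every pair of alignment columns, draws an edge when MI exceeds a cutoff, and reports each position's degree. The fixed numeric `threshold`, the toy alignment, and the function names are assumptions for illustration; the study itself uses a threshold of statistical significance and restricts pairs to inter-TM positions.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    freq_a, freq_b = Counter(col_a), Counter(col_b)
    freq_ab = Counter(zip(col_a, col_b))
    return sum(
        (count / n) * log2((count / n) / ((freq_a[a] / n) * (freq_b[b] / n)))
        for (a, b), count in freq_ab.items()
    )


def mi_graph_degrees(msa, threshold):
    """Degree of each column in a graph whose edges join column pairs
    with MI above `threshold` (hypothetical fixed cutoff)."""
    columns = list(zip(*msa))
    degree = [0] * len(columns)
    for i, j in combinations(range(len(columns)), 2):
        if mutual_information(columns[i], columns[j]) > threshold:
            degree[i] += 1
            degree[j] += 1
    return degree


# Toy alignment: columns 0 and 1 co-vary perfectly, column 3 is invariant.
toy_msa = ["ACDA", "ACEA", "GTDA", "GTEA"]
degrees = mi_graph_degrees(toy_msa, threshold=0.5)
```

Positions with the highest degree would be the candidates for "key positions"; on real alignments the cutoff would have to come from a significance test rather than a hand-picked number.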
Affiliation(s)
- Sarosh N. Fatakia
  - Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
- Stefano Costanzi
  - Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
- Carson C. Chow
  - Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
|
32
|
Ashkenazy H, Unger R, Kliger Y. Optimal data collection for correlated mutation analysis. Proteins 2009; 74:545-55. [PMID: 18655065 DOI: 10.1002/prot.22168] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 11/08/2022]
Abstract
The main objective of correlated mutation analysis (CMA) is to predict intraprotein residue-residue interactions from sequence alone. Despite considerable progress in algorithms and computer capabilities, the performance of CMA methods remains quite low. Here we examine whether, and to what extent, the quality of CMA methods depends on the sequences that are included in the multiple sequence alignment (MSA). The results revealed a strong correlation between the number of homologs in an MSA and CMA prediction strength. Furthermore, although many of the current methods include only orthologs in the MSA, we found that it is beneficial to include both orthologs and paralogs. Remarkably, even remote homologs contribute to the improved accuracy. Based on our findings we put forward an automated data collection procedure, with a minimal coverage of 50% between the query protein and its orthologs and paralogs. This procedure improves accuracy even in the absence of manual curation. In this era of massive sequencing and exploding sequence data, our results suggest that correlated mutation-based methods have not reached their inherent performance limitations and that the role of CMA in structural biology is far from being fulfilled.
|
33
|
Noivirt-Brik O, Unger R, Horovitz A. Analysing the origin of long-range interactions in proteins using lattice models. BMC Struct Biol 2009; 9:4. [PMID: 19178726 PMCID: PMC2670300 DOI: 10.1186/1472-6807-9-4] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 11/18/2008] [Accepted: 01/29/2009] [Indexed: 11/10/2022]
Abstract
BACKGROUND Long-range communication is very common in proteins but the physical basis of this phenomenon remains unclear. In order to gain insight into this problem, we decided to explore whether long-range interactions exist in lattice models of proteins. Lattice models of proteins have proven to capture some of the basic properties of real proteins and, thus, can be used for elucidating general principles of protein stability and folding. RESULTS Using a computational version of double-mutant cycle analysis, we show that long-range interactions emerge in lattice models even though they are not an input feature of them. The coupling energy of both short- and long-range pairwise interactions is found to become more positive (destabilizing) in a linear fashion with increasing 'contact-frequency', an entropic term that corresponds to the fraction of states in the conformational ensemble of the sequence in which the pair of residues is in contact. A mathematical derivation of the linear dependence of the coupling energy on 'contact-frequency' is provided. CONCLUSION Our work shows how 'contact-frequency' should be taken into account in attempts to stabilize proteins by introducing (or stabilizing) contacts in the native state and/or through 'negative design' of non-native contacts.
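The double-mutant cycle analysis mentioned in this abstract reduces to simple arithmetic on four free energies, sketched below. The function name, the sign convention (positive meaning the double mutant is less stable than additivity predicts), and the numbers are assumptions for illustration only; sign conventions for coupling energies differ between papers.

```python
def coupling_energy(g_wt, g_mut_i, g_mut_j, g_double):
    """Coupling energy of residues i and j from a double-mutant cycle.

    Computed as the effect of mutating j in the background of mutant i
    minus the effect of mutating j in the wild-type background.
    Zero means the two mutations act independently (additively).
    """
    effect_of_j_given_i = g_double - g_mut_i
    effect_of_j_alone = g_mut_j - g_wt
    return effect_of_j_given_i - effect_of_j_alone


# Illustrative free energies in arbitrary units (not taken from the paper).
additive = coupling_energy(g_wt=-10, g_mut_i=-8, g_mut_j=-7, g_double=-5)
coupled = coupling_energy(g_wt=-10, g_mut_i=-8, g_mut_j=-7, g_double=-6)
```

In the first case the single-mutant effects (+2 and +3) sum exactly to the double-mutant effect (+5), so the coupling energy is 0; in the second the double mutant loses only 4 units, giving a coupling energy of -1.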
Affiliation(s)
- Orly Noivirt-Brik
  - Department of Structural Biology, Weizmann Institute of Science, Rehovot 76100, Israel.
|
34
|
Shen H, Xu F, Hu H, Wang F, Wu Q, Huang Q, Wang H. Coevolving residues of (β/α)8-barrel proteins play roles in stabilizing active site architecture and coordinating protein dynamics. J Struct Biol 2008; 164:281-92. [DOI: 10.1016/j.jsb.2008.09.003] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Received: 03/19/2008] [Revised: 08/31/2008] [Accepted: 09/04/2008] [Indexed: 11/16/2022]
|
35
|
Miller CS, Eisenberg D. Using inferred residue contacts to distinguish between correct and incorrect protein models. Bioinformatics 2008; 24:1575-82. [PMID: 18511466 PMCID: PMC2638260 DOI: 10.1093/bioinformatics/btn248] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 11/17/2022]
Abstract
Motivation: The de novo prediction of 3D protein structure is enjoying a period of dramatic improvements. Often, a remaining difficulty is to select the model closest to the true structure from a group of low-energy candidates. To what extent can inter-residue contact predictions from multiple sequence alignments, information which is orthogonal to that used in most structure prediction algorithms, be used to identify those models most similar to the native protein structure? Results: We present a Bayesian inference procedure to identify residue pairs that are spatially proximal in a protein structure. The method takes as input a multiple sequence alignment, and outputs an accurate posterior probability of proximity for each residue pair. We exploit a recent metagenomic sequencing project to create large, diverse and informative multiple sequence alignments for a test set of 1656 known protein structures. The method infers spatially proximal residue pairs in this test set with good accuracy: top-ranked predictions achieve an average accuracy of 38% (for an average 21-fold improvement over random predictions) in cross-validation tests. Notably, the accuracy of predicted 3D models generated by a range of structure prediction algorithms strongly correlates with how well the models satisfy probable residue contacts inferred via our method. This correlation allows for confident rejection of incorrect structural models. Availability: An implementation of the method is freely available at http://www.doe-mbi.ucla.edu/services Contact: david@mbi.ucla.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Christopher S Miller
  - UCLA-DOE Institute for Genomics & Proteomics, Molecular Biology Institute, Box 951570, UCLA, Los Angeles, CA 90095, USA
|
36
|
Sayar K, Uğur Ö, Liu T, Hilser VJ, Onaran O. Exploring allosteric coupling in the alpha-subunit of heterotrimeric G proteins using evolutionary and ensemble-based approaches. BMC Struct Biol 2008; 8:23. [PMID: 18454845 PMCID: PMC2422842 DOI: 10.1186/1472-6807-8-23] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Received: 12/04/2007] [Accepted: 05/02/2008] [Indexed: 11/24/2022]
Abstract
BACKGROUND: Allosteric coupling, which can be defined as propagation of a perturbation at one region of the protein molecule (such as ligand binding) to distant sites in the same molecule, constitutes the most general mechanism of regulation of protein function. However, unlike molecular details of ligand binding, structural elements involved in allosteric effects are difficult to diagnose. Here, we identified allosteric linkages in the alpha-subunits of heterotrimeric G proteins, which evolved to transmit membrane receptor signals by allosteric mechanisms, by using two different approaches that utilize fundamentally different and independent information. RESULTS: We analyzed: 1) correlated mutations in the family of G protein alpha-subunits, and 2) cooperativity of the native state ensemble of the Galphai1 or transducin. The combination of these approaches not only recovered already-known details such as the switch regions that change conformation upon nucleotide exchange, and those regions that are involved in receptor, effector or Gbetagamma interactions (indicating that the predictions of the analyses can be viewed with a measure of confidence), but also predicted new sites that are potentially involved in allosteric communication in the Galpha protein. A summary of the new sites found in the present analysis, which were not apparent in crystallographic data, is given along with known functional and structural information. Implications of the results are discussed. CONCLUSION: A set of residues and/or structural elements that are potentially involved in allosteric communication in Galpha is presented. This information can be used as a guide to structural, spectroscopic, mutational, and theoretical studies on the allosteric network in Galpha proteins, which will provide a better understanding of G protein-mediated signal transduction.
Affiliation(s)
- Kemal Sayar
  - Ankara University Faculty of Medicine, Department of Pharmacology and Clinical Pharmacology, Sıhhiye 06100, Ankara, Turkey
  - Ankara University Faculty of Medicine, and Molecular Biology and Technology Research and Development Unit, Sıhhiye 06100, Ankara, Turkey
- Özlem Uğur
  - Ankara University Faculty of Medicine, Department of Pharmacology and Clinical Pharmacology, Sıhhiye 06100, Ankara, Turkey
- Tong Liu
  - Department of Biochemistry and Molecular Biology, and Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, TX, 77555-1068 USA
- Vincent J Hilser
  - Department of Biochemistry and Molecular Biology, and Sealy Center for Structural Biology and Molecular Biophysics, University of Texas Medical Branch, Galveston, TX, 77555-1068 USA
- Ongun Onaran
  - Ankara University Faculty of Medicine, Department of Pharmacology and Clinical Pharmacology, Sıhhiye 06100, Ankara, Turkey
  - Ankara University Faculty of Medicine, and Molecular Biology and Technology Research and Development Unit, Sıhhiye 06100, Ankara, Turkey
|
37
|
Thomas J, Ramakrishnan N, Bailey-Kellogg C. Graphical models of residue coupling in protein families. IEEE/ACM Trans Comput Biol Bioinform 2008; 5:183-197. [PMID: 18451428 DOI: 10.1109/tcbb.2007.70225] [Citation(s) in RCA: 43] [Impact Index Per Article: 2.7] [Indexed: 05/26/2023]
Abstract
Many statistical measures and algorithmic techniques have been proposed for studying residue coupling in protein families. Generally speaking, two residue positions are considered coupled if, in the sequence record, some of their amino acid type combinations are significantly more common than others. While the proposed approaches have proven useful in finding and describing coupling, a significant missing component is a formal probabilistic model that explicates and compactly represents the coupling, integrates information about sequence, structure, and function, and supports inferential procedures for analysis, diagnosis, and prediction. We present an approach to learning and using probabilistic graphical models of residue coupling. These models capture significant conservation and coupling constraints observable in a multiply-aligned set of sequences. Our approach can place a structural prior on considered couplings, so that all identified relationships have direct mechanistic explanations. It can also incorporate information about functional classes, and thereby learn a differential graphical model that distinguishes constraints common to all classes from those unique to individual classes. Such differential models separately account for class-specific conservation and family-wide coupling, two different sources of sequence covariation. They are then able to perform interpretable functional classification of new sequences, explaining classification decisions in terms of the underlying conservation and coupling constraints. We apply our approach in studies of both G protein-coupled receptors and PDZ domains, identifying and analyzing family-wide and class-specific constraints, and performing functional classification. The results demonstrate that graphical models of residue coupling provide a powerful tool for uncovering, representing, and utilizing significant sequence structure-function relationships in protein families.
Affiliation(s)
- John Thomas
  - Department of Computer Science, Dartmouth College, Sudikoff Laboratory, Hanover, NH 03755, USA.
|
38
|
Liu Y, Eyal E, Bahar I. Analysis of correlated mutations in HIV-1 protease using spectral clustering. Bioinformatics 2008; 24:1243-50. [PMID: 18375964 PMCID: PMC2373918 DOI: 10.1093/bioinformatics/btn110] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.8] [Indexed: 01/03/2023]
Abstract
Motivation: The ability of human immunodeficiency virus-1 (HIV-1) protease to develop mutations that confer multi-drug resistance (MDR) has been a major obstacle in designing rational therapies against HIV. Resistance is usually imparted by a cooperative mechanism that can be elucidated by a covariance analysis of sequence data. Identification of such correlated substitutions of amino acids may be obscured by evolutionary noise. Results: HIV-1 protease sequences from patients subjected to different specific treatments (set 1), and from untreated patients (set 2) were subjected to sequence covariance analysis by evaluating the mutual information (MI) between all residue pairs. Spectral clustering of the resulting covariance matrices disclosed two distinctive clusters of correlated residues: the first, observed in set 1 but absent in set 2, contained residues involved in MDR acquisition; and the second included those residues differentiated in the various HIV-1 protease subtypes, shortly referred to as the phylogenetic cluster. The MDR cluster occupies sites close to the central symmetry axis of the enzyme, which overlap with the global hinge region identified from coarse-grained normal-mode analysis of the enzyme structure. The phylogenetic cluster, on the other hand, occupies solvent-exposed and highly mobile regions. This study demonstrates (i) the possibility of distinguishing between the correlated substitutions resulting from neutral mutations and those induced by MDR upon appropriate clustering analysis of sequence covariance data and (ii) a connection between global dynamics and functional substitution of amino acids. Contact: bahar@ccbb.pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ying Liu
  - Department of Computational Biology, School of Medicine, University of Pittsburgh, PA 15232, USA
|
39
|
Chi CN, Elfström L, Shi Y, Snäll T, Engström Å, Jemth P. Reassessing a sparse energetic network within a single protein domain. Proc Natl Acad Sci U S A 2008; 105:4679-84. [PMID: 18339805 PMCID: PMC2290805 DOI: 10.1073/pnas.0711732105] [Citation(s) in RCA: 80] [Impact Index Per Article: 5.0] [Received: 12/13/2007] [Indexed: 11/18/2022]
Abstract
Understanding the molecular principles that govern allosteric communication is an important goal in protein science. One way allostery could be transmitted is via sparse energetic networks of residues, and one such evolutionarily conserved network was identified in the PDZ domain family of proteins by multiple sequence alignment [Lockless SW, Ranganathan R (1999) Science 286:295-299]. We have reassessed the energetic coupling of these residues by double mutant cycles together with ligand binding and stability experiments and found that coupling is not a special property of the coevolved network of residues in PDZ domains. The observed coupling for ligand binding is better explained by a distance relationship, where residues close in space are more likely to couple than distal residues. Our study demonstrates that statistical coupling from sequence analysis is not necessarily a reporter of energetic coupling and allostery.
Affiliation(s)
- Celestine N. Chi
  - Department of Medical Biochemistry and Microbiology, Uppsala University Biomedical Centre, Box 582, SE-751 23 Uppsala, Sweden
- Lisa Elfström
  - Department of Medical Biochemistry and Microbiology, Uppsala University Biomedical Centre, Box 582, SE-751 23 Uppsala, Sweden
- Yao Shi
  - Department of Medical Biochemistry and Microbiology, Uppsala University Biomedical Centre, Box 582, SE-751 23 Uppsala, Sweden
- Tord Snäll
  - Department of Ecology, Swedish University of Agricultural Sciences, P.O. Box 7044, SE-750 07 Uppsala, Sweden
- Åke Engström
  - Department of Medical Biochemistry and Microbiology, Uppsala University Biomedical Centre, Box 582, SE-751 23 Uppsala, Sweden
- Per Jemth
  - Department of Medical Biochemistry and Microbiology, Uppsala University Biomedical Centre, Box 582, SE-751 23 Uppsala, Sweden
|
40
|
Merkl R, Zwick M. H2r: identification of evolutionary important residues by means of an entropy based analysis of multiple sequence alignments. BMC Bioinformatics 2008; 9:151. [PMID: 18366663 PMCID: PMC2323388 DOI: 10.1186/1471-2105-9-151] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.0] [Received: 08/07/2007] [Accepted: 03/18/2008] [Indexed: 11/15/2022]
Abstract
Background: A multiple sequence alignment (MSA) generated for a protein can be used to characterise residues by means of a statistical analysis of single columns. In addition to the examination of individual positions, the investigation of co-variation of amino acid frequencies offers insights into function and evolution of the protein and residues. Results: We introduce conn(k), a novel parameter for the characterisation of individual residues. For each residue k, conn(k) is the number of most extreme signals of co-evolution. These signals were deduced from a normalised mutual information (MI) value U(k, l) computed for all pairs of residues k, l. We demonstrate that conn(k) is a more robust indicator than an individual MI-value for the prediction of residues most plausibly important for the evolution of a protein. This proposition was inferred by means of statistical methods. It was further confirmed by the analysis of several proteins. A server, which computes conn(k)-values is available at . Conclusion: The algorithm H2r, which analyses MSAs and computes conn(k)-values, characterises a specific class of residues. In contrast to strictly conserved ones, these residues possess some flexibility in the composition of side chains. However, their allocation is sensibly balanced with several other positions, as indicated by conn(k).
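One way to read the conn(k) parameter from this abstract is as a count, per residue, of how many of the highest-scoring U(k, l) pairs that residue takes part in. The sketch below follows that reading; the `top_n` cutoff, the function name `conn`, and the toy scores are hypothetical, since the abstract does not say how the "most extreme" signals are selected.

```python
from collections import Counter


def conn(pair_scores, top_n):
    """Count, for each residue, its appearances among the `top_n`
    highest-scoring residue pairs.

    pair_scores maps (k, l) residue-index pairs to a normalised MI value.
    Assumption: "most extreme signals" means the top_n largest scores.
    """
    top_pairs = sorted(pair_scores, key=pair_scores.get, reverse=True)[:top_n]
    counts = Counter()
    for k, l in top_pairs:
        counts[k] += 1
        counts[l] += 1
    return counts


# Toy normalised-MI values for a 4-residue protein (illustrative only).
toy_scores = {(0, 1): 0.9, (0, 2): 0.8, (0, 3): 0.7, (1, 2): 0.1, (2, 3): 0.2}
toy_conn = conn(toy_scores, top_n=3)
```

Here residue 0 appears in all three top-scoring pairs, so it has the largest count, while the other residues each appear once.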
Affiliation(s)
- Rainer Merkl
  - Institut für Biophysik und Physikalische Biochemie, Universität Regensburg, D-93040 Regensburg, Germany.
|
41
|
Fuchs A, Martin-Galiano AJ, Kalman M, Fleishman S, Ben-Tal N, Frishman D. Co-evolving residues in membrane proteins. Bioinformatics 2007; 23:3312-9. [DOI: 10.1093/bioinformatics/btm515] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.4] [Indexed: 11/13/2022]
|
42
|
Bartlett GJ, Taylor WR. Using scores derived from statistical coupling analysis to distinguish correct and incorrect folds in de-novo protein structure prediction. Proteins 2007; 71:950-9. [DOI: 10.1002/prot.21779] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.9] [Indexed: 11/09/2022]
|
43
|
Gouveia-Oliveira R, Pedersen AG. Finding coevolving amino acid residues using row and column weighting of mutual information and multi-dimensional amino acid representation. Algorithms Mol Biol 2007; 2:12. [PMID: 17915013 PMCID: PMC2234412 DOI: 10.1186/1748-7188-2-12] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.2] [Received: 05/25/2007] [Accepted: 10/03/2007] [Indexed: 11/10/2022]
Abstract
Background: Some amino acid residues functionally interact with each other. This interaction results in evolutionary co-variation between these residues, known as coevolution. Our goal is to find these coevolving residues. Results: We present six new methods for detecting coevolving residues. Among other things, we suggest measures that are variants of Mutual Information, and measures that use a multidimensional representation of each residue in order to capture the physico-chemical similarities between amino acids. We created an in silico benchmarking system able to evaluate these methods under a wide range of realistic conditions. Finally, we use the combination of different methods as a way of improving performance. Conclusion: Our best method (Row and Column Weighted Mutual Information) has an estimated accuracy increase of 63% over Mutual Information. Furthermore, we show that the combination of different methods is efficient, and that the methods are quite sensitive to the different conditions tested.
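The abstract does not spell out how the row and column weighting is computed. One plausible reading, offered here only as an assumption, is that each pairwise MI value is divided by the mean MI of the other pairs that share one of its two positions, so that positions with uniformly high MI are down-weighted. The Python sketch below implements that reading; the function name rcw and the exact normalisation are hypothetical and may differ from the published method.

```python
def rcw(mi):
    """Row-and-column weight a symmetric matrix of pairwise MI values.

    mi[i][j] is the MI between alignment columns i and j; the diagonal is
    ignored. Each entry is divided by the mean MI of all other pairs that
    involve column i or column j (an assumed normalisation).
    """
    n = len(mi)
    if n < 3:
        raise ValueError("need at least three columns to form a background")
    # Total MI of each column with every other column.
    totals = [sum(mi[i][j] for j in range(n) if j != i) for i in range(n)]
    weighted = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # There are 2 * (n - 2) other pairs that share a column with (i, j).
            background = (totals[i] + totals[j] - 2 * mi[i][j]) / (2 * (n - 2))
            weighted[i][j] = mi[i][j] / background if background > 0 else 0.0
    return weighted


# Toy MI matrix: columns 0 and 1 stand out against a weak background.
toy_mi = [
    [0.0, 0.8, 0.1],
    [0.8, 0.0, 0.1],
    [0.1, 0.1, 0.0],
]
toy_weighted = rcw(toy_mi)
```

With the toy matrix, the pair (0, 1) is scored well above the pairs involving column 2, because its MI is large relative to the other values in its row and column.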
Affiliation(s)
- Rodrigo Gouveia-Oliveira
- Center for Biological Sequence Analysis, The Technical University of Denmark, Building 208, 2800 Lyngby, Denmark
- Anders G Pedersen
- Center for Biological Sequence Analysis, The Technical University of Denmark, Building 208, 2800 Lyngby, Denmark
|
44
|
Wang Q, Lee C. Distinguishing functional amino acid covariation from background linkage disequilibrium in HIV protease and reverse transcriptase. PLoS One 2007; 2:e814. [PMID: 17726544 PMCID: PMC1950573 DOI: 10.1371/journal.pone.0000814] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.8] [Received: 07/06/2007] [Accepted: 08/01/2007] [Indexed: 11/19/2022]
Abstract
Correlated amino acid mutation analysis has been widely used to infer functional interactions between different sites in a protein. However, this analysis can be confounded by important phylogenetic effects broadly classifiable as background linkage disequilibrium (BLD). We have systematically separated the covariation induced by selective interactions between amino acids from background LD, using synonymous (S) vs. amino acid (A) mutations. Covariation between two amino acid mutations, (A,A), can be affected by selective interactions between amino acids, whereas covariation within (A,S) pairs or (S,S) pairs cannot. Our analysis of the pol gene — including the protease and the reverse transcriptase genes — in HIV reveals that (A,A) covariation levels are enormously higher than for either (A,S) or (S,S), and thus cannot be attributed to phylogenetic effects. The magnitude of these effects suggests that a large portion of (A,A) covariation in the HIV pol gene results from selective interactions. Inspection of the most prominent (A,A) interactions in the HIV pol gene showed that they are known sites of independently identified drug resistance mutations, and physically cluster around the drug binding site. Moreover, the specific set of (A,A) interaction pairs was reproducible in different drug treatment studies, and vanished in untreated HIV samples. The (S,S) covariation curves measured a low but detectable level of background LD in HIV.
Affiliation(s)
- Qi Wang
- Center for Computational Biology, Molecular Biology Institute, Institute for Genomics and Proteomics, University of California at Los Angeles, Los Angeles, United States of America
- Christopher Lee
- Center for Computational Biology, Molecular Biology Institute, Institute for Genomics and Proteomics, University of California at Los Angeles, Los Angeles, United States of America
- Department of Chemistry and Biochemistry, University of California at Los Angeles, Los Angeles, United States of America
- * To whom correspondence should be addressed. E-mail:
|
45
|
Carlson J, Kadie C, Mallal S, Heckerman D. Leveraging hierarchical population structure in discrete association studies. PLoS One 2007; 2:e591. [PMID: 17611623 PMCID: PMC1899226 DOI: 10.1371/journal.pone.0000591] [Citation(s) in RCA: 31] [Impact Index Per Article: 1.8] [Received: 01/05/2007] [Accepted: 06/08/2007] [Indexed: 11/22/2022]
Abstract
Population structure can confound the identification of correlations in biological data. Such confounding has been recognized in multiple biological disciplines, resulting in a disparate collection of proposed solutions. We examine several methods that correct for confounding on discrete data with hierarchical population structure and identify two distinct confounding processes, which we call coevolution and conditional influence. We describe these processes in terms of generative models and show that these generative models can be used to correct for the confounding effects. Finally, we apply the models to three applications: identification of escape mutations in HIV-1 in response to specific HLA-mediated immune pressure, prediction of coevolving residues in an HIV-1 peptide, and a search for genotypes that are associated with bacterial resistance traits in Arabidopsis thaliana. We show that coevolution is a better description of confounding in some applications and conditional influence is better in others. That is, we show that no single method is best for addressing all forms of confounding. Analysis tools based on these models are available on the internet as both web based applications and downloadable source code at http://atom.research.microsoft.com/bio/phylod.aspx.
Affiliation(s)
- Jonathan Carlson
- Machine Learning and Applied Statistics Group, Microsoft Research, Redmond, Washington, United States of America
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America
- Carl Kadie
- Machine Learning and Applied Statistics Group, Microsoft Research, Redmond, Washington, United States of America
- Simon Mallal
- Center for Clinical Immunology and Biomedical Statistics, Royal Perth Hospital, Perth, Australia
- David Heckerman
- Department of Computer Science and Engineering, University of Washington, Seattle, Washington, United States of America
|
46
|
Suemori A, Iwakura M. A Systematic and Comprehensive Combinatorial Approach to Simultaneously Improve the Activity, Reaction Specificity, and Thermal Stability of p-Hydroxybenzoate Hydroxylase. J Biol Chem 2007; 282:19969-78. [PMID: 17462997 DOI: 10.1074/jbc.m610320200] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.6] [Indexed: 11/06/2022]
Abstract
We have simultaneously improved the activity, reaction specificity, and thermal stability of p-hydroxybenzoate hydroxylase by means of systematic and comprehensive combinatorial mutagenesis starting from available single mutations. Introduction of random mutations at the positions of four cysteine and eight methionine residues provided 216 single mutants as stably expressed forms in Escherichia coli host cells. Four characteristics of the purified mutants were determined: hydroxylase activity toward p-hydroxybenzoate (main activity), protocatechuate-dependent NADPH oxidase activity (sub-activity), ratio of sub-activity to main activity (reaction specificity), and thermal stability. To improve these characteristics for diagnostic use of the enzyme, 11 single mutations (C152V, C211I, C332A, M52V, M52Q, M110L, M110I, M213G, M213L, M276Q, and M349A) were selected for further combinatorial mutagenesis. All possible combinations of the mutations provided 18 variants with double mutations, and further combinatorial mutagenesis provided 6 variants with triple mutations and 9 variants with quadruple mutations in which all four properties were simultaneously improved.
Affiliation(s)
- Akio Suemori
- National Institute of Advanced Industrial Science and Technology, 1-1-1 Higashi, Tsukuba, Ibaraki 305-8566, Japan
|
47
|
Eyal E, Frenkel-Morgenstern M, Sobolev V, Pietrokovski S. A pair-to-pair amino acids substitution matrix and its applications for protein structure prediction. Proteins 2007; 67:142-53. [PMID: 17243158 DOI: 10.1002/prot.21223] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 11/11/2022]
Abstract
We present a new structurally derived pair-to-pair substitution matrix (P2PMAT). This matrix is constructed from a very large number of integrated high-quality multiple sequence alignments (Blocks) and protein structures. It evaluates the likelihoods of all 160,000 pair-to-pair substitutions. The P2PMAT matrix implicitly accounts for evolutionary conservation, correlated mutations, and residue-residue contact potentials. The usefulness of the matrix for structural predictions is shown in this article. Prediction of protein residue-residue contacts from sequence information alone with our method (P2PConPred) is particularly accurate in the protein cores, where it performs better than other basic contact prediction methods (increasing accuracy by 25-60%). The method's mean accuracy for protein cores is 24% for 59 diverse families and 34% for a subset of proteins shorter than 100 residues. This is above the level that was recently shown to be sufficient to significantly improve ab initio protein structure prediction. We also demonstrate the ability of our approach to identify native structures within large sets (300-2000) of protein decoys. On the basis of evolutionary information alone, our method ranks the native structure in the top 0.3% of the decoys in 4/10 of the sets, and in 8/10 of the sets the native structure is ranked in the top 10% of the decoys. The method can thus be used to help filter out wrong models, complementing traditional scoring functions.
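The figure of 160,000 pair-to-pair substitutions is consistent with counting ordered pairs over the 20 standard amino acids: 20 x 20 = 400 possible residue pairs, and 400 x 400 = 160,000 ways for one pair to be replaced by another. The short Python check below reproduces that count. It is an illustration of the arithmetic only, and it assumes ordered pairs and the 20-letter alphabet, neither of which is stated explicitly in the abstract.

```python
from itertools import product

# One-letter codes of the 20 standard amino acids (assumed alphabet).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# All ordered residue pairs, e.g. ('A', 'C') and ('C', 'A') are distinct.
residue_pairs = list(product(AMINO_ACIDS, repeat=2))
n_pairs = len(residue_pairs)

# Every pair can be substituted by every pair (including itself).
n_substitutions = n_pairs ** 2

print(n_pairs, n_substitutions)
```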
Affiliation(s)
- Eran Eyal
- Department of Plant Sciences, Weizmann Institute of Science, Rehovot 76100, Israel.
|
48
|
Berezovsky IN, Zeldovich KB, Shakhnovich EI. Positive and negative design in stability and thermal adaptation of natural proteins. PLoS Comput Biol 2007; 3:e52. [PMID: 17381236 PMCID: PMC1829478 DOI: 10.1371/journal.pcbi.0030052] [Citation(s) in RCA: 92] [Impact Index Per Article: 5.4] [Received: 11/17/2006] [Accepted: 01/31/2007] [Indexed: 11/18/2022]
Abstract
The aim of this work is to elucidate how physical principles of protein design are reflected in natural sequences that evolved in response to the thermal conditions of the environment. Using an exactly solvable lattice model, we design sequences with selected thermal properties. Compositional analysis of designed model sequences and natural proteomes reveals a specific trend in amino acid compositions in response to the requirement of stability at elevated environmental temperature: the increase of fractions of hydrophobic and charged amino acid residues at the expense of polar ones. We show that this “from both ends of the hydrophobicity scale” trend is due to positive (to stabilize the native state) and negative (to destabilize misfolded states) components of protein design. Negative design strengthens specific repulsive non-native interactions that appear in misfolded structures. A pressure to preserve specific repulsive interactions in non-native conformations may result in correlated mutations between amino acids that are far apart in the native state but may be in contact in misfolded conformations. Such correlated mutations are indeed found in TIM barrel and other proteins.
What mechanisms does Nature use in her quest for thermophilic proteins? It is known that stability of a protein is mainly determined by the energy gap, or the difference in energy, between native state and a set of incorrectly folded (misfolded) conformations. Here we show that Nature makes thermophilic proteins by widening this gap from both ends. The energy of the native state of a protein is decreased by selecting strongly attractive amino acids at positions that are in contact in the native state (positive design). Simultaneously, energies of the misfolded conformations are increased by selection of strongly repulsive amino acids at positions that are distant in native structure; however, these amino acids will interact repulsively in the misfolded conformations (negative design). These fundamental principles of protein design are manifested in the “from both ends of the hydrophobicity scale” trend observed in thermophilic adaptation, whereby proteomes of thermophilic proteins are enriched in extreme amino acids (hydrophobic and charged) at the expense of polar ones. Hydrophobic amino acids contribute mostly to the positive design, while charged amino acids that repel each other in non-native conformations of proteins contribute to negative design. Our results provide guidance in rational design of proteins with selected thermal properties.
Affiliation(s)
- Igor N Berezovsky
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Konstantin B Zeldovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- Eugene I Shakhnovich
- Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts, United States of America
- * To whom correspondence should be addressed. E-mail:
|
49
|
Treynor TP, Vizcarra CL, Nedelcu D, Mayo SL. Computationally designed libraries of fluorescent proteins evaluated by preservation and diversity of function. Proc Natl Acad Sci U S A 2006; 104:48-53. [PMID: 17179210 PMCID: PMC1765474 DOI: 10.1073/pnas.0609647103] [Citation(s) in RCA: 70] [Impact Index Per Article: 3.9] [Indexed: 11/18/2022]
Abstract
To determine which of seven library design algorithms best introduces new protein function without destroying it altogether, seven combinatorial libraries of green fluorescent protein variants were designed and synthesized. Each was evaluated by distributions of emission intensity and color compiled from measurements made in vivo. Additional comparisons were made with a library constructed by error-prone PCR. Among the designed libraries, fluorescent function was preserved for the greatest fraction of samples in a library designed by using a structure-based computational method developed and described here. A trend was observed toward greater diversity of color in designed libraries that better preserved fluorescence. Contrary to trends observed among libraries constructed by error-prone PCR, preservation of function was observed to increase with a library's average mutation level among the four libraries designed with structure-based computational methods.
Affiliation(s)
- Thomas P. Treynor
- Divisions of *Biology and Chemistry and
- Howard Hughes Medical Institute, California Institute of Technology, 1200 East California Boulevard, Pasadena, CA 91125
- Stephen L. Mayo
- Divisions of *Biology and Chemistry and
- Howard Hughes Medical Institute, California Institute of Technology, 1200 East California Boulevard, Pasadena, CA 91125
- To whom correspondence should be addressed. E-mail:
|
50
|
Chen L, Lee C. Distinguishing HIV-1 drug resistance, accessory, and viral fitness mutations using conditional selection pressure analysis of treated versus untreated patient samples. Biol Direct 2006; 1:14. [PMID: 16737543 PMCID: PMC1523337 DOI: 10.1186/1745-6150-1-14] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.2] [Received: 05/15/2006] [Accepted: 05/31/2006] [Indexed: 11/18/2022]
Abstract
Background: HIV can evolve drug resistance rapidly in response to new drug treatments, often through a combination of multiple mutations [1-3]. It would be useful to develop automated analyses of HIV sequence polymorphism that are able to predict drug resistance mutations, and to distinguish different types of functional roles among such mutations, for example, those that directly cause drug resistance versus those that play an accessory role. Detecting functional interactions between mutations is essential for this classification. We have adapted a well-known measure of evolutionary selection pressure (Ka/Ks) and developed a conditional Ka/Ks approach to detect important interactions. Results: We have applied this analysis to four independent HIV protease sequencing datasets: 50,000 clinical samples sequenced by Specialty Laboratories, Inc.; 1800 samples from patients treated with protease inhibitors; 2600 samples from untreated patients; and 400 samples from untreated African patients. We identified 428 statistically significant mutation interactions in the Specialty dataset and were able to distinguish primary vs. accessory mutations for many well-studied examples. Amino acid interactions identified by conditional Ka/Ks matched 80 of 92 pairwise interactions found by a completely independent study of HIV protease (the p-value for this match is significant: 10^-70). Furthermore, Ka/Ks selection pressure results were highly reproducible among these independent datasets, both qualitatively and quantitatively, suggesting that they are detecting real drug-resistance and viral fitness mutations in the wild HIV-1 population. Conclusion: Conditional Ka/Ks analysis can detect mutation interactions and distinguish primary vs. accessory mutations in HIV-1. Ka/Ks analysis of treated vs. untreated patient data can distinguish drug-resistance vs. viral fitness mutations. Verification of these results would require longitudinal studies. The result provides a valuable resource for AIDS research and will be available for open access upon publication at http://www.bioinformatics.ucla.edu/HIV.
Affiliation(s)
- Lamei Chen
- Institute for Genomics & Proteomics, Molecular Biology Institute, Dept. of Chemistry & Biochemistry, UCLA, Los Angeles, CA 90095-1570, USA
- Christopher Lee
- Institute for Genomics & Proteomics, Molecular Biology Institute, Dept. of Chemistry & Biochemistry, UCLA, Los Angeles, CA 90095-1570, USA
|