1
|
Wagner JR, Lee CT, Durrant JD, Malmstrom RD, Feher VA, Amaro RE. Emerging Computational Methods for the Rational Discovery of Allosteric Drugs. Chem Rev 2016; 116:6370-90. [PMID: 27074285 PMCID: PMC4901368 DOI: 10.1021/acs.chemrev.5b00631] [Citation(s) in RCA: 148] [Impact Index Per Article: 18.5] [Indexed: 12/17/2022]
Abstract
Allosteric drug development holds promise for delivering medicines that are more selective and less toxic than those that target orthosteric sites. To date, the discovery of allosteric binding sites and lead compounds has been mostly serendipitous, achieved through high-throughput screening. Over the past decade, structural data has become more readily available for larger protein systems and more membrane protein classes (e.g., GPCRs and ion channels), which are common allosteric drug targets. In parallel, improved simulation methods now provide better atomistic understanding of the protein dynamics and cooperative motions that are critical to allosteric mechanisms. As a result of these advances, the field of predictive allosteric drug development is now on the cusp of a new era of rational structure-based computational methods. Here, we review algorithms that predict allosteric sites based on sequence data and molecular dynamics simulations, describe tools that assess the druggability of these pockets, and discuss how Markov state models and topology analyses provide insight into the relationship between protein dynamics and allosteric drug binding. In each section, we first provide an overview of the various method classes before describing relevant algorithms and software packages.
Affiliation(s)
- Jeffrey R Wagner
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Christopher T Lee
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Jacob D Durrant
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Robert D Malmstrom
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Victoria A Feher
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
- Rommie E Amaro
- Department of Chemistry & Biochemistry and National Biomedical Computation Resource, University of California, San Diego, La Jolla, California 92093, United States
|
2
|
Gao H, Yu X, Dou Y, Wang J. New Measurement for Correlation of Co-evolution Relationship of Subsequences in Protein. Interdiscip Sci 2015; 7:364-72. [PMID: 26396121 DOI: 10.1007/s12539-015-0024-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2014] [Revised: 04/08/2014] [Accepted: 04/16/2014] [Indexed: 11/26/2022]
Abstract
Many computational tools have been developed to measure the co-evolution of protein residues. Most of them focus only on co-evolution between pairs of residues in a protein sequence. However, more than two residues may participate in co-evolution, and some co-evolved residues are clustered in several distinct regions of the primary structure. Therefore, the co-evolution among adjacent residues and the correlation between distinct regions offer insights into the function and evolution of the protein and its residues. A subsequence is used to represent the multiple adjacent residues in one distinct region. In the paper, the co-evolution relationship within each subsequence is represented by a mutual information matrix (MIM). Then a Pearson's correlation coefficient, the R value, is developed to measure the similarity of two MIMs. MSAs from the Catalytic Data Base (Catalytic Site Atlas, CSA) are used for testing. The R value characterizes a specific class of residues. In contrast to individual pairwise co-evolved residues, adjacent residues without high individual MI values are found because the co-evolution relationship among them is similar to that among another set of adjacent residues. These subsequences possess some flexibility in the composition of side chains, such as the catalyzed environment.
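The MIM and R-value idea in this abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names (`mutual_information`, `mi_matrix`, `pearson_r`) are hypothetical, the MSA is assumed to be a list of equal-length, gap-free strings, and whether the paper includes the matrix diagonal in the correlation is not stated here, so the sketch simply correlates all entries.

```python
import math
from collections import Counter


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        # p(a,b) / (p(a) * p(b)) simplifies to c * n / (count_a * count_b)
        mi += (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
    return mi


def mi_matrix(msa, start, length):
    """MI matrix for the subsequence covering columns start .. start+length-1."""
    cols = [[seq[i] for seq in msa] for i in range(start, start + length)]
    return [[mutual_information(x, y) for y in cols] for x in cols]


def pearson_r(matrix_a, matrix_b):
    """Pearson correlation between the entries of two equal-sized matrices."""
    xs = [v for row in matrix_a for v in row]
    ys = [v for row in matrix_b for v in row]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined here; 0.0 is a convention of this sketch
    return cov / math.sqrt(var_x * var_y)
```

Under these assumptions, two subsequences of equal length would be compared with `pearson_r(mi_matrix(msa, s1, w), mi_matrix(msa, s2, w))`, where `s1`, `s2`, and `w` are illustrative start positions and a window width.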
Affiliation(s)
- Hongyun Gao
- School of Mathematical Sciences, Dalian University of Technology, Dalian, 116024, China
- Information and Engineering College, Dalian University, Dalian, 116622, China
- Xiaoqing Yu
- College of Sciences, Shanghai Institute of Technology, Shanghai, 201418, China
- Yongchao Dou
- Center for Plant Science and Innovation, School of Biological Sciences, University of Nebraska, Lincoln, NE, 68588, USA
- Jun Wang
- Department of Mathematics, Shanghai Normal University, Shanghai, 200234, China.
|
3
|
Gao H, Yu X, Dou Y, Wang J. New measurement for correlation of co-evolution relationship of subsequences in protein. Interdiscip Sci 2015. [PMID: 25663109 DOI: 10.1007/s12539-014-0221-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/17/2014] [Revised: 04/08/2014] [Accepted: 04/16/2014] [Indexed: 11/24/2022]
Abstract
Many computational tools have been developed to measure the co-evolution of protein residues. Most of them focus only on co-evolution between pairs of residues in a protein sequence. However, more than two residues may participate in co-evolution, and some co-evolved residues are clustered in several distinct regions of the primary structure. Therefore, the co-evolution among adjacent residues and the correlation between distinct regions offer insights into the function and evolution of the protein and its residues. A subsequence is used to represent the multiple adjacent residues in one distinct region. In the paper, the co-evolution relationship within each subsequence is represented by a mutual information matrix (MIM). Then a Pearson's correlation coefficient, the R value, is developed to measure the similarity of two MIMs. MSAs from the Catalytic Data Base (Catalytic Site Atlas, CSA) are used for testing. The R value characterizes a specific class of residues. In contrast to individual pairwise co-evolved residues, adjacent residues without high individual MI values are found because the co-evolution relationship among them is similar to that among another set of adjacent residues. These subsequences possess some flexibility in the composition of side chains, such as the catalyzed environment.
Affiliation(s)
- Hongyun Gao
- School of Mathematical Sciences, Dalian University of Technology, Dalian, 116024, China
|
4
|
Janda JO, Popal A, Bauer J, Busch M, Klocke M, Spitzer W, Keller J, Merkl R. H2rs: deducing evolutionary and functionally important residue positions by means of an entropy and similarity based analysis of multiple sequence alignments. BMC Bioinformatics 2014; 15:118. [PMID: 24766829 PMCID: PMC4021312 DOI: 10.1186/1471-2105-15-118] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 01/13/2014] [Accepted: 04/17/2014] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The identification of functionally important residue positions is an important task of computational biology. Methods of correlation analysis allow for the identification of pairs of residue positions, whose occupancy is mutually dependent due to constraints imposed by protein structure or function. A common measure assessing these dependencies is the mutual information, which is based on Shannon's information theory that utilizes probabilities only. Consequently, such approaches do not consider the similarity of residue pairs, which may degrade the algorithm's performance. One typical algorithm is H2r, which characterizes each individual residue position k by the conn(k)-value, which is the number of significantly correlated pairs it belongs to. RESULTS To improve specificity of H2r, we developed a revised algorithm, named H2rs, which is based on the von Neumann entropy (vNE). To compute the corresponding mutual information, a matrix A is required, which assesses the similarity of residue pairs. We determined A by deducing substitution frequencies from contacting residue pairs observed in the homologs of 35 809 proteins, whose structure is known. In analogy to H2r, the enhanced algorithm computes a normalized conn(k)-value. Within the framework of H2rs, only statistically significant vNE values were considered. To decide on significance, the algorithm calculates a p-value by performing a randomization test for each individual pair of residue positions. The analysis of a large in silico testbed demonstrated that specificity and precision were higher for H2rs than for H2r and two other methods of correlation analysis. The gain in prediction quality is further confirmed by a detailed assessment of five well-studied enzymes. The outcome of H2rs and of a method that predicts contacting residue positions (PSICOV) overlapped only marginally. H2rs can be downloaded from http://www-bioinf.uni-regensburg.de. 
CONCLUSIONS Considering substitution frequencies for residue pairs by means of the von Neumann entropy and a p-value improved the success rate in identifying important residue positions. The integration of proven statistical concepts and normalization allows for an easier comparison of results obtained with different proteins. Comparing the outcome of the local method H2rs and of the global method PSICOV indicates that such methods supplement each other and have different scopes of application.
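The per-pair randomization test mentioned in this abstract can be sketched as follows. Note the substitution: H2rs scores pairs with a von Neumann entropy based measure, whereas this sketch uses classical Shannon mutual information as a stand-in, and the shuffling scheme, the add-one correction, and all names (`randomization_p_value`, `n_shuffles`, `seed`) are assumptions for illustration only.

```python
import math
import random
from collections import Counter


def mutual_information(col_a, col_b):
    """Shannon mutual information (bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * math.log2(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )


def randomization_p_value(col_a, col_b, n_shuffles=1000, seed=0):
    """Estimate how often a shuffled column scores at least as high as observed."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = mutual_information(col_a, col_b)
    shuffled = list(col_b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # breaks any coupling, keeps column composition
        if mutual_information(col_a, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_shuffles + 1)
```

Shuffling one column preserves each column's residue composition while destroying the pairing, so the null distribution reflects what composition alone would produce.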
Affiliation(s)
- Rainer Merkl
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, D-93040 Regensburg, Germany.
|
5
|
Pelé J, Moreau M, Abdi H, Rodien P, Castel H, Chabbert M. Comparative analysis of sequence covariation methods to mine evolutionary hubs: Examples from selected GPCR families. Proteins 2014; 82:2141-56. [DOI: 10.1002/prot.24570] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 07/26/2013] [Revised: 03/11/2014] [Accepted: 03/19/2014] [Indexed: 01/26/2023]
Affiliation(s)
- Julien Pelé
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Matthieu Moreau
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Hervé Abdi
- The University of Texas at Dallas; School of Behavioral and Brain Sciences; Richardson, TX 75080-3021 USA
- Patrice Rodien
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
- Department of Endocrinology, Reference Centre for the pathologies of hormonal receptivity; Centre Hospitalier Universitaire of Angers; 4 rue Larrey 49933 Angers France
- Hélène Castel
- INSERM U982, Laboratory of Neuronal and Neuroendocrine Communication and Differentiation, DC2N; University of Rouen; 76821 Mont-Saint-Aignan France
- Marie Chabbert
- UMR CNRS 6214-INSERM 1083, Laboratory of Integrated Neurovascular and Mitochondrial Biology; University of Angers; 49045 Angers France
|
6
|
Gültas M, Düzgün G, Herzog S, Jäger SJ, Meckbach C, Wingender E, Waack S. Quantum coupled mutation finder: predicting functionally or structurally important sites in proteins using quantum Jensen-Shannon divergence and CUDA programming. BMC Bioinformatics 2014; 15:96. [PMID: 24694117 PMCID: PMC4098773 DOI: 10.1186/1471-2105-15-96] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 06/10/2013] [Accepted: 03/26/2014] [Indexed: 11/29/2022] Open
Abstract
Background The identification of functionally or structurally important non-conserved residue sites in protein MSAs is an important challenge for understanding the structural basis and molecular mechanism of protein functions. Despite the rich literature on compensatory mutations as well as sequence conservation analysis for the detection of those important residues, previous methods often rely on classical information-theoretic measures. However, these measures usually do not take into account dis/similarities of amino acids which are likely to be crucial for those residues. In this study, we present a new method, the Quantum Coupled Mutation Finder (QCMF) that incorporates significant dis/similar amino acid pair signals in the prediction of functionally or structurally important sites. Results The result of this study is twofold. First, using the essential sites of two human proteins, namely epidermal growth factor receptor (EGFR) and glucokinase (GCK), we tested the QCMF-method. The QCMF includes two metrics based on quantum Jensen-Shannon divergence to measure both sequence conservation and compensatory mutations. We found that the QCMF reaches an improved performance in identifying essential sites from MSAs of both proteins with a significantly higher Matthews correlation coefficient (MCC) value in comparison to previous methods. Second, using a data set of 153 proteins, we made a pairwise comparison between QCMF and three conventional methods. This comparison study strongly suggests that QCMF complements the conventional methods for the identification of correlated mutations in MSAs. Conclusions QCMF utilizes the notion of entanglement, which is a major resource of quantum information, to model significant dissimilar and similar amino acid pair signals in the detection of functionally or structurally important sites. 
Our results suggest that on the one hand QCMF significantly outperforms the previous method, which mainly focuses on dissimilar amino acid signals, to detect essential sites in proteins. On the other hand, it is complementary to the existing methods for the identification of correlated mutations. The method of QCMF is computationally intensive. To ensure a feasible computation time of the QCMF’s algorithm, we leveraged Compute Unified Device Architecture (CUDA). The QCMF server is freely accessible at http://qcmf.informatik.uni-goettingen.de/.
Affiliation(s)
- Mehmet Gültas
- Institute of Computer Science, University of Göttingen, Goldschmidtstr. 7, 37077 Göttingen, Germany.
|
7
|
Gültas M, Haubrock M, Tüysüz N, Waack S. Coupled mutation finder: a new entropy-based method quantifying phylogenetic noise for the detection of compensatory mutations. BMC Bioinformatics 2012; 13:225. [PMID: 22963049 PMCID: PMC3577461 DOI: 10.1186/1471-2105-13-225] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 05/18/2012] [Accepted: 08/23/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The detection of significant compensatory mutation signals in multiple sequence alignments (MSAs) is often complicated by noise. A challenging problem in bioinformatics remains the separation of significant signals between two or more non-conserved residue sites from the phylogenetic noise and unrelated pair signals. Determination of these non-conserved residue sites is as important as the recognition of strictly conserved positions for understanding of the structural basis of protein functions and identification of functionally important residue regions. In this study, we developed a new method, the Coupled Mutation Finder (CMF), which quantifies the phylogenetic noise for the detection of compensatory mutations. RESULTS To demonstrate the effectiveness of this method, we analyzed essential sites of two human proteins: epidermal growth factor receptor (EGFR) and glucokinase (GCK). Our results suggest that the CMF is able to separate significant compensatory mutation signals from the phylogenetic noise and unrelated pair signals. The vast majority of compensatory mutation sites found by the CMF are related to essential sites of both proteins and they are likely to affect protein stability or functionality. CONCLUSIONS The CMF is a new method, which includes an MSA-specific statistical model based on multiple testing procedures that quantify the error made in terms of the false discovery rate and a novel entropy-based metric to upscale BLOSUM62 dissimilar compensatory mutations. Therefore, it is a helpful tool to predict and investigate compensatory mutation sites of structural or functional importance in proteins. We suggest that the CMF could be used as a novel automated function prediction tool that is required for a better understanding of the structural basis of proteins. The CMF server is freely accessible at http://cmf.bioinf.med.uni-goettingen.de.
Affiliation(s)
- Mehmet Gültas
- Institute of Computer Science, University of Göttingen, Goldschmidtstr. 7, Göttingen, 37077, Germany.
|
8
|
Dietrich S, Borst N, Schlee S, Schneider D, Janda JO, Sterner R, Merkl R. Experimental assessment of the importance of amino acid positions identified by an entropy-based correlation analysis of multiple-sequence alignments. Biochemistry 2012; 51:5633-41. [PMID: 22737967 DOI: 10.1021/bi300747r] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Indexed: 11/28/2022]
Abstract
The analysis of a multiple-sequence alignment (MSA) with correlation methods identifies pairs of residue positions whose occupation with amino acids changes in a concerted manner. It is plausible to assume that positions that are part of many such correlation pairs are important for protein function or stability. We have used the algorithm H2r to identify positions k in the MSAs of the enzymes anthranilate phosphoribosyl transferase (AnPRT) and indole-3-glycerol phosphate synthase (IGPS) that show a high conn(k) value, i.e., a large number of significant correlations in which k is involved. The importance of the identified residues was experimentally validated by performing mutagenesis studies with sAnPRT and sIGPS from the archaeon Sulfolobus solfataricus. For sAnPRT, five H2r mutant proteins were generated by replacing nonconserved residues with alanine or the prevalent residue of the MSA. As a control, five residues with conn(k) values of zero were chosen randomly and replaced with alanine. The catalytic activities and conformational stabilities of the H2r and control mutant proteins were analyzed by steady-state enzyme kinetics and thermal unfolding studies. Compared to wild-type sAnPRT, the catalytic efficiencies (k(cat)/K(M)) were largely unaltered. In contrast, the apparent thermal unfolding temperature (T(M)(app)) was lowered in most proteins. Remarkably, the strongest observed destabilization (ΔT(M)(app) = 14 °C) was caused by the V284A exchange, which pertains to the position with the highest correlation signal [conn(k) = 11]. For sIGPS, six H2r mutant and four control proteins with alanine exchanges were generated and characterized. The k(cat)/K(M) values of four H2r mutant proteins were reduced between 13- and 120-fold, and their T(M)(app) values were decreased by up to 5 °C. For the sIGPS control proteins, the observed activity and stability decreases were much less severe. 
Our findings demonstrate that positions with high conn(k) values have an increased probability of being important for enzyme function or stability.
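The conn(k) value described above, the number of significant correlation pairs in which position k takes part, reduces to a simple count once the significant pairs are known. The sketch below assumes those pairs are already available as tuples of position indices; how H2r decides significance is not reproduced, and the names `conn_values` and `top_positions` are hypothetical.

```python
from collections import Counter


def conn_values(significant_pairs):
    """Count, for each position k, the significant pairs that contain k."""
    # Treat (i, j) and (j, i) as the same pair and ignore repeats.
    unique_pairs = {tuple(sorted(pair)) for pair in significant_pairs}
    conn = Counter()
    for i, j in unique_pairs:
        if i == j:
            continue  # a position paired with itself carries no information
        conn[i] += 1
        conn[j] += 1
    return conn


def top_positions(conn, min_conn):
    """Positions whose conn(k) reaches the chosen cutoff, in ascending order."""
    return sorted(k for k, value in conn.items() if value >= min_conn)
```

For instance, with the illustrative pairs `[(3, 7), (7, 12), (7, 3), (12, 20)]`, positions 7 and 12 each receive a conn(k) of 2, and positions absent from every pair report 0.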
Affiliation(s)
- Susanne Dietrich
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, Universitätsstrasse 31, D-93053 Regensburg, Germany
|
9
|
Janda JO, Busch M, Kück F, Porfenenko M, Merkl R. CLIPS-1D: analysis of multiple sequence alignments to deduce for residue-positions a role in catalysis, ligand-binding, or protein structure. BMC Bioinformatics 2012; 13:55. [PMID: 22480135 PMCID: PMC3391178 DOI: 10.1186/1471-2105-13-55] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Received: 12/22/2011] [Accepted: 04/05/2012] [Indexed: 11/12/2022] Open
Abstract
Background One aim of the in silico characterization of proteins is to identify all residue-positions, which are crucial for function or structure. Several sequence-based algorithms exist, which predict functionally important sites. However, with respect to sequence information, many functionally and structurally important sites are hard to distinguish and consequently a large number of incorrectly predicted functional sites have to be expected. This is why we were interested to design a new classifier that differentiates between functionally and structurally important sites and to assess its performance on representative datasets. Results We have implemented CLIPS-1D, which predicts a role in catalysis, ligand-binding, or protein structure for residue-positions in a mutually exclusive manner. By analyzing a multiple sequence alignment, the algorithm scores conservation as well as abundance of residues at individual sites and their local neighborhood and categorizes by means of a multiclass support vector machine. A cross-validation confirmed that residue-positions involved in catalysis were identified with state-of-the-art quality; the mean MCC-value was 0.34. For structurally important sites, prediction quality was considerably higher (mean MCC = 0.67). For ligand-binding sites, prediction quality was lower (mean MCC = 0.12), because binding sites and structurally important residue-positions share conservation and abundance values, which makes their separation difficult. We show that classification success varies for residues in a class-specific manner. This is why our algorithm computes residue-specific p-values, which allow for the statistical assessment of each individual prediction. CLIPS-1D is available as a Web service at http://www-bioinf.uni-regensburg.de/. Conclusions CLIPS-1D is a classifier, whose prediction quality has been determined separately for catalytic sites, ligand-binding sites, and structurally important sites. 
It generates hypotheses about residue-positions important for a set of homologous proteins and focuses on conservation and abundance signals. Thus, the algorithm can be applied in cases where function cannot be transferred from well-characterized proteins by means of sequence comparison.
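The MCC values quoted in this abstract (0.34, 0.67, 0.12) follow the standard Matthews correlation coefficient for binary predictions. As a reminder of what the number measures, here is a minimal sketch of the standard formula; the function name `mcc` and the convention of returning 0.0 when the denominator vanishes are choices of this sketch, not taken from the paper.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # undefined case; 0.0 is a common convention
    return (tp * tn - fp * fn) / denominator
```

A value of 1 means every residue-position was classified correctly, 0 means the prediction is no better than chance, and -1 means every prediction was inverted.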
Affiliation(s)
- Jan-Oliver Janda
- Institute of Biophysics and Physical Biochemistry, University of Regensburg, 93040 Regensburg, Germany.
|
10
|
Bay DC, Hafez M, Young MJ, Court DA. Phylogenetic and coevolutionary analysis of the β-barrel protein family comprised of mitochondrial porin (VDAC) and Tom40. Biochimica et Biophysica Acta - Biomembranes 2011; 1818:1502-19. [PMID: 22178864 DOI: 10.1016/j.bbamem.2011.11.027] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.9] [Received: 09/07/2011] [Revised: 11/14/2011] [Accepted: 11/22/2011] [Indexed: 12/21/2022]
Abstract
Beta-barrel proteins are the main transit points across the mitochondrial outer membrane. Mitochondrial porin, the voltage-dependent, anion-selective channel (VDAC), is responsible for the passage of small molecules between the mitochondrion and the cytosol. Through interactions with other mitochondrial and cellular proteins, it is involved in regulating organellar and cellular metabolism and likely contributes to mitochondrial structure. Tom40 is part of the translocase of the outer membrane, and acts as the channel for passage of preproteins during their import into the organelle. These proteins appear to share a common evolutionary origin and structure. In the current study, the evolutionary relationships between and within both proteins were investigated through phylogenetic analysis. The two groups have a common origin and have followed independent, complex evolutionary pathways, leading to the generation of paralogues in animals and plants. Structures of diverse representatives were modeled, revealing common themes rather than sites of high identity in both groups. Within each group, intramolecular coevolution was assessed, revealing a new set of sites potentially involved in structure-function relationships in these molecules. A weak link between Tom40 and proteins related to the mitochondrial distribution and morphology protein, Mdm10, was identified. This article is part of a Special Issue entitled: VDAC structure, function, and regulation of mitochondrial metabolism.
Affiliation(s)
- Denice C Bay
- Department of Microbiology, University of Manitoba, Winnipeg, Manitoba, Canada
|
11
|
Lee J, Goodey NM. Catalytic contributions from remote regions of enzyme structure. Chem Rev 2011; 111:7595-624. [PMID: 21923192 DOI: 10.1021/cr100042n] [Citation(s) in RCA: 68] [Impact Index Per Article: 5.2] [Indexed: 01/03/2023]
Affiliation(s)
- Jeeyeon Lee
- Department of Chemistry, 413 Wartik Laboratory, The Pennsylvania State University, University Park, Pennsylvania 16802, USA.
|
12
|
Gao H, Dou Y, Yang J, Wang J. New methods to measure residues coevolution in proteins. BMC Bioinformatics 2011; 12:206. [PMID: 21612664 PMCID: PMC3123609 DOI: 10.1186/1471-2105-12-206] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Received: 11/19/2010] [Accepted: 05/26/2011] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The covariation of two sites in a protein is often used as the degree of their coevolution. To quantify the covariation many methods have been developed and most of them are based on residues position-specific frequencies by using the mutual information (MI) model. RESULTS In the paper, we proposed several new measures to incorporate new biological constraints in quantifying the covariation. The first measure is the mutual information with the amino acid background distribution (MIB), which incorporates the amino acid background distribution into the marginal distribution of the MI model. The modification is made to remove the effect of amino acid evolutionary pressure in measuring covariation. The second measure is the mutual information of residues physicochemical properties (MIP), which is used to measure the covariation of physicochemical properties of two sites. The third measure called MIBP is proposed by applying residues physicochemical properties into the MIB model. Moreover, scores of our new measures are applied to a robust indicator conn(k) in finding the covariation signal of each site. CONCLUSIONS We find that incorporating amino acid background distribution is effective in removing the effect of evolutionary pressure of amino acids. Thus the MIB measure describes more biological background information for the coevolution of residues. Besides, our analysis also reveals that the covariation of physicochemical properties is a new aspect of coevolution information.
Affiliation(s)
- Hongyun Gao
- School of Mathematical Sciences, Dalian University of Technology, Dalian, People’s Republic of China
|
13
|
Hamer R, Luo Q, Armitage JP, Reinert G, Deane CM. i-Patch: interprotein contact prediction using local network information. Proteins 2011; 78:2781-97. [PMID: 20635422 DOI: 10.1002/prot.22792] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.8] [Indexed: 11/07/2022]
Abstract
Biological processes are commonly controlled by precise protein-protein interactions. These connections rely on specific amino acids at the binding interfaces. Here we predict the binding residues of such interprotein complexes. We have developed a suite of methods, i-Patch, which predict the interprotein contact sites by considering the two proteins as a network, with residues as nodes and contacts as edges. i-Patch starts with two proteins, A and B, which are assumed to interact, but for which the structure of the complex is not available. However, we assume that for each protein, we have a reference structure and a multiple sequence alignment of homologues. i-Patch then uses the propensities of patches of residues to interact, to predict interprotein contact sites. i-Patch outperforms several other tested algorithms for prediction of interprotein contact sites. It gives 59% precision with 20% recall on a blind test set of 31 protein pairs. Combining the i-Patch scores with an existing correlated mutation algorithm, McBASC, using a logistic model gave little improvement. Results from a case study, on bacterial chemotaxis protein complexes, demonstrate that our predictions can identify contact residues, as well as suggesting unknown interfaces in multiprotein complexes.
Affiliation(s)
- Rebecca Hamer
- Oxford Centre for Integrative Systems Biology, Department of Biochemistry, University of Oxford, Oxford, United Kingdom
|
14
|
Guerrero D, Bautista R, Villalobos DP, Cantón FR, Claros MG. AlignMiner: a Web-based tool for detection of divergent regions in multiple sequence alignments of conserved sequences. Algorithms Mol Biol 2010; 5:24. [PMID: 20525162 PMCID: PMC2902484 DOI: 10.1186/1748-7188-5-24] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Received: 01/14/2010] [Accepted: 06/02/2010] [Indexed: 01/09/2023] Open
Abstract
Background Multiple sequence alignments are used to study gene or protein function, phylogenetic relations, genome evolution hypotheses and even gene polymorphisms. Virtually without exception, all available tools focus on conserved segments or residues. Small divergent regions, however, are biologically important for specific quantitative polymerase chain reaction, genotyping, molecular markers and preparation of specific antibodies, and yet have received little attention. As a consequence, they must be selected empirically by the researcher. AlignMiner has been developed to fill this gap in bioinformatic analyses. Results AlignMiner is a Web-based application for detection of conserved and divergent regions in alignments of conserved sequences, focusing particularly on divergence. It accepts alignments (protein or nucleic acid) obtained using any of a variety of algorithms, which does not appear to have a significant impact on the final results. AlignMiner uses different scoring methods for assessing conserved/divergent regions, Entropy being the method that provides the highest number of regions with the greatest length, and Weighted being the most restrictive. Conserved/divergent regions can be generated either with respect to the consensus sequence or to one master sequence. The resulting data are presented in a graphical interface developed in AJAX, which provides remarkable user interaction capabilities. Users do not need to wait until execution is complete and can even inspect their results on a different computer. Data can be downloaded onto a user disk, in standard formats. In silico and experimental proof-of-concept cases have shown that AlignMiner can be successfully used to design specific polymerase chain reaction primers as well as potential epitopes for antibodies. Primer design is assisted by a module that deploys several oligonucleotide parameters for designing primers "on the fly".
Conclusions AlignMiner can be used to reliably detect divergent regions via several scoring methods that provide different levels of selectivity. Its predictions have been verified by experimental means. Hence, it is expected that its usage will save researchers' time and ensure an objective selection of the best-possible divergent region when closely related sequences are analysed. AlignMiner is freely available at http://www.scbi.uma.es/alignminer.
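To make the idea of entropy-based detection of divergent regions concrete, the sketch below scores each alignment column with plain Shannon entropy and reports runs of consecutive high-entropy columns. This is only an illustration: the abstract does not specify AlignMiner's actual Entropy or Weighted scores, and the `threshold`, `min_length`, and function names are assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def divergent_regions(msa, threshold, min_length=1):
    """Return inclusive 0-based (start, end) runs of columns with entropy >= threshold."""
    n_cols = len(msa[0])
    regions = []
    run_start = None
    # Iterate one step past the last column so a trailing run is closed.
    for i in range(n_cols + 1):
        high = i < n_cols and column_entropy([s[i] for s in msa]) >= threshold
        if high and run_start is None:
            run_start = i
        elif not high and run_start is not None:
            if i - run_start >= min_length:
                regions.append((run_start, i - 1))
            run_start = None
    return regions
```

Raising `threshold` or `min_length` makes the selection more restrictive, which loosely mirrors the abstract's point that different scoring methods offer different levels of selectivity.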
|
15
|
Prediction of catalytic residues based on an overlapping amino acid classification. Amino Acids 2010; 39:1353-61. [DOI: 10.1007/s00726-010-0587-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Received: 10/18/2009] [Accepted: 03/27/2010] [Indexed: 10/19/2022]
|
16
|
Dou Y, Zheng X, Wang J. Several appropriate background distributions for entropy-based protein sequence conservation measures. J Theor Biol 2009; 262:317-22. [PMID: 19808039 DOI: 10.1016/j.jtbi.2009.09.030] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Received: 06/01/2009] [Revised: 09/25/2009] [Accepted: 09/25/2009] [Indexed: 11/25/2022]
Abstract
Amino acid background distribution is an important factor for entropy-based methods which extract sequence conservation information from protein multiple sequence alignments (MSAs). However, MSAs are usually not large enough to allow a reliable observed background distribution. In this paper, we propose two new estimations of background distribution. One is an integration of the observed background distribution and the position-specific residue distribution, and the other is a normalized square root of observed background frequency. To validate these new background distributions, they are applied to the relative entropy model to find catalytic sites and ligand binding sites from protein MSAs. Experimental results show that they are superior to the observed background distribution in predicting functionally important residues.
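The second estimate named in this abstract, a normalized square root of the observed background frequency, and its use in a relative entropy score can be sketched as below. The code assumes a gap-free MSA given as equal-length strings and uses log base 2; the first estimate (integrating the observed background with the position-specific distribution) is not specified in the abstract and is therefore not attempted. All function names are hypothetical.

```python
import math
from collections import Counter


def observed_background(msa):
    """Observed residue frequencies over the whole alignment."""
    counts = Counter(ch for seq in msa for ch in seq)
    total = sum(counts.values())
    return {aa: c / total for aa, c in counts.items()}


def sqrt_background(background):
    """Square root of each background frequency, renormalized to sum to 1."""
    roots = {aa: math.sqrt(p) for aa, p in background.items()}
    norm = sum(roots.values())
    return {aa: r / norm for aa, r in roots.items()}


def relative_entropy(column, background):
    """Relative entropy (bits) of a column's residue distribution vs. the background."""
    n = len(column)
    score = 0.0
    for aa, c in Counter(column).items():
        p = c / n
        score += p * math.log2(p / background[aa])
    return score
```

Taking the square root flattens the background, so rare residues are penalized less heavily than under the raw observed frequencies; a column would then be scored with `relative_entropy(column, sqrt_background(observed_background(msa)))`.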
Affiliation(s)
- Yongchao Dou
- School of Mathematical Science, Dalian University of Technology, Dalian 116024, PR China
|