1
|
Slama P. Two-domain analysis of JmjN-JmjC and PHD-JmjC lysine demethylases: Detecting an inter-domain evolutionary stress. Proteins 2017; 86:3-12. [PMID: 28975662 DOI: 10.1002/prot.25394] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/24/2017] [Revised: 09/26/2017] [Accepted: 10/03/2017] [Indexed: 11/09/2022]
Abstract
Residues at different positions of a multiple sequence alignment sometimes evolve together, due to a correlated structural or functional stress at these positions. Co-evolution has thus been evidenced computationally in multiple proteins or protein domains. Here, we wish to study whether an evolutionary stress is exerted on a sequence alignment across protein domains, i.e., on longer sequence separations than within a single protein domain. JmjC-containing lysine demethylases were chosen for analysis, as a follow-up to previous studies; these proteins are important multidomain epigenetic regulators. In these proteins, the JmjC domain is responsible for the demethylase activity, and surrounding domains interact with histones, DNA or partner proteins. This family of enzymes was analyzed at the sequence level, in order to determine whether the sequence of JmjC-domains was affected by the presence of a neighboring JmjN domain or PHD finger in the protein. Multiple positions within JmjC sequences were shown to have their residue distributions significantly altered by the presence of the second domain. Structural considerations confirmed the relevance of the analysis for JmjN-JmjC proteins, while among PHD-JmjC proteins, the length of the linker region could be correlated to the residues observed at the most affected positions. The correlation of domain architecture with residue types at certain positions, as well as that of overall architecture with protein function, is discussed. The present results thus evidence the existence of an across-domain evolutionary stress in JmjC-containing demethylases, and provide further insights into the overall domain architecture of JmjC domain-containing proteins.
Collapse
Affiliation(s)
- Patrick Slama
- Independent researcher, Paris, France; Center for Imaging Science, the Johns Hopkins University, Clark Hall, 3400 N Charles Street, Baltimore, Maryland, 21218
| |
Collapse
|
2
|
EXIA2: web server of accurate and rapid protein catalytic residue prediction. BIOMED RESEARCH INTERNATIONAL 2014; 2014:807839. [PMID: 25295274 PMCID: PMC4177735 DOI: 10.1155/2014/807839] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 02/21/2014] [Revised: 05/27/2014] [Accepted: 06/11/2014] [Indexed: 11/18/2022]
Abstract
We propose a method (EXIA2) of catalytic residue prediction based on protein structure without needing homology information. The method is based on the special side chain orientation of catalytic residues. We found that the side chain of catalytic residues usually points to the center of the catalytic site. The special orientation is usually observed in catalytic residues but not in noncatalytic residues, which usually have random side chain orientation. The method is shown to be the most accurate catalytic residue prediction method currently when combined with PSI-Blast sequence conservation. It performs better than other competing methods on several benchmark datasets that include over 1,200 enzyme structures. The areas under the ROC curve (AUC) on these benchmark datasets are in the range from 0.934 to 0.968.
Collapse
|
3
|
On the structural context and identification of enzyme catalytic residues. BIOMED RESEARCH INTERNATIONAL 2013; 2013:802945. [PMID: 23484160 PMCID: PMC3581254 DOI: 10.1155/2013/802945] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 11/29/2012] [Accepted: 12/28/2012] [Indexed: 11/25/2022]
Abstract
Enzymes play important roles in most of the biological processes. Although only a small fraction of residues are directly involved in catalytic reactions, these catalytic residues are the most crucial parts in enzymes. The study of the fundamental and unique features of catalytic residues benefits the understanding of enzyme functions and catalytic mechanisms. In this work, we analyze the structural context of catalytic residues based on theoretical and experimental structure flexibility. The results show that catalytic residues have distinct structural features and context. Their neighboring residues, whether sequence or structure neighbors within specific range, are usually structurally more rigid than those of noncatalytic residues. The structural context feature is combined with support vector machine to identify catalytic residues from enzyme structure. The prediction results are better or comparable to those of recent structure-based prediction methods.
Collapse
|
4
|
Accurate prediction of protein catalytic residues by side chain orientation and residue contact density. PLoS One 2012; 7:e47951. [PMID: 23110141 PMCID: PMC3480458 DOI: 10.1371/journal.pone.0047951] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/21/2012] [Accepted: 09/18/2012] [Indexed: 11/19/2022] Open
Abstract
Prediction of protein catalytic residues provides useful information for the studies of protein functions. Most of the existing methods combine both structure and sequence information but heavily rely on sequence conservation from multiple sequence alignments. The contribution of structure information is usually less than that of sequence conservation in existing methods. We found a novel structure feature, residue side chain orientation, which is the first structure-based feature that achieves prediction results comparable to that of evolutionary sequence conservation. We developed a structure-based method, Enzyme Catalytic residue SIde-chain Arrangement (EXIA), which is based on residue side chain orientations and backbone flexibility of protein structure. The prediction that uses EXIA outperforms existing structure-based features. The prediction quality of combing EXIA and sequence conservation exceeds that of the state-of-the-art prediction methods. EXIA is designed to predict catalytic residues from single protein structure without needing sequence or structure alignments. It provides invaluable information when there is no sufficient or reliable homology information for target protein. We found that catalytic residues have very special side chain orientation and designed the EXIA method based on the newly discovered feature. It was also found that EXIA performs well for a dataset of enzymes without any bounded ligand in their crystallographic structures.
Collapse
|
5
|
Dou Y, Wang J, Yang J, Zhang C. L1pred: a sequence-based prediction tool for catalytic residues in enzymes with the L1-logreg classifier. PLoS One 2012; 7:e35666. [PMID: 22558194 PMCID: PMC3338704 DOI: 10.1371/journal.pone.0035666] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/04/2012] [Accepted: 03/19/2012] [Indexed: 12/01/2022] Open
Abstract
To understand enzyme functions, identifying the catalytic residues is a usual first step. Moreover, knowledge about catalytic residues is also useful for protein engineering and drug-design. However, to experimentally identify catalytic residues remains challenging for reasons of time and cost. Therefore, computational methods have been explored to predict catalytic residues. Here, we developed a new algorithm, L1pred, for catalytic residue prediction, by using the L1-logreg classifier to integrate eight sequence-based scoring functions. We tested L1pred and compared it against several existing sequence-based methods on carefully designed datasets Data604 and Data63. With ten-fold cross-validation, L1pred showed the area under precision-recall curve (AUPR) and the area under ROC curve (AUC) of 0.2198 and 0.9494 on the training dataset, Data604, respectively. In addition, on the independent test dataset, Data63, it showed the AUPR and AUC values of 0.2636 and 0.9375, respectively. Compared with other sequence-based methods, L1pred showed the best performance on both datasets. We also analyzed the importance of each attribute in the algorithm, and found that all the scores contributed more or less equally to the L1pred performance.
Collapse
Affiliation(s)
- Yongchao Dou
- School of Biological Sciences, Center for Plant Science and Innovation, University of Nebraska, Lincoln, Nebraska, United States of America
| | - Jun Wang
- Scientific Computing Key Laboratory of Shanghai Universities, Shanghai, People’s Republic of China
- Department of Mathematics, Shanghai Normal University, Shanghai, People’s Republic of China
| | - Jialiang Yang
- MPI-Institute of Computational Biology, Chinese Academy of Sciences, Shanghai, People’s Republic of China
| | - Chi Zhang
- School of Biological Sciences, Center for Plant Science and Innovation, University of Nebraska, Lincoln, Nebraska, United States of America
- * E-mail:
| |
Collapse
|
6
|
Choi K, Kim S. Sequence-based enzyme catalytic domain prediction using clustering and aggregated mutual information content. J Bioinform Comput Biol 2012; 9:597-611. [PMID: 21976378 DOI: 10.1142/s0219720011005677] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/30/2011] [Accepted: 08/08/2011] [Indexed: 11/18/2022]
Abstract
Characterizing enzyme sequences and identifying their active sites is a very important task. The current experimental methods are too expensive and labor intensive to handle the rapidly accumulating protein sequences and structure data. Thus accurate, high-throughput in silico methods for identifying catalytic residues and enzyme function prediction are much needed. In this paper, we propose a novel sequence-based catalytic domain prediction method using a sequence clustering and an information-theoretic approaches. The first step is to perform the sequence clustering analysis of enzyme sequences from the same functional category (those with the same EC label). The clustering analysis is used to handle the problem of widely varying sequence similarity levels in enzyme sequences. The clustering analysis constructs a sequence graph where nodes are enzyme sequences and edges are a pair of sequences with a certain degree of sequence similarity, and uses graph properties, such as biconnected components and articulation points, to generate sequence segments common to the enzyme sequences. Then amino acid subsequences in the common shared regions are aligned and then an information theoretic approach called aggregated column related scoring scheme is performed to highlight potential active sites in enzyme sequences. The aggregated information content scoring scheme is shown to be effective to highlight residues of active sites effectively. The proposed method of combining the clustering and the aggregated information content scoring methods was successful in highlighting known catalytic sites in enzymes of Escherichia coli K12 in terms of the Catalytic Site Atlas database. Our method is shown to be not only accurate in predicting potential active sites in the enzyme sequences but also computationally efficient since the clustering approach utilizes two graph properties that can be computed in linear to the number of edges in the sequence graph and computation of mutual information does not require much time. We believe that the proposed method can be useful for identifying active sites of enzyme sequences from many genome projects.
Collapse
Affiliation(s)
- Kwangmin Choi
- Division of Experimental Hematology and Cancer Biology, Cincinnati Children's Hospital Medical Center, 3333 Burnet Avenue, Cincinnati, Ohio 45229, USA.
| | | |
Collapse
|
7
|
Abstract
This chapter describes a method for analyzing the allosteric influence of molecular interactions on protein conformational distributions. The method, called Dynamics Perturbation Analysis (DPA), generally yields insights into allosteric effects in proteins and is especially useful for predicting ligand-binding sites. The use of DPA for binding site prediction is motivated by the following allosteric regulation hypothesis: interactions in native binding sites cause a large change in protein conformational distributions. Here, we review the reasoning behind this hypothesis, describe the math behind the method, and present a recipe for predicting binding sites using DPA.
Collapse
Affiliation(s)
- Dengming Ming
- Department of Physiology and Biophysics, School of Life Science, Fudan University, Shanghai, China
| | | |
Collapse
|
8
|
Othersen OG, Stefani AG, Huber JB, Sticht H. Application of information theory to feature selection in protein docking. J Mol Model 2011; 18:1285-97. [PMID: 21748327 DOI: 10.1007/s00894-011-1157-6] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/20/2011] [Accepted: 06/21/2011] [Indexed: 12/26/2022]
Abstract
In the era of structural genomics, the prediction of protein interactions using docking algorithms is an important goal. The success of this method critically relies on the identification of good docking solutions among a vast excess of false solutions. We have adapted the concept of mutual information (MI) from information theory to achieve a fast and quantitative screening of different structural features with respect to their ability to discriminate between physiological and nonphysiological protein interfaces. The strategy includes the discretization of each structural feature into distinct value ranges to optimize its mutual information. We have selected 11 structural features and two datasets to demonstrate that the MI is dimensionless and can be directly compared for diverse structural features and between datasets of different sizes. Conversion of the MI values into a simple scoring function revealed that those features with a higher MI are actually more powerful for the identification of good docking solutions. Thus, an MI-based approach allows the rapid screening of structural features with respect to their information content and should therefore be helpful for the design of improved scoring functions in future. In addition, the concept presented here may also be adapted to related areas that require feature selection for biomolecules or organic ligands.
Collapse
Affiliation(s)
- Olaf G Othersen
- Bioinformatik, Institut für Biochemie, Friedrich-Alexander-Universität Erlangen-Nürnberg, Fahrstr. 17, 91054 Erlangen, Germany
| | | | | | | |
Collapse
|
9
|
Dou Y, Geng X, Gao H, Yang J, Zheng X, Wang J. Sequence Conservation in the Prediction of Catalytic Sites. Protein J 2011; 30:229-39. [PMID: 21465136 DOI: 10.1007/s10930-011-9324-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/18/2022]
|
10
|
Novel feature for catalytic protein residues reflecting interactions with other residues. PLoS One 2011; 6:e16932. [PMID: 21468322 PMCID: PMC3066176 DOI: 10.1371/journal.pone.0016932] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2010] [Accepted: 01/10/2011] [Indexed: 11/29/2022] Open
Abstract
Owing to their potential for systematic analysis, complex networks have been
widely used in proteomics. Representing a protein structure as a topology
network provides novel insight into understanding protein folding mechanisms,
stability and function. Here, we develop a new feature to reveal
correlations between residues using a protein structure network. In an original
attempt to quantify the effects of several key residues on catalytic residues, a
power function was used to model interactions between residues. The results
indicate that focusing on a few residues is a feasible approach to identifying
catalytic residues. The spatial environment surrounding a catalytic residue was
analyzed in a layered manner. We present evidence that correlation between
residues is related to their distance apart most environmental parameters of the
outer layer make a smaller contribution to prediction and ii catalytic residues
tend to be located near key positions in enzyme folds. Feature analysis revealed
satisfactory performance for our features, which were combined with several
conventional features in a prediction model for catalytic residues using a
comprehensive data set from the Catalytic Site Atlas. Values of 88.6 for
sensitivity and 88.4 for specificity were obtained by 10fold crossvalidation.
These results suggest that these features reveal the mutual dependence of
residues and are promising for further study of structurefunction
relationship.
Collapse
|
11
|
Vuong TV, Wilson DB. Glycoside hydrolases: catalytic base/nucleophile diversity. Biotechnol Bioeng 2010; 107:195-205. [PMID: 20552664 DOI: 10.1002/bit.22838] [Citation(s) in RCA: 69] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
Abstract
Recent studies have shown that a number of glycoside hydrolase families do not follow the classical catalytic mechanisms, as they lack a typical catalytic base/nucleophile. A variety of mechanisms are used to replace this function, including substrate-assisted catalysis, a network of several residues, and the use of non-carboxylate residues or exogenous nucleophiles. Removal of the catalytic base/nucleophile by mutation can have a profound impact on substrate specificity, producing enzymes with completely new functions.
Collapse
Affiliation(s)
- Thu V Vuong
- Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY 14850, USA
| | | |
Collapse
|
12
|
Slama P, Geman D. Identification of family-determining residues in PHD fingers. Nucleic Acids Res 2010; 39:1666-79. [PMID: 21059680 PMCID: PMC3061080 DOI: 10.1093/nar/gkq947] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022] Open
Abstract
Histone modifications are fundamental to chromatin structure and transcriptional regulation, and are recognized by a limited number of protein folds. Among these folds are PHD fingers, which are present in most chromatin modification complexes. To date, about 15 PHD finger domains have been structurally characterized, whereas hundreds of different sequences have been identified. Consequently, an important open problem is to predict structural features of a PHD finger knowing only its sequence. Here, we classify PHD fingers into different groups based on the analysis of residue–residue co-evolution in their sequences. We measure the degree to which fixing the amino acid type at one position modifies the frequencies of amino acids at other positions. We then detect those position/amino acid combinations, or ‘conditions’, which have the strongest impact on other sequence positions. Clustering these strong conditions yields four families, providing informative labels for PHD finger sequences. Existing experimental results, as well as docking calculations performed here, reveal that these families indeed show discrepancies at the functional level. Our method should facilitate the functional characterization of new PHD fingers, as well as other protein families, solely based on sequence information.
Collapse
Affiliation(s)
- Patrick Slama
- Institute for Computational Medicine and Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA.
| | | |
Collapse
|
13
|
Networks of high mutual information define the structural proximity of catalytic sites: implications for catalytic residue identification. PLoS Comput Biol 2010; 6:e1000978. [PMID: 21079665 PMCID: PMC2973806 DOI: 10.1371/journal.pcbi.1000978] [Citation(s) in RCA: 61] [Impact Index Per Article: 4.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2010] [Accepted: 09/27/2010] [Indexed: 11/19/2022] Open
Abstract
Identification of catalytic residues (CR) is essential for the characterization of enzyme function. CR are, in general, conserved and located in the functional site of a protein in order to attain their function. However, many non-catalytic residues are highly conserved and not all CR are conserved throughout a given protein family making identification of CR a challenging task. Here, we put forward the hypothesis that CR carry a particular signature defined by networks of close proximity residues with high mutual information (MI), and that this signature can be applied to distinguish functional from other non-functional conserved residues. Using a data set of 434 Pfam families included in the catalytic site atlas (CSA) database, we tested this hypothesis and demonstrated that MI can complement amino acid conservation scores to detect CR. The Kullback-Leibler (KL) conservation measurement was shown to significantly outperform both the Shannon entropy and maximal frequency measurements. Residues in the proximity of catalytic sites were shown to be rich in shared MI. A structural proximity MI average score (termed pMI) was demonstrated to be a strong predictor for CR, thus confirming the proposed hypothesis. A structural proximity conservation average score (termed pC) was also calculated and demonstrated to carry distinct information from pMI. A catalytic likeliness score (Cls), combining the KL, pC and pMI measures, was shown to lead to significantly improved prediction accuracy. At a specificity of 0.90, the Cls method was found to have a sensitivity of 0.816. In summary, we demonstrate that networks of residues with high MI provide a distinct signature on CR and propose that such a signature should be present in other classes of functional residues where the requirement to maintain a particular function places limitations on the diversification of the structural environment along the course of evolution.
Collapse
|
14
|
Horst JA, Samudrala R. A protein sequence meta-functional signature for calcium binding residue prediction. Pattern Recognit Lett 2010; 31:2103-2112. [PMID: 20824111 PMCID: PMC2932634 DOI: 10.1016/j.patrec.2010.04.012] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/23/2022]
Abstract
The diversity of characterized protein functions found amongst experimentally interrogated proteins suggests that a vast array of unknown functions remains undiscovered. These protein functions are imparted by specific geometric distributions of amino acid residue chemical moieties, each contributing a functional interaction. We hypothesize that individual residue function contributions are predictable through sequence analytic knowledge based algorithms, and that they can be recombined to understand composite protein function by predicting spatial relation in tertiary structure. We assess the former by training a meta-functional signature algorithm to specifically predict calcium ion binding residues from protein sequence. We estimate the latter by testing for match between predictive contribution of positions in predicted secondary structures and patterns of side chain proximity forced by secondary structure moieties. Specific training for calcium binding results in 83% area under the receiver operator characteristic curve added value over random (AUCoR) and p<10(-300) significance as measured by Kendall's τ in ten fold cross validation for parallel sets of 811 residues in 336 proteins and 696 residues in 299 proteins. Training for generalized function results in 63% AUCoR and p≅10(-221) for the same tests. Including inference of side chain proximity improves predictive ability by 2% AUCoR consistently. The results demonstrate that protein meta-functional signatures can be trained to predict specific protein functions by considering amino acid identity and structural features accessible from sequence, laying the groundwork for composite sequence based function site prediction.
Collapse
Affiliation(s)
- Jeremy A Horst
- Department of Oral Biology, School of Dentistry, University of Washington, 1959 NE Pacific St #357132, Seattle, WA 98195
- Department of Microbiology, School of Medicine, University of Washington, 1959 NE Pacific St #357132, Seattle, WA 98195
| | - Ram Samudrala
- Department of Oral Biology, School of Dentistry, University of Washington, 1959 NE Pacific St #357132, Seattle, WA 98195
- Department of Microbiology, School of Medicine, University of Washington, 1959 NE Pacific St #357132, Seattle, WA 98195
| |
Collapse
|
15
|
Prediction of catalytic residues based on an overlapping amino acid classification. Amino Acids 2010; 39:1353-61. [DOI: 10.1007/s00726-010-0587-2] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/18/2009] [Accepted: 03/27/2010] [Indexed: 10/19/2022]
|
16
|
López G, Ezkurdia I, Tress ML. Assessment of ligand binding residue predictions in CASP8. Proteins 2010; 77 Suppl 9:138-46. [PMID: 19714771 DOI: 10.1002/prot.22557] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/07/2022]
Abstract
Here we detail the assessment process for the binding site prediction category of the eighth Critical Assessment of Protein Structure Prediction experiment (CASP8). Predictions were only evaluated for those targets that bound biologically relevant ligands and were assessed using the Matthews Correlation Coefficient. The results of the analysis clearly demonstrate that three predictors from two groups (Lee and Sternberg) stand out from the rest. A further two groups perform well over subsets of metal binding or nonmetal ligand binding targets. The best methods were able to make consistently reliable predictions based on model structures, though it was noticeable that the two targets that were not well predicted were also the hardest targets. The number of predictors that submitted new methods in this category was highly encouraging and suggests that current technology is at the level that experimental biochemists and structural biologists could benefit from what is clearly a growing field.
Collapse
Affiliation(s)
- Gonzalo López
- Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain
| | | | | |
Collapse
|
17
|
Dou Y, Zheng X, Wang J. Several appropriate background distributions for entropy-based protein sequence conservation measures. J Theor Biol 2009; 262:317-22. [PMID: 19808039 DOI: 10.1016/j.jtbi.2009.09.030] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2009] [Revised: 09/25/2009] [Accepted: 09/25/2009] [Indexed: 11/25/2022]
Abstract
Amino acid background distribution is an important factor for entropy-based methods which extract sequence conservation information from protein multiple sequence alignments (MSAs). However, MSAs are usually not large enough to allow a reliable observed background distribution. In this paper, we propose two new estimations of background distribution. One is an integration of the observed background distribution and the position-specific residue distribution, and the other is a normalized square root of observed background frequency. To validate these new background distributions, they are applied to the relative entropy model to find catalytic sites and ligand binding sites from protein MSAs. Experimental results show that they are superior to the observed background distribution in predicting functionally important residues.
Collapse
Affiliation(s)
- Yongchao Dou
- School of Mathematical Science, Dalian University of Technology, Dalian 116024, PR China
| | | | | |
Collapse
|
18
|
Zhang T, Zhang H, Chen K, Shen S, Ruan J, Kurgan L. Accurate sequence-based prediction of catalytic residues. ACTA ACUST UNITED AC 2008; 24:2329-38. [PMID: 18710875 DOI: 10.1093/bioinformatics/btn433] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022]
Abstract
MOTIVATION Prediction of catalytic residues provides useful information for the research on function of enzymes. Most of the existing prediction methods are based on structural information, which limits their use. We propose a sequence-based catalytic residue predictor that provides predictions with quality comparable to modern structure-based methods and that exceeds quality of state-of-the-art sequence-based methods. RESULTS Our method (CRpred) uses sequence-based features and the sequence-derived PSI-BLAST profile. We used feature selection to reduce the dimensionality of the input (and explain the input) to support vector machine (SVM) classifier that provides predictions. Tests on eight datasets and side-by-side comparison with six modern structure- and sequence-based predictors show that CRpred provides predictions with quality comparable to current structure-based methods and better than sequence-based methods. The proposed method obtains 15-19% precision and 48-58% TP (true positive) rate, depending on the dataset used. CRpred also provides confidence values that allow selecting a subset of predictions with higher precision. The improved quality is due to newly designed features and careful parameterization of the SVM. The features incorporate amino acids characterized by the highest and the lowest propensities to constitute catalytic residues, Gly that provides flexibility for catalytic sites and sequence motifs characteristic to certain catalytic reactions. Our features indicate that catalytic residues are on average more conserved when compared with the general population of residues and that highly conserved amino acids characterized by high catalytic propensity are likely to form catalytic sites. We also show that local (with respect to the sequence) hydrophobicity contributes towards the prediction.
Collapse
Affiliation(s)
- Tuo Zhang
- College of Mathematical Science and LPMC, Nankai University, Tianjin, PRC
| | | | | | | | | | | |
Collapse
|
19
|
Seo JH, Park HY, Kim J, Lee BS, Kim BG. Exploring sequence space: Profile analysis and protein-ligand docking to screen ω-aminotransferases with expanded substrate specificity. Biotechnol J 2008; 3:676-86. [DOI: 10.1002/biot.200700264] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/08/2022]
|
20
|
Ming D, Cohn JD, Wall ME. Fast dynamics perturbation analysis for prediction of protein functional sites. BMC STRUCTURAL BIOLOGY 2008; 8:5. [PMID: 18234095 PMCID: PMC2276503 DOI: 10.1186/1472-6807-8-5] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 10/18/2007] [Accepted: 01/30/2008] [Indexed: 11/10/2022]
Abstract
Background We present a fast version of the dynamics perturbation analysis (DPA) algorithm to predict functional sites in protein structures. The original DPA algorithm finds regions in proteins where interactions cause a large change in the protein conformational distribution, as measured using the relative entropy Dx. Such regions are associated with functional sites. Results The Fast DPA algorithm, which accelerates DPA calculations, is motivated by an empirical observation that Dx in a normal-modes model is highly correlated with an entropic term that only depends on the eigenvalues of the normal modes. The eigenvalues are accurately estimated using first-order perturbation theory, resulting in a N-fold reduction in the overall computational requirements of the algorithm, where N is the number of residues in the protein. The performance of the original and Fast DPA algorithms was compared using protein structures from a standard small-molecule docking test set. For nominal implementations of each algorithm, top-ranked Fast DPA predictions overlapped the true binding site 94% of the time, compared to 87% of the time for original DPA. In addition, per-protein recall statistics (fraction of binding-site residues that are among predicted residues) were slightly better for Fast DPA. On the other hand, per-protein precision statistics (fraction of predicted residues that are among binding-site residues) were slightly better using original DPA. Overall, the performance of Fast DPA in predicting ligand-binding-site residues was comparable to that of the original DPA algorithm. Conclusion Compared to the original DPA algorithm, the decreased run time with comparable performance makes Fast DPA well-suited for implementation on a web server and for high-throughput analysis.
Collapse
Affiliation(s)
- Dengming Ming
- Computer, Computational, and Statistical Scienes Division, Los Alamos National Laboratory, Los Alamos, New Mexico, USA.
| | | | | |
Collapse
|