1
Trerotola M, Antolini L, Beni L, Guerra E, Spadaccini M, Verzulli D, Moschella A, Alberti S. A deterministic code for transcription factor-DNA recognition through computation of binding interfaces. NAR Genom Bioinform 2022; 4:lqac008. [PMID: 35261972 PMCID: PMC8896162 DOI: 10.1093/nargab/lqac008] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2021] [Revised: 12/05/2021] [Accepted: 02/28/2022] [Indexed: 11/14/2022] Open
Abstract
The recognition code between transcription factor (TF) amino acids and DNA bases remains poorly understood. Here, the determinants of TF amino acid-DNA base binding selectivity were identified through the analysis of crystals of TF-DNA complexes. Selective, high-frequency interactions were identified for the vast majority of amino acid side chains (‘structural code’). DNA binding specificities were then independently assessed by meta-analysis of random-mutagenesis studies of Zn finger-target DNA sequences. Selective, high-frequency interactions were identified for the majority of mutagenized residues (‘mutagenesis code’). The structural code and the mutagenesis code were shown to match to a striking level of accuracy (P = 3.1 × 10⁻³³), suggesting the identification of fundamental rules of TF binding to DNA bases. Additional insight was gained by showing a geometry-dictated choice among DNA-binding TF residues with overlapping specificity. These findings indicate the existence of a DNA recognition mode whereby the physical-chemical characteristics of the interacting residues play a deterministic role. The discovery of this DNA recognition code advances our knowledge on fundamental features of regulation of gene expression and is expected to pave the way for integration with higher-order complexity approaches.
Affiliation(s)
- Marco Trerotola
- Laboratory of Cancer Pathology, Center for Advanced Studies and Technology (CAST), University “G. d’Annunzio”, Via L. Polacchi 11, 66100 Chieti, Italy
- Department of Medical, Oral and Biotechnological Sciences, University “G. d’Annunzio”, 66100 Chieti, Italy
- Laura Antolini
- Center for Biostatistics, Department of Clinical Medicine, Prevention and Biotechnology, University of Milano-Bicocca, 20052 Monza, Italy
- Laura Beni
- Laboratory of Cancer Pathology, Center for Advanced Studies and Technology (CAST), University “G. d’Annunzio”, Via L. Polacchi 11, 66100 Chieti, Italy
- Emanuela Guerra
- Laboratory of Cancer Pathology, Center for Advanced Studies and Technology (CAST), University “G. d’Annunzio”, Via L. Polacchi 11, 66100 Chieti, Italy
- Department of Medical, Oral and Biotechnological Sciences, University “G. d’Annunzio”, 66100 Chieti, Italy
- Damiano Verzulli
- Unit of Informatics, University “G. d’Annunzio”, 66100 Chieti, Italy
- Antonino Moschella
- Unit of Medical Genetics, Department of Biomedical Sciences - BIOMORF, University of Messina, via Consolare Valeria, 98125 Messina, Italy
- Saverio Alberti
- Laboratory of Cancer Pathology, Center for Advanced Studies and Technology (CAST), University “G. d’Annunzio”, Via L. Polacchi 11, 66100 Chieti, Italy
- Unit of Medical Genetics, Department of Biomedical Sciences - BIOMORF, University of Messina, via Consolare Valeria, 98125 Messina, Italy
2
Laforet M, McMurrough TA, Vu M, Brown CM, Zhang K, Junop MS, Gloor GB, Edgell DR. Modifying a covarying protein-DNA interaction changes substrate preference of a site-specific endonuclease. Nucleic Acids Res 2019; 47:10830-10841. [PMID: 31602462 PMCID: PMC6847045 DOI: 10.1093/nar/gkz866] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 06/21/2019] [Revised: 09/17/2019] [Accepted: 10/09/2019] [Indexed: 12/23/2022] Open
Abstract
Identifying and validating intermolecular covariation between proteins and their DNA-binding sites can provide insights into mechanisms that regulate selectivity and starting points for engineering new specificity. LAGLIDADG homing endonucleases (meganucleases) can be engineered to bind non-native target sites for gene-editing applications, but not all redesigns successfully reprogram specificity. To gain a global overview of residues that influence meganuclease specificity, we used information theory to identify protein-DNA covariation. Directed evolution experiments of one predicted pair, 227/+3, revealed variants with surprising shifts in I-OnuI substrate preference at the central 4 bases where cleavage occurs. Structural studies showed significant remodeling distant from the covarying position, including restructuring of an inter-hairpin loop, DNA distortions near the scissile phosphates, and new base-specific contacts. Our findings are consistent with a model whereby the functional impacts of covariation can be indirectly propagated to neighboring residues outside of direct contact range, allowing meganucleases to adapt to target site variation and indirectly expand the sequence space accessible for cleavage. We suggest that some engineered meganucleases may have unexpected cleavage profiles that were not rationally incorporated during the design process.
Affiliation(s)
- Marc Laforet
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Thomas A McMurrough
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Michael Vu
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Christopher M Brown
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Kun Zhang
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Murray S Junop
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- Gregory B Gloor
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
- David R Edgell
- Department of Biochemistry, Schulich School of Medicine and Dentistry, Western University, London, ON N6A 5C1, Canada
3
Molecular mechanisms of the protein-protein interaction-regulated binding specificity of basic-region leucine zipper transcription factors. J Mol Model 2019; 25:246. [PMID: 31342181 DOI: 10.1007/s00894-019-4138-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/08/2019] [Accepted: 07/14/2019] [Indexed: 10/26/2022]
Abstract
It is well known that the DNA-binding specificity of transcription factors (TFs) is influenced by protein-protein interactions (PPIs). However, the underlying molecular mechanisms remain largely unknown. In this work, we adopted the cAMP-response element-binding protein (CREB) of the basic leucine zipper (bZIP) TF family as a model system, and a workflow of combined bioinformatics and molecular modeling analysis of protein-DNA interaction was tested. First, the multiple sequence alignment and SDPsite method were used to find potential bZIP family binding specificity determining positions (SDPs) within the protein-protein interaction region. Second, the mutation system was analyzed using molecular dynamics simulation. Molecular mechanics Poisson-Boltzmann surface area (MM/PBSA) free energy calculations confirmed the enhancement of the binding affinity of the mutation, which was in agreement with experimental results. The root mean square fluctuation (RMSF) and hydrogen bonding changes suggested an open and close protein dimerization process after the system was mutated, which resulted in the change of the hydrogen bonding of the protein-DNA interface and a slight conformational change. We believe that this work will contribute to understanding the protein-protein interaction-regulated binding specificity of bZIP transcription factors.
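The root mean square fluctuation mentioned in this abstract is, in its usual definition, the per-atom square root of the time-averaged squared deviation from the mean position. The sketch below illustrates only that definition; the function name `rmsf`, the frame layout, and the toy coordinates are assumptions for illustration and are not taken from the cited study, which would also superimpose frames before measuring fluctuations.

```python
from math import sqrt


def rmsf(trajectory):
    """Per-atom RMSF for a list of frames.

    Each frame is a list of (x, y, z) tuples, one per atom. Frames are assumed
    to be already superimposed; no alignment is performed here.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared deviation from that mean position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(sqrt(msd))
    return result


# Hypothetical two-frame, two-atom trajectory: atom 0 is fixed, atom 1 moves along x.
frames = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
print(rmsf(frames))  # [0.0, 1.0]
```

Residues with larger values are the more mobile ones, which is how an RMSF profile is typically read when comparing a wild-type and a mutated system.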
4
Korostelev YD, Zharov IA, Mironov AA, Rakhmaininova AB, Gelfand MS. Identification of Position-Specific Correlations between DNA-Binding Domains and Their Binding Sites. Application to the MerR Family of Transcription Factors. PLoS One 2016; 11:e0162681. [PMID: 27690309 PMCID: PMC5045206 DOI: 10.1371/journal.pone.0162681] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 11/11/2015] [Accepted: 08/26/2016] [Indexed: 11/25/2022] Open
Abstract
The large and increasing volume of genomic data analyzed by comparative methods provides information about transcription factors and their binding sites that, in turn, enables statistical analysis of correlations between factors and sites, uncovering mechanisms and evolution of specific protein-DNA recognition. Here we present an online tool, Prot-DNA-Korr, designed to identify and analyze crucial protein-DNA pairs of positions in a family of transcription factors. Correlations are identified by analysis of mutual information between columns of protein and DNA alignments. The algorithm reduces the effects of common phylogenetic history and of abundance of closely related proteins and binding sites. We apply it to five closely related subfamilies of the MerR family of bacterial transcription factors that regulate heavy metal resistance systems. We validate the approach using known 3D structures of MerR-family proteins in complexes with their cognate DNA binding sites and demonstrate that a significant fraction of correlated positions indeed form specific side-chain-to-base contacts. The joint distribution of amino acids and nucleotides hence may be used to predict changes of specificity for point mutations in transcription factors.
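The core quantity named in this abstract, mutual information between a protein alignment column and a DNA alignment column, can be sketched as follows. This is a minimal illustration of plain mutual information only: it does not reproduce Prot-DNA-Korr, which according to the abstract also corrects for shared phylogenetic history and over-represented close homologs. The function name and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(protein_column, dna_column):
    """Plain mutual information (in bits) between two paired alignment columns.

    Position j of each column must come from the same transcription factor and
    its cognate binding site.
    """
    if len(protein_column) != len(dna_column):
        raise ValueError("columns must describe the same set of TF/site pairs")
    n = len(protein_column)
    joint = Counter(zip(protein_column, dna_column))
    aa_counts = Counter(protein_column)
    nt_counts = Counter(dna_column)
    mi = 0.0
    for (aa, nt), count in joint.items():
        p_joint = count / n
        # p(aa, nt) / (p(aa) * p(nt)), written with raw counts.
        mi += p_joint * log2(p_joint * n * n / (aa_counts[aa] * nt_counts[nt]))
    return mi


# Hypothetical toy columns: Arg always pairs with G and Lys with A ...
print(mutual_information("RRRRKKKK", "GGGGAAAA"))  # 1.0
# ... versus a column in which the base is independent of the residue.
print(mutual_information("RRRRKKKK", "GAGAGAGA"))  # 0.0
```

High-scoring column pairs are candidates for specific side-chain-to-base contacts; in practice the raw score would still need a significance estimate and the phylogenetic correction that the abstract describes.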
Affiliation(s)
- Yuriy D. Korostelev
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
- Department of Bioengineering and Bioinformatics, Moscow State University, 1-73 Vorobievy Gory, Moscow, Russia, 119991
- Ilya A. Zharov
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
- Andrey A. Mironov
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
- Department of Bioengineering and Bioinformatics, Moscow State University, 1-73 Vorobievy Gory, Moscow, Russia, 119991
- Alexandra B. Rakhmaininova
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
- Mikhail S. Gelfand
- A.A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, 19-1 Bolshoy Karetny pereulok, Moscow, Russia, 127994
- Department of Bioengineering and Bioinformatics, Moscow State University, 1-73 Vorobievy Gory, Moscow, Russia, 119991
5
Jindrich K, Degnan BM. The diversification of the basic leucine zipper family in eukaryotes correlates with the evolution of multicellularity. BMC Evol Biol 2016; 16:28. [PMID: 26831906 PMCID: PMC4736632 DOI: 10.1186/s12862-016-0598-z] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Received: 10/29/2015] [Accepted: 01/19/2016] [Indexed: 01/01/2023] Open
Abstract
Background: Multicellularity evolved multiple times in eukaryotes. In all cases, this required an elaboration of the regulatory mechanisms controlling gene expression. Amongst the conserved eukaryotic transcription factor families, the basic leucine zipper (bZIP) superfamily is one of the most ancient and best characterised. This gene family plays a diversity of roles in the specification, differentiation and maintenance of cell types in plants and animals. bZIPs are also involved in stress responses and the regulation of cell proliferation in fungi, amoebozoans and heterokonts. Results: Using 49 sequenced genomes from across the Eukaryota, we demonstrate that the bZIP superfamily has evolved from a single ancestral eukaryotic gene and undergone multiple independent expansions. bZIP family diversification is largely restricted to multicellular lineages, consistent with bZIPs contributing to the complex regulatory networks underlying differential and cell type-specific gene expression in these lineages. Analyses focused on the Metazoa suggest an elaborate bZIP network was in place in the most recent shared ancestor of all extant animals that was comprised of 11 of the 12 previously recognized families present in modern taxa. In addition this analysis identifies three bZIP families that appear to have been lost in mammals. Thus the ancestral metazoan and eumetazoan bZIP repertoire consists of 12 and 16 bZIPs, respectively. These diversified from 7 founder genes present in the holozoan ancestor. Conclusions: Our results reveal the ancestral opisthokont, holozoan and metazoan bZIP repertoire and provide insights into the progressive expansion and divergence of bZIPs in the five main eukaryotic kingdoms, suggesting that the early diversification of bZIPs in multiple eukaryotic lineages was a prerequisite for the evolution of complex multicellular organisms.
Affiliation(s)
- Katia Jindrich
- School of Biological Sciences, The University of Queensland, Brisbane, QLD 4072, Australia.
- Bernard M Degnan
- School of Biological Sciences, The University of Queensland, Brisbane, QLD 4072, Australia.
6
Wong KC, Li Y, Peng C, Moses AM, Zhang Z. Computational learning on specificity-determining residue-nucleotide interactions. Nucleic Acids Res 2015; 43:10180-9. [PMID: 26527718 PMCID: PMC4666365 DOI: 10.1093/nar/gkv1134] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 05/25/2015] [Accepted: 10/18/2015] [Indexed: 01/02/2023] Open
Abstract
The protein–DNA interactions between transcription factors and transcription factor binding sites are essential activities in gene regulation. To decipher the binding codes, it is a long-standing challenge to understand the binding mechanism across different transcription factor DNA binding families. Past computational learning studies usually focus on learning and predicting the DNA binding residues on the protein side. Taking into account both sides (protein and DNA), we propose and describe a computational study for learning the specificity-determining residue-nucleotide interactions of different known DNA-binding domain families. The proposed learning models are compared to state-of-the-art models comprehensively, demonstrating their competitive learning performance. In addition, we describe and propose two applications which demonstrate how the learnt models can provide meaningful insights into protein–DNA interactions across different DNA binding families.
Affiliation(s)
- Ka-Chun Wong
- Department of Computer Science, City University of Hong Kong, Kowloon Tong, Hong Kong
- Yue Li
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada; CSAIL, Massachusetts Institute of Technology, Cambridge, MA 02139-4307, USA
- Chengbin Peng
- CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, Saudi Arabia
- Alan M Moses
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada; Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Zhaolei Zhang
- Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada; Banting and Best Department of Medical Research, University of Toronto, Toronto, Ontario, Canada; Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
7
Chakraborty A, Chakrabarti S. A survey on prediction of specificity-determining sites in proteins. Brief Bioinform 2014; 16:71-88. [DOI: 10.1093/bib/bbt092] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Indexed: 11/13/2022] Open
8
Suplatov D, Shalaeva D, Kirilin E, Arzhanik V, Švedas V. Bioinformatic analysis of protein families for identification of variable amino acid residues responsible for functional diversity. J Biomol Struct Dyn 2013; 32:75-87. [DOI: 10.1080/07391102.2012.750249] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 10/27/2022]
9
Chakraborty A, Mandloi S, Lanczycki CJ, Panchenko AR, Chakrabarti S. SPEER-SERVER: a web server for prediction of protein specificity determining sites. Nucleic Acids Res 2012; 40:W242-8. [PMID: 22689646 PMCID: PMC3394334 DOI: 10.1093/nar/gks559] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 11/28/2022] Open
Abstract
Sites that show specific conservation patterns within subsets of proteins in a protein family are likely to be involved in the development of functional specificity. These sites, generally termed specificity determining sites (SDS), might play a crucial role in binding to a specific substrate or proteins. Identification of SDS through experimental techniques is a slow, difficult and tedious job. Hence, it is very important to develop efficient computational methods that can more expediently identify SDS. Herein, we present Specificity prediction using amino acids’ Properties, Entropy and Evolution Rate (SPEER)-SERVER, a web server that predicts SDS by analyzing quantitative measures of the conservation patterns of protein sites based on their physico-chemical properties and the heterogeneity of evolutionary changes between and within the protein subfamilies. This web server provides an improved representation of results, adds useful input and output options and integrates a wide range of analysis and data visualization tools when compared with the original standalone version of the SPEER algorithm. Extensive benchmarking finds that SPEER-SERVER exhibits sensitivity and precision performance that, on average, meets or exceeds that of other currently available methods. SPEER-SERVER is available at http://www.hpppi.iicb.res.in/ss/.
Affiliation(s)
- Abhijit Chakraborty
- Structural Biology and Bioinformatics Division, Council for Scientific and Industrial Research (CSIR)-Indian Institute of Chemical Biology (IICB), Kolkata, West Bengal 700032, India
10
Kirchhof P, Marijon E, Fabritz L, Li N, Wang W, Wang T, Schulte K, Hanstein J, Schulte JS, Vogel M, Mougenot N, Laakmann S, Fortmueller L, Eckstein J, Verheule S, Kaese S, Staab A, Grote-Wessels S, Schotten U, Moubarak G, Wehrens XHT, Schmitz W, Hatem S, Müller FU. Overexpression of cAMP-response element modulator causes abnormal growth and development of the atrial myocardium resulting in a substrate for sustained atrial fibrillation in mice. Int J Cardiol 2011; 166:366-74. [PMID: 22093963 DOI: 10.1016/j.ijcard.2011.10.057] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.6] [Received: 09/21/2011] [Accepted: 10/18/2011] [Indexed: 01/04/2023]
Abstract
BACKGROUND AND METHODS: Atrial fibrillation (AF) is the most common cardiac arrhythmia in clinical practice. The substrate of AF is composed of a complex interplay between structural and functional changes of the atrial myocardium often preceding the occurrence of persistent AF. However, there are only few animal models reproducing the slow progression of the AF substrate to the spontaneous occurrence of the arrhythmia. Transgenic mice (TG) with cardiomyocyte-directed expression of CREM-IbΔC-X, an isoform of transcription factor CREM, develop atrial dilatation and spontaneous-onset AF. Here we tested the hypothesis that TG mice develop an arrhythmogenic substrate preceding AF using physiological and biochemical techniques. RESULTS: Overexpression of CREM-IbΔC-X in young TG mice (<8 weeks) led to atrial dilatation combined with distension of myocardium, elongated myocytes, little fibrosis, down-regulation of connexin 40, loss of excitability with a number of depolarized myocytes, atrial ectopies and inducibility of AF. These abnormalities continuously progressed with age resulting in interatrial conduction block, increased atrial conduction heterogeneity, leaky sarcoplasmic reticulum calcium stores and the spontaneous occurrence of paroxysmal and later persistent AF. This distinct atrial remodelling was associated with a pattern of non-regulated and up-regulated marker genes of myocardial hypertrophy and fibrosis. CONCLUSIONS: Expression of CREM-IbΔC-X in TG hearts evokes abnormal growth and development of the atria preceding conduction abnormalities and altered calcium homeostasis and the development of spontaneous and persistent AF. We conclude that transcription factor CREM is an important regulator of atrial growth implicated in the development of an arrhythmogenic substrate in TG mice.
Affiliation(s)
- Paulus Kirchhof
- Department of Cardiology and Angiology, University Hospital Münster, Germany
11
Fedonin GG, Rakhmaninova AB, Korostelev YD, Laikova ON, Gelfand MS. Machine learning study of DNA binding by transcription factors from the LacI family. Mol Biol 2011. [DOI: 10.1134/s0026893311040054] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/22/2022]
12
Tungtur S, Parente DJ, Swint-Kruse L. Functionally important positions can comprise the majority of a protein's architecture. Proteins 2011; 79:1589-608. [PMID: 21374721 DOI: 10.1002/prot.22985] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Received: 06/14/2010] [Revised: 12/08/2010] [Accepted: 12/15/2010] [Indexed: 01/13/2023]
Abstract
Concomitant with the genomic era, many bioinformatics programs have been developed to identify functionally important positions from sequence alignments of protein families. To evaluate these analyses, many have used the LacI/GalR family and determined whether positions predicted to be "important" are validated by published experiments. However, we previously noted that predictions do not identify all of the experimentally important positions present in the linker regions of these homologs. In an attempt to reconcile these differences, we corrected and expanded the LacI/GalR sequence set commonly used in sequence/function analyses. Next, a variety of analyses were carried out (1) for the entire LacI/GalR sequence set and (2) for a subset of homologs with functionally-important "YxPxxxAxxL" motifs in their linkers. This strategy was devised to determine whether predictions could be improved by knowledge-based sequence sorting and, for some analyses, did increase the number of linker positions identified. However, two functionally important linker positions were not reliably identified by any analysis. Finally, we compared the new predictions to all known experimental data for E. coli LacI and three homologous linkers. From these, we estimate that >50% of positions are important to the functions of the LacI/GalR homologs. In corollary, neutral positions might occur less frequently and might be easier to detect in sequence analyses. Although analyses have successfully guided mutations that partially exchange protein functions, a better experimental understanding of the sequence/function relationships in protein families would be helpful for uncovering the remaining rules used by nature to evolve new protein functions.
Affiliation(s)
- Sudheer Tungtur
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, MSN 3030, Kansas City, Kansas 66160, USA
13
Mazin PV, Gelfand MS, Mironov AA, Rakhmaninova AB, Rubinov AR, Russell RB, Kalinina OV. An automated stochastic approach to the identification of the protein specificity determinants and functional subfamilies. Algorithms Mol Biol 2010; 5:29. [PMID: 20633297 PMCID: PMC2914642 DOI: 10.1186/1748-7188-5-29] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.6] [Received: 11/30/2009] [Accepted: 07/15/2010] [Indexed: 11/30/2022] Open
Abstract
Background: Recent progress in sequencing and 3D structure determination techniques stimulated development of approaches aimed at more precise annotation of proteins, that is, prediction of exact specificity to a ligand or, more broadly, to a binding partner of any kind. Results: We present a method, SDPclust, for identification of protein functional subfamilies coupled with prediction of specificity-determining positions (SDPs). SDPclust predicts specificity in a phylogeny-independent stochastic manner, which allows for the correct identification of the specificity for proteins that are separated on a phylogenetic tree, but still bind the same ligand. SDPclust is implemented as a Web-server http://bioinf.fbb.msu.ru/SDPfoxWeb/ and a stand-alone Java application available from the website. Conclusions: SDPclust performs a simultaneous identification of specificity determinants and specificity groups in a statistically robust and phylogeny-independent manner.
14
Brandt BW, Feenstra KA, Heringa J. Multi-Harmony: detecting functional specificity from sequence alignment. Nucleic Acids Res 2010; 38:W35-40. [PMID: 20525785 PMCID: PMC2896201 DOI: 10.1093/nar/gkq415] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.2] [Indexed: 11/12/2022] Open
Abstract
Many protein families contain sub-families with functional specialization, such as binding different ligands or being involved in different protein–protein interactions. A small number of amino acids generally determine functional specificity. The identification of these residues can aid the understanding of protein function and help in finding targets for experimental analysis. Here, we present multi-Harmony, an interactive web server for detecting sub-type-specific sites in proteins starting from a multiple sequence alignment. Combining our Sequence Harmony (SH) and multi-Relief (mR) methods in one web server allows simultaneous analysis and comparison of specificity residues; furthermore, both methods have been significantly improved and extended. SH has been extended to cope with more than two sub-groups. mR has been changed from a sampling implementation to a deterministic one, making it more consistent and user friendly. For both methods Z-scores are reported. The multi-Harmony web server produces a dynamic output page, which includes interactive connections to the Jalview and Jmol applets, thereby allowing interactive analysis of the results. Multi-Harmony is available at http://www.ibi.vu.nl/programs/shmrwww.
Affiliation(s)
- Bernd W Brandt
- Centre for Integrative Bioinformatics, VU University Amsterdam, De Boelelaan 1081A, 1081HV Amsterdam, The Netherlands
15
Fromer M, Shifman JM. Tradeoff between stability and multispecificity in the design of promiscuous proteins. PLoS Comput Biol 2009; 5:e1000627. [PMID: 20041208 PMCID: PMC2790338 DOI: 10.1371/journal.pcbi.1000627] [Citation(s) in RCA: 50] [Impact Index Per Article: 3.3] [Received: 08/12/2009] [Accepted: 11/24/2009] [Indexed: 12/23/2022] Open
Abstract
Natural proteins often partake in several highly specific protein-protein interactions. They are thus subject to multiple opposing forces during evolutionary selection. To be functional, such multispecific proteins need to be stable in complex with each interaction partner, and, at the same time, to maintain affinity toward all partners. How is this multispecificity acquired through natural evolution? To answer this compelling question, we study a prototypical multispecific protein, calmodulin (CaM), which has evolved to interact with hundreds of target proteins. Starting from high-resolution structures of sixteen CaM-target complexes, we employ state-of-the-art computational methods to predict a hundred CaM sequences best suited for interaction with each individual CaM target. Then, we design CaM sequences most compatible with each possible combination of two, three, and all sixteen targets simultaneously, producing almost 70,000 low energy CaM sequences. By comparing these sequences and their energies, we gain insight into how nature has managed to find the compromise between the need for favorable interaction energies and the need for multispecificity. We observe that designing for more partners simultaneously yields CaM sequences that better match natural sequence profiles, thus emphasizing the importance of such strategies in nature. Furthermore, we show that the CaM binding interface can be nicely partitioned into positions that are critical for the affinity of all CaM-target complexes and those that are molded to provide interaction specificity. We reveal several basic categories of sequence-level tradeoffs that enable the compromise necessary for the promiscuity of this protein. We also thoroughly quantify the tradeoff between interaction energetics and multispecificity and find that facilitating seemingly competing interactions requires only a small deviation from optimal energies. 
We conclude that multispecific proteins have been subjected to a rigorous optimization process that has fine-tuned their sequences for interactions with a precise set of targets, thus conferring their multiple cellular functions. In nature, some proteins are more social than others, interacting with a large number of partners. These “promiscuous” proteins play key roles in cellular signaling pathways whose disruption may lead to diseases such as cancer. The amino acid sequences of such proteins must have evolved to be optimal for combined interactions with all natural partners. However, the evolutionary process leading to this promiscuity is not fully understood. We address this subject by predicting amino acid sequences that would be most compatible for interaction with each partner on its own and those most compatible for binding multiple proteins. We find that these two types of sequences are substantially different, the latter more closely resembling the natural sequences of promiscuous proteins. We also find that promiscuous proteins contain certain regions that are necessary for interfacing with all of their partners, while other regions convey specific interactions with each particular target protein. We analyze the tradeoffs required for such proteins to bind multiple partners and find that only some degree of compromise is typically needed in order to permit interactions that are seemingly antagonistic. We conclude that the simulations reported here mimic well the natural evolution of proteins that associate with multiple partners.
Affiliation(s)
- Menachem Fromer
- School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel
- Julia M. Shifman
- Department of Biological Chemistry, The Alexander Silberman Institute of Life Sciences, The Hebrew University of Jerusalem, Jerusalem, Israel
|
16
|
Chakrabarti S, Panchenko AR. Ensemble approach to predict specificity determinants: benchmarking and validation. BMC Bioinformatics 2009; 10:207. [PMID: 19573245 PMCID: PMC2716344 DOI: 10.1186/1471-2105-10-207] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.7] [Received: 02/02/2009] [Accepted: 07/02/2009] [Indexed: 11/29/2022] Open
Abstract
Background: It is extremely important and challenging to identify the sites that are responsible for functional specification or diversification in protein families. In this study, a rigorous comparative benchmarking protocol was employed to provide a reliable evaluation of methods that predict specificity determining sites. Subsequently, the three best-performing methods were applied to identify new potential specificity determining sites through an ensemble approach and common agreement of their prediction results. Results: It was shown that the analysis of structural characteristics of predicted specificity determining sites might provide the means to validate their prediction accuracy. For example, we found that for smaller distances it holds true that the more reliable the prediction method is, the closer predicted specificity determining sites are to each other and to the ligand. Conclusion: We observed certain similarities of structural features between predicted and actual subsites, which might point to their functional relevance. We speculate that the majority of the identified potential specificity determining sites might be indirectly involved in specific interactions and could be ideal targets for mutagenesis experiments.
Affiliation(s)
- Saikat Chakrabarti
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland, USA.
|
17
|
Kalinina OV, Gelfand MS, Russell RB. Combining specificity determining and conserved residues improves functional site prediction. BMC Bioinformatics 2009; 10:174. [PMID: 19508719 PMCID: PMC2709924 DOI: 10.1186/1471-2105-10-174] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Received: 01/23/2009] [Accepted: 06/09/2009] [Indexed: 11/16/2022] Open
Abstract
Background: Predicting the location of functionally important sites from protein sequence and/or structure is a long-standing problem in computational biology. Most current approaches make use of sequence conservation, assuming that amino acid residues conserved within a protein family are most likely to be functionally important. Most often these approaches do not consider many residues that act to define specific sub-functions within a family, or they make no distinction between residues important for function and those more relevant for maintaining structure (e.g. in the hydrophobic core). Many protein families bind and/or act on a variety of ligands, meaning that conserved residues often only bind a common ligand sub-structure or perform general catalytic activities. Results: Here we present a novel method for functional site prediction based on identification of conserved positions, as well as those responsible for determining ligand specificity. We define Specificity-Determining Positions (SDPs) as those occupied by residues that are conserved within sub-groups of proteins in a family having a common specificity but that differ between groups, and are thus likely to account for specific recognition events. We benchmark the approach on enzyme families of known 3D structure with bound substrates, and find that in nearly all families residues predicted by SDPsite are in contact with the bound substrate, and that the addition of SDPs significantly improves functional site prediction accuracy. We apply SDPsite to various families of proteins containing known three-dimensional structures, but lacking clear functional annotations, and discuss several illustrative examples. Conclusion: The results suggest a better means to predict functional details for the thousands of protein structures determined prior to a clear understanding of molecular function.
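The SDP definition given in this abstract (conserved within each specificity group, different between groups) can be illustrated with a minimal sketch. This is not the SDPsite scoring scheme, which the abstract does not spell out; it applies the definition in its strictest literal form (perfect conservation within every group). The function name `strict_sdps` and the toy sequences are hypothetical.

```python
def strict_sdps(groups):
    """Return 0-based alignment positions that are perfectly conserved
    within every group but not identical across all groups.

    groups: dict mapping a group label to a list of equal-length,
    already aligned sequences (an assumption of this sketch).
    """
    length = len(next(iter(groups.values()))[0])
    positions = []
    for i in range(length):
        residues = []
        for seqs in groups.values():
            column = {s[i] for s in seqs}
            if len(column) != 1:
                break  # position varies inside this group: not an SDP
            residues.append(column.pop())
        else:
            # Conserved inside every group; keep it only if groups disagree.
            if len(set(residues)) > 1:
                positions.append(i)
    return positions


# Hypothetical toy family with two specificity groups.
toy_groups = {
    "specificity_A": ["GASTK", "GAETK", "GAQTK"],
    "specificity_B": ["GLSTR", "GLETR", "GLQTR"],
}
sdps = strict_sdps(toy_groups)
print(sdps)
```

In the toy data, positions 0 and 3 are conserved across the whole family (candidate conserved positions rather than SDPs), and position 2 varies inside each group, so only the group-specific positions are reported.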
|
18
|
SDPhound, a Mutual Information-Based Method to Investigate Specificity-Determining Positions. Algorithms 2009. [DOI: 10.3390/a2020764] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 01/17/2023]
|
19
|
Fatakia SN, Costanzi S, Chow CC. Computing highly correlated positions using mutual information and graph theory for G protein-coupled receptors. PLoS One 2009; 4:e4681. [PMID: 19262747 PMCID: PMC2650788 DOI: 10.1371/journal.pone.0004681] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Received: 10/08/2008] [Accepted: 01/07/2009] [Indexed: 01/06/2023] Open
Abstract
G protein-coupled receptors (GPCRs) are a superfamily of seven transmembrane-spanning proteins involved in a wide array of physiological functions and are the most common targets of pharmaceuticals. This study aims to identify a cohort or clique of positions that share high mutual information. Using a multiple sequence alignment of the transmembrane (TM) domains, we calculated the mutual information between all inter-TM pairs of aligned positions and ranked the pairs by mutual information. A mutual information graph was constructed with vertices that corresponded to TM positions and edges between vertices were drawn if the mutual information exceeded a threshold of statistical significance. Positions with high degree (i.e. had significant mutual information with a large number of other positions) were found to line a well defined inter-TM ligand binding cavity for class A as well as class C GPCRs. Although the natural ligands of class C receptors bind to their extracellular N-terminal domains, the possibility of modulating their activity through ligands that bind to their helical bundle has been reported. Such positions were not found for class B GPCRs, in agreement with the observation that there are not known ligands that bind within their TM helical bundle. All identified key positions formed a clique within the MI graph of interest. For a subset of class A receptors we also considered the alignment of a portion of the second extracellular loop, and found that the two positions adjacent to the conserved Cys that bridges the loop with the TM3 qualified as key positions. Our algorithm may be useful for localizing topologically conserved regions in other protein families.
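The graph construction described in this abstract can be sketched roughly as follows. The sketch makes several simplifying assumptions that are not from the paper: it compares all column pairs rather than only inter-TM pairs, it uses a fixed numeric cutoff in place of a statistical-significance threshold, and the function names and toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), c in count_ab.items():
        p_ab = c / n
        mi += p_ab * log2(p_ab / ((count_a[a] / n) * (count_b[b] / n)))
    return mi


def mi_graph(alignment, threshold):
    """Adjacency sets: an edge joins two positions whose MI exceeds threshold.

    The fixed threshold is a stand-in for a significance test (assumption).
    """
    columns = list(zip(*alignment))
    graph = {i: set() for i in range(len(columns))}
    for i, j in combinations(range(len(columns)), 2):
        if mutual_information(columns[i], columns[j]) > threshold:
            graph[i].add(j)
            graph[j].add(i)
    return graph


# Hypothetical toy alignment: columns 0-2 co-vary, column 3 is independent.
toy_alignment = ["ACDA", "ACDC", "GTEA", "GTEC"]
graph = mi_graph(toy_alignment, threshold=0.5)
degrees = {pos: len(nbrs) for pos, nbrs in graph.items()}
print(degrees)
```

Positions with high degree in such a graph correspond to the "key positions" the abstract refers to; in the toy data the three co-varying columns are mutually connected and therefore also form a clique.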
Affiliation(s)
- Sarosh N. Fatakia
- Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
- Stefano Costanzi
- Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
- Carson C. Chow
- Laboratory of Biological Modeling, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Bethesda, Maryland, United States of America
|
20
|
Meinhardt S, Swint-Kruse L. Experimental identification of specificity determinants in the domain linker of a LacI/GalR protein: bioinformatics-based predictions generate true positives and false negatives. Proteins 2008; 73:941-57. [PMID: 18536016 DOI: 10.1002/prot.22121] [Citation(s) in RCA: 28] [Impact Index Per Article: 1.8] [Indexed: 11/07/2022]
Abstract
In protein families, conserved residues often contribute to a common general function, such as DNA-binding. However, unique attributes for each homolog (e.g. recognition of alternative DNA sequences) must arise from variation in other functionally-important positions. The locations of these "specificity determinant" positions are obscured amongst the background of varied residues that do not make significant contributions to either structure or function. To isolate specificity determinants, a number of bioinformatics algorithms have been developed. When applied to the LacI/GalR family of transcription regulators, several specificity determinants are predicted in the 18 amino acids that link the DNA-binding and regulatory domains. However, results from alternative algorithms are only in partial agreement with each other. Here, we experimentally evaluate these predictions using an engineered repressor comprising the LacI DNA-binding domain, the LacI linker, and the GalR regulatory domain (LLhG). "Wild-type" LLhG has altered DNA specificity and weaker lacO(1) repression compared to LacI or a similar LacI:PurR chimera. Next, predictions of linker specificity determinants were tested, using amino acid substitution and in vivo repression assays to assess functional change. In LLhG, all predicted sites are specificity determinants, as well as three sites not predicted by any algorithm. Strategies are suggested for diminishing the number of false negative predictions. Finally, individual substitutions at LLhG specificity determinants exhibited a broad range of functional changes that are not predicted by bioinformatics algorithms. Results suggest that some variants have altered affinity for DNA, some have altered allosteric response, and some appear to have changed specificity for alternative DNA ligands.
Affiliation(s)
- Sarah Meinhardt
- Department of Biochemistry and Molecular Biology, The University of Kansas Medical Center, Kansas City, Kansas 66160, USA
|
21
|
Wang M, Wang Q, Zhao H, Zhang X, Pan Y. Evolutionary selection pressure of forkhead domain and functional divergence. Gene 2008; 432:19-25. [PMID: 19100316 DOI: 10.1016/j.gene.2008.11.018] [Citation(s) in RCA: 15] [Impact Index Per Article: 0.9] [Received: 03/15/2008] [Revised: 10/10/2008] [Accepted: 11/18/2008] [Indexed: 11/18/2022]
Abstract
Forkhead-box (Fox) genes encode a family of transcription factors defined by a "winged helix" DNA-binding domain. They have been identified in many metazoans and play important roles in diverse biological processes. Here we aimed to extend previous evolutionary selection analysis to fungi, using available sequences from E. cuniculi (Ec), Eremothecium gossypii (Eg), Saccharomyces cerevisiae (Sc), etc. The phylogeny of 335 Fox protein sequences was reconstructed, revealing the existence of 26 orthologous groups that were well supported by gene phylogeny and that arose following a series of gene duplication events. Gene conversion events may also play important roles in the evolution of Fox genes. The nonsynonymous to synonymous substitution ratios (dN/dS) for orthologous groups suggested that after gene duplication and/or speciation of forkhead clusters, rapid differentiation and negative selection have occurred, prompting the formation of distinct Fox subclasses and new functions. SDPpred was used to produce a set of alignment positions (specificity determining positions) that are involved in conferring differential functional specificity. These findings explain the functional divergence of the Fox gene family.
Affiliation(s)
- Minghui Wang
- School of Agriculture and Biology, Department of Animal Sciences, Shanghai Jiao Tong University, Shanghai, 200240, PR China
|
22
|
Donald JE, Shakhnovich EI. SDR: a database of predicted specificity-determining residues in proteins. Nucleic Acids Res 2008; 37:D191-4. [PMID: 18927118 PMCID: PMC2686543 DOI: 10.1093/nar/gkn716] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 01/17/2023] Open
Abstract
The specificity-determining residue database (SDR database) presents residue positions where mutations are predicted to have changed protein function in large protein families. Because the database pre-calculates predictions on existing protein sequence alignments, users can quickly find the predictions by selecting the appropriate protein family or searching by protein sequence. Predictions can be used to guide mutagenesis or to gain a better understanding of specificity changes in a protein family. The database is available on the web at http://paradox.harvard.edu/sdr.
Affiliation(s)
- Jason E Donald
- Department of Biochemistry and Biophysics, University of Pennsylvania, Philadelphia, PA, USA.
|
23
|
Reva B, Antipin Y, Sander C. Determinants of protein function revealed by combinatorial entropy optimization. Genome Biol 2008; 8:R232. [PMID: 17976239 PMCID: PMC2258190 DOI: 10.1186/gb-2007-8-11-r232] [Citation(s) in RCA: 232] [Impact Index Per Article: 14.5] [Received: 07/18/2007] [Accepted: 11/01/2007] [Indexed: 11/10/2022] Open
Abstract
We use a new algorithm (combinatorial entropy optimization [CEO]) to identify specificity residues and functional subfamilies in sets of proteins related by evolution. Specificity residues are conserved within a subfamily but differ between subfamilies, and they typically encode functional diversity. We obtain good agreement between predicted specificity residues and experimentally known functional residues in protein interfaces. Such predicted functional determinants are useful for interpreting the functional consequences of mutations in natural evolution and disease.
Affiliation(s)
- Boris Reva
- Computational Biology Center, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, New York, NY 10065, USA.
|
24
|
|
25
|
Qian Z, Lu L, Qi L, Li Y. An efficient method for statistical significance calculation of transcription factor binding sites. Bioinformation 2007; 2:169-74. [PMID: 18305824 PMCID: PMC2241927 DOI: 10.6026/97320630002169] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 12/13/2007] [Accepted: 12/31/2007] [Indexed: 11/23/2022] Open
Abstract
Various statistical models have been developed to describe the DNA binding preference of transcription factors, by which putative transcription factor binding sites (TFBS) can be identified according to the scores assigned. The statistical significance of these scores, usually known as the p-value, plays a critical role in identification. We developed an efficient algorithm to provide precise calculation of the statistical significance, remarkably enhancing the calculation efficiency by reducing the time complexity from an exponential scale to a linear scale, and successfully extended the application of this algorithm to a wide range of models, from the commonly used position weight matrix models to the complicated Bayesian Network models. Further, we calculated p-values of all transcription factor DNA binding sites recorded in the JASPAR database and, based on these, investigated some previously unseen properties of p-values as a whole, such as the p-value distribution of different models and the p-value variance according to changed scoring schemes. We hope that our algorithm and the results of the computational experiments will offer an improved solution to the statistical significance of transcription factor binding sites. The software to implement our method can be downloaded from http://pcal.biosino.org/pCal.html.
Affiliation(s)
- Ziliang Qian
- Bioinformatics Center, Key Laboratory of Molecular System Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, 320 Yueyang Road, Shanghai 200031, PR China
|
26
|
Mihalek I, Res I, Lichtarge O. Background frequencies for residue variability estimates: BLOSUM revisited. BMC Bioinformatics 2007; 8:488. [PMID: 18162129 PMCID: PMC2267808 DOI: 10.1186/1471-2105-8-488] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Received: 07/13/2007] [Accepted: 12/27/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Shannon entropy applied to columns of multiple sequence alignments as a score of residue conservation has proven one of the most fruitful ideas in bioinformatics. This straightforward and intuitively appealing measure clearly shows the regions of a protein under increased evolutionary pressure, highlighting their functional importance. The inability of the column entropy to differentiate between residue types, however, limits its resolution power. RESULTS: In this work we suggest generalizing Shannon's expression to a function with similar mathematical properties that, at the same time, includes observed propensities of residue types to mutate to each other. To do that, we revisit the original construction of BLOSUM matrices, and re-interpret them as mutation probability matrices. These probabilities are then used as background frequencies in the revised residue conservation measure. CONCLUSION: We show that joint entropy with BLOSUM-proportional probabilities as a reference distribution enables detection of protein functional sites comparable in quality to a time-costly maximum-likelihood evolution simulation method (rate4site), and offers greater resolution than the Shannon entropy alone, in particular in the cases when the available sequences are of narrow evolutionary scope.
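The baseline measure this abstract starts from, per-column Shannon entropy, can be sketched as below. Only the plain entropy is shown; the BLOSUM-based generalization proposed in the paper is not reproduced, since the abstract does not give its formula. The function names and the toy alignment are hypothetical, and gap handling is deliberately omitted.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the residue distribution in one column."""
    n = len(column)
    # Sum of p * log2(1/p) over observed residue types.
    return sum((c / n) * log2(n / c) for c in Counter(column).values())


def alignment_entropies(alignment):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*alignment)]


# Hypothetical toy alignment: column 0 is fully conserved, column 3 is most variable.
toy_alignment = ["MKLV", "MKIV", "MRLA", "MRIG"]
entropies = alignment_entropies(toy_alignment)
print(entropies)
```

Low entropy marks conserved columns. As the abstract notes, this score treats a K/R column and an L/I column identically to any other two-residue split, which is the lack of residue-type resolution the paper sets out to address.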
Affiliation(s)
- I Mihalek
- Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, TX 77030, USA.
|
27
|
Chakrabarti S, Bryant SH, Panchenko AR. Functional specificity lies within the properties and evolutionary changes of amino acids. J Mol Biol 2007; 373:801-10. [PMID: 17868687 PMCID: PMC2605514 DOI: 10.1016/j.jmb.2007.08.036] [Citation(s) in RCA: 46] [Impact Index Per Article: 2.7] [Received: 05/01/2007] [Revised: 07/03/2007] [Accepted: 08/16/2007] [Indexed: 10/22/2022]
Abstract
The rapid increase in the amount of protein sequence data has created a need for automated identification of sites that determine functional specificity among related subfamilies of proteins. A significant fraction of subfamily specific sites are only marginally conserved, which makes it extremely challenging to detect those amino acid changes that lead to functional diversification. To address this critical problem we developed a method named SPEER (specificity prediction using amino acids' properties, entropy and evolution rate) to distinguish specificity determining sites from others. SPEER encodes the conservation patterns of amino acid types using their physico-chemical properties and the heterogeneity of evolutionary changes between and within the subfamilies. To test the method, we compiled a test set containing 13 protein families with known specificity determining sites. Extensive benchmarking by comparing the performance of SPEER with other specificity site prediction algorithms has shown that it performs better in predicting several categories of subfamily specific sites.
Affiliation(s)
- Saikat Chakrabarti
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.
|
28
|
Feenstra KA, Pirovano W, Krab K, Heringa J. Sequence harmony: detecting functional specificity from alignments. Nucleic Acids Res 2007; 35:W495-8. [PMID: 17584793 PMCID: PMC1933219 DOI: 10.1093/nar/gkm406] [Citation(s) in RCA: 32] [Impact Index Per Article: 1.9] [Indexed: 11/14/2022] Open
Abstract
Multiple sequence alignments are often used for the identification of key specificity-determining residues within protein families. We present a web server implementation of the Sequence Harmony (SH) method previously introduced. SH accurately detects subfamily specific positions from a multiple alignment by scoring compositional differences between subfamilies, without imposing conservation. The SH web server allows a quick selection of subtype specific sites from a multiple alignment given a subfamily grouping. In addition, it allows the predicted sites to be directly mapped onto a protein structure and displayed. We demonstrate the use of the SH server using the family of plant mitochondrial alternative oxidases (AOX). In addition, we illustrate the usefulness of combining sequence and structural information by showing that the predicted sites are clustered into a few distinct regions in an AOX homology model. The SH web server can be accessed at www.ibi.vu.nl/programs/seqharmwww.
Affiliation(s)
- K. Anton Feenstra
- Centre for Integrative Bioinformatics VU (IBIVU) and Institute of Molecular Cell Biology, Vrije Universiteit Amsterdam, De Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands
- Walter Pirovano
- Centre for Integrative Bioinformatics VU (IBIVU) and Institute of Molecular Cell Biology, Vrije Universiteit Amsterdam, De Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands
- Klaas Krab
- Centre for Integrative Bioinformatics VU (IBIVU) and Institute of Molecular Cell Biology, Vrije Universiteit Amsterdam, De Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands
- Jaap Heringa
- Centre for Integrative Bioinformatics VU (IBIVU) and Institute of Molecular Cell Biology, Vrije Universiteit Amsterdam, De Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands
- *To whom correspondence should be addressed. Tel: +31 20 598 7649; Fax: +31 20 598 7653
|
29
|
Pirovano W, Feenstra KA, Heringa J. Sequence comparison by sequence harmony identifies subtype-specific functional sites. Nucleic Acids Res 2006; 34:6540-8. [PMID: 17130172 PMCID: PMC1702503 DOI: 10.1093/nar/gkl901] [Citation(s) in RCA: 59] [Impact Index Per Article: 3.3] [Indexed: 11/12/2022] Open
Abstract
Multiple sequence alignments are often used to reveal functionally important residues within a protein family. They can be particularly useful for the identification of key residues that determine functional differences between protein subfamilies. We present a new entropy-based method, Sequence Harmony (SH) that accurately detects subfamily-specific positions from a multiple sequence alignment. The SH algorithm implements a novel formula, able to score compositional differences between subfamilies, without imposing conservation, in a simple manner on an intuitive scale. We compare our method with the most important published methods, i.e. AMAS, TreeDet and SDP-pred, using three well-studied protein families: the receptor-binding domain (MH2) of the Smad family of transcription factors, the Ras-superfamily of small GTPases and the MIP-family of integral membrane transporters. We demonstrate that SH accurately selects known functional sites with higher coverage than the other methods for these test-cases. This shows that compositional differences between protein subfamilies provide sufficient basis for identification of functional sites. In addition, SH selects a number of sites of unknown function that could be interesting candidates for further experimental investigation.
Affiliation(s)
- Jaap Heringa
- To whom correspondence should be addressed. Tel: +31 20 59 87649; Fax: +31 20 59 87653;
|
30
|
Kalinina OV, Gelfand MS. Amino acid residues that determine functional specificity of NADP- and NAD-dependent isocitrate and isopropylmalate dehydrogenases. Proteins 2006; 64:1001-9. [PMID: 16767773 DOI: 10.1002/prot.21027] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.4] [Indexed: 11/12/2022]
Abstract
Isocitrate and isopropylmalate dehydrogenases are homologous enzymes important for cell metabolism. They oxidize their substrates using NAD or NADP as cofactors. Thus, they have two specificities, towards the substrate and the cofactor, appearing in three combinations. Although many three-dimensional (3D) structures are resolved, identification of amino acids determining these specificities remains a challenge. We present computational identification and analysis of specificity-determining positions (SDPs). Besides many experimentally proven SDPs, we predict new SDPs, for example, four substrate-specific positions (103Leu, 105Thr, 337Ala, and 341Thr in IDH from E. coli) that contact the cofactor and may play a role in the recognition process.
Affiliation(s)
- Olga V Kalinina
- Department of Bioengineering and Bioinformatics, Moscow State University, Moscow, Russia
|
31
|
Marttinen P, Corander J, Törönen P, Holm L. Bayesian search of functionally divergent protein subgroups and their function specific residues. Bioinformatics 2006; 22:2466-74. [PMID: 16870932 DOI: 10.1093/bioinformatics/btl411] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION: The rapid increase in the amount of protein sequence data has created a need for an automated identification of evolutionarily related subgroups from large datasets. The existing methods typically require a priori specification of the number of putative groups, which defines the resolution of the classification solution. RESULTS: We introduce a Bayesian model-based approach to simultaneous identification of evolutionary groups and conserved parts of the protein sequences. The model-based approach provides an intuitive and efficient way of determining the number of groups from the sequence data, in contrast to the ad hoc methods often exploited for similar purposes. Our model recognizes the areas in the sequences that are relevant for the clustering and regards other areas as noise. We have implemented the method using a fast stochastic optimization algorithm which yields a clustering associated with the estimated maximum posterior probability. The method has been shown to have high specificity and sensitivity in simulated and real clustering tasks. With real datasets the method also highlights the residues close to the active site. AVAILABILITY: Software 'kPax' is available at http://www.rni.helsinki.fi/jic/softa.html
Affiliation(s)
- Pekka Marttinen
- Department of Mathematics and Statistics, PO Box 68, 00014 University of Helsinki, Finland.
|