51
|
Schuster-Böckler B, Bateman A. Reuse of structural domain-domain interactions in protein networks. BMC Bioinformatics 2007; 8:259. [PMID: 17640363 PMCID: PMC1940023 DOI: 10.1186/1471-2105-8-259] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/23/2007] [Accepted: 07/18/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Protein interactions are thought to be largely mediated by interactions between structural domains. Databases such as iPfam relate interactions in protein structures to known domain families. Here, we investigate how the domain interactions from the iPfam database are distributed in protein interactions taken from the HPRD, MPact, BioGRID, DIP and IntAct databases. RESULTS We find that known structural domain interactions can only explain a subset of 4-19% of the available protein interactions, nevertheless this fraction is still significantly bigger than expected by chance. There is a correlation between the frequency of a domain interaction and the connectivity of the proteins it occurs in. Furthermore, a large proportion of protein interactions can be attributed to a small number of domain interactions. We conclude that many, but not all, domain interactions constitute reusable modules of molecular recognition. A substantial proportion of domain interactions are conserved between E. coli, S. cerevisiae and H. sapiens. These domains are related to essential cellular functions, suggesting that many domain interactions were already present in the last universal common ancestor. CONCLUSION Our results support the concept of domain interactions as reusable, conserved building blocks of protein interactions, but also highlight the limitations currently imposed by the small number of available protein structures.
Collapse
Affiliation(s)
| | - Alex Bateman
- Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, UK
| |
Collapse
|
52
|
Jothi R, Przytycka TM, Aravind L. Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment. BMC Bioinformatics 2007; 8:173. [PMID: 17521444 PMCID: PMC1904249 DOI: 10.1186/1471-2105-8-173] [Citation(s) in RCA: 69] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/28/2007] [Accepted: 05/23/2007] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND A widely-used approach for discovering functional and physical interactions among proteins involves phylogenetic profile comparisons (PPCs). Here, proteins with similar profiles are inferred to be functionally related under the assumption that proteins involved in the same metabolic pathway or cellular system are likely to have been co-inherited during evolution. RESULTS Our experimentation with E. coli and yeast proteins with 16 different carefully composed reference sets of genomes revealed that the phyletic patterns of proteins in prokaryotes alone could be adequate enough to make reasonably accurate functional linkage predictions. A slight improvement in performance is observed on adding few eukaryotes into the reference set, but a noticeable drop-off in performance is observed with increased number of eukaryotes. Inclusion of most parasitic, pathogenic or vertebrate genomes and multiple strains of the same species into the reference set do not necessarily contribute to an improved sensitivity or accuracy. Interestingly, we also found that evolutionary histories of individual pathways have a significant affect on the performance of the PPC approach with respect to a particular reference set. For example, to accurately predict functional links in carbohydrate or lipid metabolism, a reference set solely composed of prokaryotic (or bacterial) genomes performed among the best compared to one composed of genomes from all three super-kingdoms; this is in contrast to predicting functional links in translation for which a reference set composed of prokaryotic (or bacterial) genomes performed the worst. We also demonstrate that the widely used random null model to quantify the statistical significance of profile similarity is incomplete, which could result in an increased number of false-positives. CONCLUSION Contrary to previous proposals, it is not merely the number of genomes but a careful selection of informative genomes in the reference set that influences the prediction accuracy of the PPC approach. We note that the predictive power of the PPC approach, especially in eukaryotes, is heavily influenced by the primary endosymbiosis and subsequent bacterial contributions. The over-representation of parasitic unicellular eukaryotes and vertebrates additionally make eukaryotes less useful in the reference sets. Reference sets composed of highly non-redundant set of genomes from all three super-kingdoms fare better with pathways showing considerable vertical inheritance and strong conservation (e.g. translation apparatus), while reference sets solely composed of prokaryotic genomes fare better for more variable pathways like carbohydrate metabolism. Differential performance of the PPC approach on various pathways, and a weak positive correlation between functional and profile similarities suggest that caution should be exercised while interpreting functional linkages inferred from genome-wide large-scale profile comparisons using a single reference set.
Collapse
Affiliation(s)
- Raja Jothi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - Teresa M Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| | - L Aravind
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
| |
Collapse
|
53
|
Shoemaker BA, Panchenko AR. Deciphering protein-protein interactions. Part II. Computational methods to predict protein and domain interaction partners. PLoS Comput Biol 2007; 3:e43. [PMID: 17465672 PMCID: PMC1857810 DOI: 10.1371/journal.pcbi.0030043] [Citation(s) in RCA: 212] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open
Abstract
Recent advances in high-throughput experimental methods for the identification of protein interactions have resulted in a large amount of diverse data that are somewhat incomplete and contradictory. As valuable as they are, such experimental approaches studying protein interactomes have certain limitations that can be complemented by the computational methods for predicting protein interactions. In this review we describe different approaches to predict protein interaction partners as well as highlight recent achievements in the prediction of specific domains mediating protein-protein interactions. We discuss the applicability of computational methods to different types of prediction problems and point out limitations common to all of them.
Collapse
|
54
|
Kufareva I, Budagyan L, Raush E, Totrov M, Abagyan R. PIER: protein interface recognition for structural proteomics. Proteins 2007; 67:400-17. [PMID: 17299750 DOI: 10.1002/prot.21233] [Citation(s) in RCA: 97] [Impact Index Per Article: 5.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
Recent advances in structural proteomics call for development of fast and reliable automatic methods for prediction of functional surfaces of proteins with known three-dimensional structure, including binding sites for known and unknown protein partners as well as oligomerization interfaces. Despite significant progress the problem is still far from being solved. Most existing methods rely, at least partially, on evolutionary information from multiple sequence alignments projected on protein surface. The common drawback of such methods is their limited applicability to the proteins with a sparse set of sequential homologs, as well as inability to detect interfaces in evolutionary variable regions. In this study, the authors developed an improved method for predicting interfaces from a single protein structure, which is based on local statistical properties of the protein surface derived at the level of atomic groups. The proposed Protein IntErface Recognition (PIER) method achieved the overall precision of 60% at the recall threshold of 50% at the residue level on a diverse benchmark of 490 homodimeric, 62 heterodimeric, and 196 transient interfaces (compared with 25% precision at 50% recall expected from random residue function assignment). For 70% of proteins in the benchmark, the binding patch residues were successfully detected with precision exceeding 50% at 50% recall. The calculation only took seconds for an average 300-residue protein. The authors demonstrated that adding the evolutionary conservation signal only marginally influenced the overall prediction performance on the benchmark; moreover, for certain classes of proteins, using this signal actually resulted in a deteriorated prediction. Thorough benchmarking using other datasets from literature showed that PIER yielded improved performance as compared with several alignment-free or alignment-dependent predictions. The accuracy, efficiency, and dependence on structure alone make PIER a suitable tool for automated high-throughput annotation of protein structures emerging from structural proteomics projects.
Collapse
Affiliation(s)
- Irina Kufareva
- Scripps Research Institute, La Jolla, California 92037, USA
| | | | | | | | | |
Collapse
|
55
|
Barker D, Meade A, Pagel M. Constrained models of evolution lead to improved prediction of functional linkage from correlated gain and loss of genes. ACTA ACUST UNITED AC 2006; 23:14-20. [PMID: 17090580 DOI: 10.1093/bioinformatics/btl558] [Citation(s) in RCA: 79] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Abstract
MOTIVATION We compare phylogenetic approaches for inferring functional gene links. The approaches detect independent instances of the correlated gain and loss of pairs of genes from species' genomes. We investigate the effect on results of basing evidence of correlations on two phylogenetic approaches, Dollo parsminony and maximum likelihood (ML). We further examine the effect of constraining the ML model by fixing the rate of gene gain at a low value, rather than estimating it from the data. RESULTS We detect correlated evolution among a test set of pairs of yeast (Saccharomyces cerevisiae) genes, with a case study of 21 eukaryotic genomes and test data derived from known yeast protein complexes. If the rate at which genes are gained is constrained to be low, ML achieves by far the best results at detecting known functional links. The model then has fewer parameters but it is more realistic by preventing genes from being gained more than once. AVAILABILITY BayesTraits by M. Pagel and A. Meade, and a script to configure and repeatedly launch it by D. Barker and M. Pagel, are available at http://www.evolution.reading.ac.uk
Collapse
Affiliation(s)
- Daniel Barker
- Sir Harold Mitchell Building, School of Biology, University of St Andrews St Andrews, Fife, KY16 9TH, UK
| | | | | |
Collapse
|
56
|
Snitkin ES, Gustafson AM, Mellor J, Wu J, DeLisi C. Comparative assessment of performance and genome dependence among phylogenetic profiling methods. BMC Bioinformatics 2006; 7:420. [PMID: 17005048 PMCID: PMC1592128 DOI: 10.1186/1471-2105-7-420] [Citation(s) in RCA: 41] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/08/2006] [Accepted: 09/27/2006] [Indexed: 11/10/2022] Open
Abstract
Background The rapidly increasing speed with which genome sequence data can be generated will be accompanied by an exponential increase in the number of sequenced eukaryotes. With the increasing number of sequenced eukaryotic genomes comes a need for bioinformatic techniques to aid in functional annotation. Ideally, genome context based techniques such as proximity, fusion, and phylogenetic profiling, which have been so successful in prokaryotes, could be utilized in eukaryotes. Here we explore the application of phylogenetic profiling, a method that exploits the evolutionary co-occurrence of genes in the assignment of functional linkages, to eukaryotic genomes. Results In order to evaluate the performance of phylogenetic profiling in eukaryotes, we assessed the relative performance of commonly used profile construction techniques and genome compositions in predicting functional linkages in both prokaryotic and eukaryotic organisms. When predicting linkages in E. coli with a prokaryotic profile, the use of continuous values constructed from transformed BLAST bit-scores performed better than profiles composed of discretized E-values; the use of discretized E-values resulted in more accurate linkages when using S. cerevisiae as the query organism. Extending this analysis by incorporating several eukaryotic genomes in profiles containing a majority of prokaryotes resulted in similar overall accuracy, but with a surprising reduction in pathway diversity among the most significant linkages. Furthermore, the application of phylogenetic profiling using profiles composed of only eukaryotes resulted in the loss of the strong correlation between common KEGG pathway membership and profile similarity score. Profile construction methods, orthology definitions, ontology and domain complexity were explored as possible sources of the poor performance of eukaryotic profiles, but with no improvement in results. Conclusion Given the current set of completely sequenced eukaryotic organisms, phylogenetic profiling using profiles generated from any of the commonly used techniques was found to yield extremely poor results. These findings imply genome-specific requirements for constructing functionally relevant phylogenetic profiles, and suggest that differences in the evolutionary history between different kingdoms might generally limit the usefulness of phylogenetic profiling in eukaryotes.
Collapse
Affiliation(s)
- Evan S Snitkin
- Bioinformatics Graduate Program, Boston University, Boston, USA
| | | | - Joseph Mellor
- Bioinformatics Graduate Program, Boston University, Boston, USA
| | - Jie Wu
- Department of Biomedical Engineering, Boston University, Boston, USA
| | - Charles DeLisi
- Bioinformatics Graduate Program, Boston University, Boston, USA
- Department of Biomedical Engineering, Boston University, Boston, USA
| |
Collapse
|
57
|
Jothi R, Cherukuri PF, Tasneem A, Przytycka TM. Co-evolutionary analysis of domains in interacting proteins reveals insights into domain-domain interactions mediating protein-protein interactions. J Mol Biol 2006; 362:861-75. [PMID: 16949097 PMCID: PMC1618801 DOI: 10.1016/j.jmb.2006.07.072] [Citation(s) in RCA: 81] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/25/2006] [Revised: 06/19/2006] [Accepted: 07/14/2006] [Indexed: 11/28/2022]
Abstract
Recent advances in functional genomics have helped generate large-scale high-throughput protein interaction data. Such networks, though extremely valuable towards molecular level understanding of cells, do not provide any direct information about the regions (domains) in the proteins that mediate the interaction. Here, we performed co-evolutionary analysis of domains in interacting proteins in order to understand the degree of co-evolution of interacting and non-interacting domains. Using a combination of sequence and structural analysis, we analyzed protein-protein interactions in F1-ATPase, Sec23p/Sec24p, DNA-directed RNA polymerase and nuclear pore complexes, and found that interacting domain pair(s) for a given interaction exhibits higher level of co-evolution than the non-interacting domain pairs. Motivated by this finding, we developed a computational method to test the generality of the observed trend, and to predict large-scale domain-domain interactions. Given a protein-protein interaction, the proposed method predicts the domain pair(s) that is most likely to mediate the protein interaction. We applied this method on the yeast interactome to predict domain-domain interactions, and used known domain-domain interactions found in PDB crystal structures to validate our predictions. Our results show that the prediction accuracy of the proposed method is statistically significant. Comparison of our prediction results with those from two other methods reveals that only a fraction of predictions are shared by all the three methods, indicating that the proposed method can detect known interactions missed by other methods. We believe that the proposed method can be used with other methods to help identify previously unrecognized domain-domain interactions on a genome scale, and could potentially help reduce the search space for identifying interaction sites.
Collapse
Affiliation(s)
- Raja Jothi
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- *Corresponding authors; E-mail addresses of the corresponding authors: ;
| | - Praveen F. Cherukuri
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- Bioinformatics Program Boston University, Boston, MA 02215, USA
| | - Asba Tasneem
- Booz Allen Hamilton Inc., Rockville, MD 20852, USA
| | - Teresa M. Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
- *Corresponding authors; E-mail addresses of the corresponding authors: ;
| |
Collapse
|
58
|
Boocock GRB, Marit MR, Rommens JM. Phylogeny, sequence conservation, and functional complementation of the SBDS protein family. Genomics 2006; 87:758-71. [PMID: 16529906 DOI: 10.1016/j.ygeno.2006.01.010] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/26/2005] [Revised: 01/16/2006] [Accepted: 01/31/2006] [Indexed: 11/16/2022]
Abstract
The Shwachman-Bodian-Diamond syndrome (SBDS) protein family occurs widely in nature, although its function has not been determined. Comprehensive database searches revealed SBDS homologues from 159 species, including examples from all sequenced archaeal and eukaryotic genomes and all eukaryotic kingdoms. Sequence alignment with ClustalX and MUSCLE algorithms led to the identification of conserved residues that occurred predominantly in the amino-terminal FYSH domain where they appeared to contribute to protein folding or stability. Only SBDS residue Gly91 was invariant in all species. Four distantly related protists were found to have two divergent SBDS genes in their genomes. In each case, phylogenetic analyses and the identification of shared sequence features suggested that one gene was derived from lateral gene transfer. We also identified a shared C-terminal zinc finger domain fusion in flowering plants and chromalveolates that may shed light on the function of the protein family and the evolutionary histories of these kingdoms. To assess the extent of SBDS functional conservation, we carried out complementation studies of SBDS homologues and interspecies chimeras in Saccharomyces cerevisiae. We determined that the FYSH domain was widely interchangeable among eukaryotes, while domain 2 imparted species specificity to protein function. Domain 3 was largely dispensable for function in our yeast complementation assay. Overall, the phylogeny of SBDS was shared with a group of proteins that were markedly enriched for RNA metabolism and/or ribosome-associated functions. These findings link Shwachman-Diamond syndrome to other bone marrow failure syndromes with defects in nucleolus-associated processes, including Diamond-Blackfan anemia, cartilage-hair hypoplasia, and dyskeratosis congenita.
Collapse
Affiliation(s)
- G R B Boocock
- Program in Genetics and Genomic Biology, The Hospital for Sick Children, 101 College Street, East Tower, Toronto, Canada ON M5G 1L7
| | | | | |
Collapse
|
59
|
Pagel P, Oesterheld M, Stümpflen V, Frishman D. The DIMA web resource--exploring the protein domain network. Bioinformatics 2006; 22:997-8. [PMID: 16481337 DOI: 10.1093/bioinformatics/btl050] [Citation(s) in RCA: 16] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/14/2022] Open
Abstract
UNLABELLED Conserved domains represent essential building blocks of most known proteins. Owing to their role as modular components carrying out specific functions they form a network based both on functional relations and direct physical interactions. We have previously shown that domain interaction networks provide substantially novel information with respect to networks built on full-length protein chains. In this work we present a comprehensive web resource for exploring the Domain Interaction MAp (DIMA), interactively. The tool aims at integration of multiple data sources and prediction techniques, two of which have been implemented so far: domain phylogenetic profiling and experimentally demonstrated domain contacts from known three-dimensional structures. A powerful yet simple user interface enables the user to compute, visualize, navigate and download domain networks based on specific search criteria. AVAILABILITY http://mips.gsf.de/genre/proj/dima
Collapse
Affiliation(s)
- Philipp Pagel
- Department of Genome Oriented Bioinformatics, Technical University of Munich, Wissenschaftszentrum Weihenstephan, 85350 Freising, Germany.
| | | | | | | |
Collapse
|
60
|
Kim Y, Koyutürk M, Topkara U, Grama A, Subramaniam S. Inferring functional information from domain co-evolution. Bioinformatics 2005; 22:40-9. [PMID: 16301205 DOI: 10.1093/bioinformatics/bti723] [Citation(s) in RCA: 18] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open
Abstract
MOTIVATION Co-evolution is a powerful mechanism for understanding protein function. Prior work in this area has shown that co-evolving proteins are more likely to share the same function than those that do not because of functional constraints. Many of the efforts founded on this observation, however, are at the level of entire sequences, implicitly assuming that the complete protein sequence follows a single evolutionary trajectory. Since it is well known that a domain can exist in various contexts, this assumption is not valid for numerous multi-domain proteins. Motivated by these observations, we introduce a novel technique called Coevolutionary-Matrix that captures co-evolution between regions of two proteins. Instead of using existing domain information, the method exploits residue-level conservation to identify co-evolving regions that might correspond to domains. RESULTS We show that the Coevolutionary-Matrix method can detect greater number of known functional associations for the Escherichia coli proteins when compared with earlier implementations of phylogenetic profiles. Furthermore, co-evolving regions of proteins detected by our method enable us to make hypotheses about their specific functions, many of which are supported by existing biochemical studies.
Collapse
Affiliation(s)
- Yohan Kim
- Department of Chemistry and Biochemistry, University of California at San Diego, La Jolla, CA 92093, USA
| | | | | | | | | |
Collapse
|