51
|
Pu S, Turinsky AL, Vlasblom J, On T, Xiong X, Emili A, Zhang Z, Greenblatt J, Parkinson J, Wodak SJ. Expanding the landscape of chromatin modification (CM)-related functional domains and genes in human. PLoS One 2010; 5:e14122. [PMID: 21124763 PMCID: PMC2993927 DOI: 10.1371/journal.pone.0014122] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Received: 05/04/2010] [Accepted: 10/26/2010] [Indexed: 01/06/2023] Open
Abstract
Chromatin modification (CM) plays a key role in regulating transcription, DNA replication, repair and recombination. However, our knowledge of these processes in humans remains very limited. Here we use computational approaches to study proteins and functional domains involved in CM in humans. We analyze the abundance and the pair-wise domain-domain co-occurrences of 25 well-documented CM domains in 5 model organisms: yeast, worm, fly, mouse and human. Results show that domains involved in histone methylation, DNA methylation, and histone variants are remarkably expanded in metazoans, reflecting the increased demand for cell type-specific gene regulation. We find that CM domains tend to co-occur with a limited number of partner domains and are hence not promiscuous. This property is exploited to identify 47 potentially novel CM domains, including 24 DNA-binding domains, whose role in CM has received little attention so far. Lastly, we use a consensus machine learning approach to predict 379 novel CM genes (coding for 329 proteins) in humans based on domain compositions. Several of these predictions are supported by very recent experimental studies and others are slated for experimental verification. Identification of novel CM genes and domains in humans will aid our understanding of fundamental epigenetic processes that are important for stem cell differentiation and cancer biology. Information on all the candidate CM domains and genes reported here is publicly available.
|
52
|
On T, Xiong X, Pu S, Turinsky A, Gong Y, Emili A, Zhang Z, Greenblatt J, Wodak SJ, Parkinson J. The evolutionary landscape of the chromatin modification machinery reveals lineage specific gains, expansions, and losses. Proteins 2010; 78:2075-89. [PMID: 20455264 DOI: 10.1002/prot.22723] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.2] [Indexed: 11/08/2022]
Abstract
Model organisms such as yeast, fly, and worm have played a defining role in the study of many biological systems. A significant challenge remains in translating this information to humans. Of critical importance is the ability to differentiate those components where knowledge of function and interactions may be reliably inferred from those that represent lineage-specific innovations. To address this challenge, we use chromatin modification (CM) as a model system for exploring the evolutionary properties of its components in the context of their known functions and interactions. Collating previously identified components of CM from yeast, worm, fly, and human, we identified a "core" set of 50 CM genes displaying consistent orthologous relationships that likely retain their interactions and functions across taxa. In addition, we catalog many components that demonstrate lineage-specific expansions and losses, highlighting much duplication within vertebrates that may reflect an expanded repertoire of regulatory mechanisms. Placed in the context of a high-quality protein-protein interaction network, we find, contrary to existing views of evolutionary modularity, that CM complex components display a mosaic of evolutionary histories: a core set of highly conserved genes, together with sets displaying lineage-specific innovations. Although focused on CM, this study provides a template for differentiating those genes which are likely to retain their functions and interactions across species. As such, in addition to informing on the evolution of CM as a system, this study provides a set of comparative genomic approaches that can be generally applied to any biological system.
|
53
|
Turner B, Razick S, Turinsky AL, Vlasblom J, Crowdy EK, Cho E, Morrison K, Donaldson IM, Wodak SJ. iRefWeb: interactive analysis of consolidated protein interaction data and their supporting evidence. Database: The Journal of Biological Databases and Curation 2010; 2010:baq023. [PMID: 20940177 PMCID: PMC2963317 DOI: 10.1093/database/baq023] [Citation(s) in RCA: 146] [Impact Index Per Article: 10.4] [Indexed: 11/16/2022]
Abstract
We present iRefWeb, a web interface to protein interaction data consolidated from 10 public databases: BIND, BioGRID, CORUM, DIP, IntAct, HPRD, MINT, MPact, MPPI and OPHID. iRefWeb enables users to examine aggregated interactions for a protein of interest, and presents various statistical summaries of the data across databases, such as the number of organism-specific interactions, proteins and cited publications. Through links to source databases and supporting evidence, researchers may gauge the reliability of an interaction using simple criteria, such as the detection methods, the scale of the study (high- or low-throughput) or the number of cited publications. Furthermore, iRefWeb compares the information extracted from the same publication by different databases, and offers means to follow up on possible inconsistencies. We provide an overview of the consolidated protein–protein interaction landscape and show how it can be automatically cropped to aid the generation of meaningful organism-specific interactomes. iRefWeb can be accessed at: http://wodaklab.org/iRefWeb. Database URL: http://wodaklab.org/iRefWeb/
|
54
|
Lensink MF, Wodak SJ. Blind predictions of protein interfaces by docking calculations in CAPRI. Proteins 2010; 78:3085-95. [DOI: 10.1002/prot.22850] [Citation(s) in RCA: 73] [Impact Index Per Article: 5.2] [Indexed: 12/30/2022]
|
55
|
Santos MA, Turinsky AL, Ong S, Tsai J, Berger MF, Badis G, Talukder S, Gehrke AR, Bulyk ML, Hughes TR, Wodak SJ. Objective sequence-based subfamily classifications of mouse homeodomains reflect their in vitro DNA-binding preferences. Nucleic Acids Res 2010; 38:7927-42. [PMID: 20705649 PMCID: PMC3001082 DOI: 10.1093/nar/gkq714] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 12/01/2022] Open
Abstract
Classifying proteins into subgroups with similar molecular function on the basis of sequence is an important step in deriving reliable functional annotations computationally. So far, however, available classification procedures have been evaluated against protein subgroups that are defined by experts using mainly qualitative descriptions of molecular function. Recently, in vitro DNA-binding preferences to all possible 8-nt DNA sequences have been measured for 178 mouse homeodomains using protein-binding microarrays, offering the unprecedented opportunity of evaluating the classification methods against quantitative measures of molecular function. To this end, we automatically derive homeodomain subtypes from the DNA-binding data and independently group the same domains using sequence information alone. We test five sequence-based methods, which use different sequence-similarity measures and algorithms to group sequences. Results show that methods that optimize the classification robustness reflect well the detailed functional specificity revealed by the experimental data. In some of these classifications, 73–83% of the subfamilies exactly correspond to, or are completely contained in, the function-based subtypes. Our findings demonstrate that certain sequence-based classifications are capable of yielding very specific molecular function annotations. The availability of quantitative descriptions of molecular function, such as DNA-binding data, will be a key factor in exploiting this potential in the future.
|
56
|
Wodak SJ, Janin J. Analytical approximation to the accessible surface area of proteins. Proc Natl Acad Sci U S A 1980; 77:1736-40. [PMID: 16592793 PMCID: PMC348579 DOI: 10.1073/pnas.77.4.1736] [Citation(s) in RCA: 132] [Impact Index Per Article: 9.4] [Indexed: 11/18/2022] Open
Abstract
We propose an analytical substitute to the geometrical construction that is commonly used in calculating the protein surface area that is accessible to the solvent. A statistical approach leads to an expression of accessible surface areas as a function of distances between pairs of atoms or of residues in the protein structure, assuming only that these atoms or residues are randomly distributed in space but not penetrating each other. This function gives good estimates of the accessible surface area and of the area buried in subunit contacts for a number of proteins. Its evaluation is very fast, and the function can be differentiated, which opens the way to new applications of accessibility measurements in the study of proteins. As an example, we show that the presence of domains is easily detected by an automatic procedure based on surface areas only.
|
57
|
Vlasblom J, Wodak SJ. Markov clustering versus affinity propagation for the partitioning of protein interaction graphs. BMC Bioinformatics 2009; 10:99. [PMID: 19331680 PMCID: PMC2682798 DOI: 10.1186/1471-2105-10-99] [Citation(s) in RCA: 163] [Impact Index Per Article: 10.9] [Received: 09/12/2008] [Accepted: 03/30/2009] [Indexed: 11/22/2022] Open
Abstract
Background Genome scale data on protein interactions are generally represented as large networks, or graphs, where hundreds or thousands of proteins are linked to one another. Since proteins tend to function in groups, or complexes, an important goal has been to reliably identify protein complexes from these graphs. This task is commonly executed using clustering procedures, which aim at detecting densely connected regions within the interaction graphs. There exists a wealth of clustering algorithms, some of which have been applied to this problem. One of the most successful clustering procedures in this context has been the Markov Cluster algorithm (MCL), which was recently shown to outperform a number of other procedures, some of which were specifically designed for partitioning protein interaction graphs. A novel promising clustering procedure termed Affinity Propagation (AP) was recently shown to be particularly effective, and much faster than other methods for a variety of problems, but has not yet been applied to partition protein interaction graphs. Results In this work we compare the performance of the Affinity Propagation (AP) and Markov Clustering (MCL) procedures. To this end we derive an unweighted network of protein-protein interactions from a set of 408 protein complexes from S. cerevisiae hand curated in-house, and evaluate the performance of the two clustering algorithms in recalling the annotated complexes. In doing so the parameter space of each algorithm is sampled in order to select optimal values for these parameters, and the robustness of the algorithms is assessed by quantifying the level of complex recall as interactions are randomly added to or removed from the network to simulate noise. To evaluate the performance on a weighted protein interaction graph, we also apply the two algorithms to the consolidated protein interaction network of S. cerevisiae, derived from genome scale purification experiments and to versions of this network in which varying proportions of the links have been randomly shuffled. Conclusion Our analysis shows that the MCL procedure is significantly more tolerant to noise and behaves more robustly than the AP algorithm. The advantage of MCL over AP is dramatic for unweighted protein interaction graphs, as AP displays severe convergence problems on the majority of the unweighted graph versions that we tested, whereas MCL continues to identify meaningful clusters, albeit fewer of them, as the level of noise in the graph increases. MCL thus remains the method of choice for identifying protein complexes from binary interaction networks.
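For readers unfamiliar with the Markov Cluster procedure named in this abstract, the following is a minimal sketch of its general idea: alternate an expansion step (squaring a column-stochastic matrix) with an inflation step (element-wise powering followed by column renormalization) until the matrix stops changing. It is not the authors' implementation or the reference MCL program. The function names, the added self-loops, the inflation value of 2.0, and the 1e-6 membership threshold are all illustrative assumptions.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain dense product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=100, tol=1e-9):
    """Cluster an undirected graph from its adjacency matrix (hypothetical helper).

    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(adjacency)
    # Assumption: add a self-loop to every node, a common stabilizing choice.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion: flow spreads along longer walks
        inflated = normalize_columns(
            [[value ** inflation for value in row] for row in expanded]
        )  # inflation: strong flows are boosted, weak flows fade
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


if __name__ == "__main__":
    # Toy graph: two triangles with no edges between them.
    toy = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(mcl(toy))
```

Raising the inflation parameter tends to produce more, smaller clusters. This is the kind of parameter the abstract describes sampling when tuning each algorithm.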
|
58
|
Wodak SJ, Pu S, Vlasblom J, Séraphin B. Challenges and Rewards of Interaction Proteomics. Mol Cell Proteomics 2009; 8:3-18. [DOI: 10.1074/mcp.r800014-mcp200] [Citation(s) in RCA: 75] [Impact Index Per Article: 5.0] [Indexed: 11/06/2022] Open
|
59
|
Pu S, Wong J, Turner B, Cho E, Wodak SJ. Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res 2008; 37:825-31. [PMID: 19095691 PMCID: PMC2647312 DOI: 10.1093/nar/gkn1005] [Citation(s) in RCA: 450] [Impact Index Per Article: 28.1] [Indexed: 11/14/2022] Open
Abstract
Gold standard datasets on protein complexes are key to inferring and validating protein-protein interactions. Despite much progress in characterizing protein complexes in the yeast Saccharomyces cerevisiae, numerous researchers still use as reference the manually curated complexes catalogued by the Munich Information Center of Protein Sequences database. Although this catalogue has served the community extremely well, it no longer reflects the current state of knowledge. Here, we report two catalogues of yeast protein complexes as results of systematic curation efforts. The first one, denoted as CYC2008, is a comprehensive catalogue of 408 manually curated heteromeric protein complexes reliably backed by small-scale experiments reported in the current literature. This catalogue represents an up-to-date reference set for biologists interested in discovering protein interactions and protein complexes. The second catalogue, denoted as YHTP2008, comprises 400 high-throughput complexes annotated with current literature evidence. Among them, 262 correspond, at least partially, to CYC2008 complexes. Evidence for interacting subunits is collected for 68 complexes that have only partial or no overlap with CYC2008 complexes, whereas no literature evidence was found for 100 complexes. Some of these partially supported and as yet unsupported complexes may be interesting candidates for experimental follow up. Both catalogues are freely available at: http://wodaklab.org/cyc2008/.
|
60
|
Pu S, Ronen K, Vlasblom J, Greenblatt J, Wodak SJ. Local coherence in genetic interaction patterns reveals prevalent functional versatility. Bioinformatics 2008; 24:2376-83. [PMID: 18718945 DOI: 10.1093/bioinformatics/btn440] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.4] [Indexed: 01/04/2023]
Abstract
MOTIVATION Epistatic or genetic interactions, representing the effects of mutating one gene on the phenotypes caused by mutations in one or more distinct genes, can be very helpful for uncovering functional relationships between genes. Recently, the epistatic miniarray profiles (E-MAP) method has emerged as a powerful approach for identifying such interactions systematically. For E-MAP data analysis, hierarchical clustering is used to partition genes into groups on the basis of the similarity between their global interaction profiles, and the resulting descriptions assign each gene to only one group, thereby ignoring the multifunctional roles played by most genes. RESULTS Here, we present the original local coherence detection (LCD) algorithm for identifying groups of functionally related genes from E-MAP data in a manner that allows individual genes to be assigned to more than one functional group. This enables investigation of the pleiotropic nature of gene function. The performance of our algorithm is illustrated by applying it to two E-MAP datasets and an E-MAP-like in silico dataset for the yeast Saccharomyces cerevisiae. In addition to recapitulating the majority of the functional modules and many protein complexes reported previously, our algorithm uncovers many recently documented and novel multifunctional relationships between genes and gene groups. Our algorithm hence represents a valuable tool for uncovering new roles for genes with annotated functions and for mapping groups of genes and proteins into pathways.
|
61
|
Malevanets A, Sirota FL, Wodak SJ. Mechanism and energy landscape of domain swapping in the B1 domain of protein G. J Mol Biol 2008; 382:223-35. [PMID: 18588900 DOI: 10.1016/j.jmb.2008.06.025] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Received: 03/26/2008] [Revised: 06/05/2008] [Accepted: 06/06/2008] [Indexed: 10/21/2022]
Abstract
Three-dimensional domain swapping has emerged as a ubiquitous process for homo-oligomer formation in many unrelated proteins, but the molecular mechanism of this process is still poorly understood. Here we present a mechanism for the swapping reaction in the B1 domain of the immunoglobulin G binding protein from group G of Streptococcus (GB1). This is a particularly attractive system for investigating the swapping process, as the swapped dimer formed by the quadruple mutant (L5V/F30V/Y33F/A34F) of GB1 was recently shown to exist in equilibrium with a monomer-like conformation over time scales of minutes. According to our mechanism, swapping in GB1 starts from the C-terminus of the polypeptide chain and progresses by exchanging an increasing portion of the chains until a stable conformational state is reached. This exchange process does not involve unfolding. Rather, the conformational changes of individual monomers and their association are tightly coupled to minimize solvent exposure and maximize the total number of native contacts at all times, thereby closely approximating the minimum energy path of the reaction. Using detailed atomic descriptions, we compute the complete free-energy profiles of the exchange reaction for the GB1 quadruple mutant that forms swapped dimers and for the wild-type protein, which is monomeric. In both GB1 forms, intermediates sample a surprisingly wide range of nearly isoenergetic association modes and hinge conformations, indicating that the exchange reaction is a non-specific process akin to encounter complex formation where the amino acid sequence plays a marginal role. The main role of the mutations in the swapping process is to destabilize the GB1 monomer state, while stabilizing the swapped dimer conformation, with non-native intersubunit interactions, fostered by mutant side chains, contributing significantly to this stabilization. 
Our findings are rationalized in terms of a generic swapping mechanism that involves the association of activated molecular species, and it is argued that a similar mechanism may apply to swapping in other protein systems.
|
62
|
Roca M, De Maria L, Wodak SJ, Moliner V, Tuñón I, Giraldo J. Coupling of the guanosine glycosidic bond conformation and the ribonucleotide cleavage reaction: implications for barnase catalysis. Proteins 2008; 70:415-28. [PMID: 17680698 DOI: 10.1002/prot.21573] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Indexed: 11/10/2022]
Abstract
To examine the possible relationship of guanine-dependent GpA conformations with ribonucleotide cleavage, two potential of mean force (PMF) calculations were performed in aqueous solution. In the first calculation, the guanosine glycosidic (Gchi) angle was used as the reaction coordinate, and computations were performed on two GpA ionic species: protonated (neutral) or deprotonated (negatively charged) guanosine ribose O2'. Similar energetic profiles featuring two minima corresponding to the anti and syn Gchi regions were obtained for both ionic forms. For both forms the anti conformation was more stable than the syn, and barriers of approximately 4 kcal/mol were obtained for the anti → syn transition. Structural analysis showed a remarkable sensitivity of the phosphate moiety to the conformation of the Gchi angle, suggesting a possible connection between this conformation and the mechanism of ribonucleotide cleavage. This hypothesis was confirmed by the second PMF calculations, for which the O2'-P distance for the deprotonated GpA was used as reaction coordinate. The computations were performed from two selected starting points: the anti and syn minima determined in the first PMF study of the deprotonated guanosine ribose O2'. The simulations revealed that the O2' attack along the syn Gchi was more favorable than that along the anti Gchi: energetically, significantly lower barriers were obtained in the syn than in the anti conformation for the O-P bond formation; structurally, a lesser O2'-P initial distance and a better suited orientation for an in-line attack were observed in the syn relative to the anti conformation. 
These results are consistent with the catalytically competent conformation of the barnase-ribonucleotide complex, which requires a guanine syn conformation of the substrate to enable abstraction of the ribose H2' proton by the general base Glu73, thereby suggesting a coupling between the reactive substrate conformation and enzyme structure and mechanism.
|
63
|
Sirota FL, Héry-Huynh S, Maurer-Stroh S, Wodak SJ. Role of the amino acid sequence in domain swapping of the B1 domain of protein G. Proteins 2008; 72:88-104. [DOI: 10.1002/prot.21901] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.5] [Indexed: 12/16/2022]
|
64
|
Wodak SJ. From the Mediterranean coast to the shores of Lake Ontario: CAPRI's premiere on the American continent. Proteins 2008; 69:697-8. [PMID: 17912754 DOI: 10.1002/prot.21805] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 11/11/2022]
|
65
|
Dessailly BH, Lensink MF, Orengo CA, Wodak SJ. LigASite--a database of biologically relevant binding sites in proteins with known apo-structures. Nucleic Acids Res 2007; 36:D667-73. [PMID: 17933762 PMCID: PMC2238865 DOI: 10.1093/nar/gkm839] [Citation(s) in RCA: 68] [Impact Index Per Article: 4.0] [Indexed: 11/13/2022] Open
Abstract
Better characterization of binding sites in proteins and the ability to accurately predict their location and energetic properties are major challenges which, if addressed, would have many valuable practical applications. Unfortunately, reliable benchmark datasets of binding sites in proteins are still sorely lacking. Here, we present LigASite ('LIGand Attachment SITE'), a gold-standard dataset of binding sites in 550 proteins of known structures. LigASite consists exclusively of biologically relevant binding sites in proteins for which at least one apo- and one holo-structure are available. In defining the binding sites for each protein, information from all holo-structures is combined, considering in each case the quaternary structure defined by the PQS server. LigASite is built using simple criteria and is automatically updated as new structures become available in the PDB, thereby guaranteeing optimal data coverage over time. Both a redundant and a culled non-redundant version of the dataset are available at http://www.scmbb.ulb.ac.be/Users/benoit/LigASite. The website interface allows users to search the dataset by PDB identifiers, ligand identifiers, protein names or sequence, and to look for structural matches as defined by the CATH homologous superfamilies. The datasets can be downloaded from the website as Schema-validated XML files or comma-separated flat files.
|
66
|
Pu S, Vlasblom J, Emili A, Greenblatt J, Wodak SJ. Identifying functional modules in the physical interactome of Saccharomyces cerevisiae. Proteomics 2007; 7:944-60. [PMID: 17370254 DOI: 10.1002/pmic.200600636] [Citation(s) in RCA: 106] [Impact Index Per Article: 6.2] [Indexed: 11/06/2022]
Abstract
Reliable information on the physical and functional interactions between the gene products is an important prerequisite for deriving meaningful system-level descriptions of cellular processes. The available information about protein interactions in Saccharomyces cerevisiae has been vastly increased recently by two comprehensive tandem affinity purification/mass spectrometry (TAP/MS) studies. However, using somewhat different approaches, these studies produced diverging descriptions of the yeast interactome, clearly illustrating the fact that converting the purification data into accurate sets of protein-protein interactions and complexes remains a major challenge. Here, we review the major analytical steps involved in this process, with special focus on the task of deriving complexes from the network of binary interactions. Applying the Markov Cluster procedure to an alternative yeast interaction network, recently derived by combining the data from the two latest TAP/MS studies, we produce a new description of yeast protein complexes. Several objective criteria suggest that this new description is more accurate and meaningful than those previously published. The same criteria are also used to gauge the influence that different methods for deriving binary interactions and complexes may have on the results. Lastly, it is shown that employing identical procedures to process the latest purification datasets significantly improves the convergence between the resulting interactome descriptions.
|
67
|
Vlasblom J, Wu S, Pu S, Superina M, Liu G, Orsi C, Wodak SJ. GenePro: a Cytoscape plug-in for advanced visualization and analysis of interaction networks. Bioinformatics 2006; 22:2178-9. [PMID: 16921162 DOI: 10.1093/bioinformatics/btl356] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.2] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Analyzing the networks of interactions between genes and proteins has become a central theme in systems biology. Versatile software tools for interactively displaying and analyzing these networks are therefore very much in demand. The public-domain open software environment Cytoscape has been developed with the goal of facilitating the design and development of such software tools by the scientific community. RESULTS We present GenePro, a plugin to Cytoscape featuring a set of versatile tools that greatly facilitates the visualization and analysis of protein networks derived from high-throughput interactions data and the validation of various methods for parsing these networks into meaningful functional modules. AVAILABILITY The GenePro plugin is available at the website http://genepro.ccb.sickkids.ca.
|
68
|
Simonis N, Gonze D, Orsi C, van Helden J, Wodak SJ. Modularity of the transcriptional response of protein complexes in yeast. J Mol Biol 2006; 363:589-610. [PMID: 16973176 DOI: 10.1016/j.jmb.2006.06.024] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.4] [Received: 11/17/2005] [Revised: 05/14/2006] [Accepted: 06/12/2006] [Indexed: 11/24/2022]
Abstract
A comprehensive study is performed on the condition-dependent expression of genes coding for the components of hand-curated multi-protein complexes of the yeast Saccharomyces cerevisiae, in order to identify coherent transcriptional modules within these complexes. Such modules are defined as groups of genes within complexes whose expression profiles under a common set of experimental conditions allow us to discriminate them from random sets of genes. Our analysis reveals that complexes such as the cytoplasmic ribosome, the proteasome and the respiration chain complexes previously characterized as "stable" or "permanent" represent transcriptional modules that are coherently up- or down-regulated in many different conditions. Overall, however, some level of coherent expression is detected only in 71 out of the total of 113 complexes with at least five different protein components that could be reliably analyzed. Of these, 26 behave as coherently expressed transcriptional modules encompassing all the components of the complex. In another 15, at least half of the components make up such modules and in ten, few or no modules are detected. In an additional 20 complexes coherent expression is detected, but in too few conditions to enable reliable module detection. Interestingly, the transcriptional modules, when detected, often correspond to one or more known sub-complexes with specific functions. Furthermore, detected modules are generally consistent with transcriptional modules identified on the basis of predicted cis-regulatory sequence motifs. Also, groups of genes shared between complexes that carry out related functions tend to be part of overlapping transcriptional modules identified in these complexes. Together these findings suggest that transcriptional modules may represent basic functional and evolutionary building blocks of protein complexes.
|
69
|
Méndez R, Leplae R, Lensink MF, Wodak SJ. Assessment of CAPRI predictions in rounds 3-5 shows progress in docking procedures. Proteins 2006; 60:150-69. [PMID: 15981261 DOI: 10.1002/prot.20551] [Citation(s) in RCA: 269] [Impact Index Per Article: 14.9] [Indexed: 11/11/2022]
Abstract
The current status of docking procedures for predicting protein-protein interactions starting from their three-dimensional (3D) structure is reassessed by evaluating blind predictions, performed during 2003-2004 as part of Rounds 3-5 of the community-wide experiment on Critical Assessment of PRedicted Interactions (CAPRI). Ten newly determined structures of protein-protein complexes were used as targets for these rounds. They comprised 2 enzyme-inhibitor complexes, 2 antigen-antibody complexes, 2 complexes involved in cellular signaling, 2 homo-oligomers, and a complex between 2 components of the bacterial cellulosome. For most targets, the predictors were given the experimental structures of 1 unbound and 1 bound component, with the latter in a random orientation. For some, the structure of the free component was derived from that of a related protein, requiring the use of homology modeling. In some of the targets, significant differences in conformation were displayed between the bound and unbound components, representing a major challenge for the docking procedures. For 1 target, predictions could not go to completion. In total, 1866 predictions submitted by 30 groups were evaluated. Over one-third of these groups applied completely novel docking algorithms and scoring functions, with several of them specifically addressing the challenge of dealing with side-chain and backbone flexibility. The quality of the predicted interactions was evaluated by comparison to the experimental structures of the targets, made available for the evaluation, using the well-agreed-upon criteria used previously. Twenty-four groups, which for the first time included an automatic Web server, produced predictions ranking from acceptable to highly accurate for all targets, including those where the structures of the bound and unbound forms differed substantially. 
These results and a brief survey of the methods used by participants of CAPRI Rounds 3-5 suggest that genuine progress in the performance of docking methods is being achieved, with CAPRI acting as the catalyst.
|
70
|
Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, Punna T, Peregrín-Alvarez JM, Shales M, Zhang X, Davey M, Robinson MD, Paccanaro A, Bray JE, Sheung A, Beattie B, Richards DP, Canadien V, Lalev A, Mena F, Wong P, Starostine A, Canete MM, Vlasblom J, Wu S, Orsi C, Collins SR, Chandran S, Haw R, Rilstone JJ, Gandi K, Thompson NJ, Musso G, St Onge P, Ghanny S, Lam MHY, Butland G, Altaf-Ul AM, Kanaya S, Shilatifard A, O'Shea E, Weissman JS, Ingles CJ, Hughes TR, Parkinson J, Gerstein M, Wodak SJ, Emili A, Greenblatt JF. Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature 2006; 440:637-43. [PMID: 16554755 DOI: 10.1038/nature04670] [Citation(s) in RCA: 2013] [Impact Index Per Article: 111.8] [Received: 12/20/2005] [Accepted: 02/23/2006] [Indexed: 11/09/2022]
Abstract
Identification of protein-protein interactions often provides insight into protein function, and many cellular processes are performed by stable protein complexes. We used tandem affinity purification to process 4,562 different tagged proteins of the yeast Saccharomyces cerevisiae. Each preparation was analysed by both matrix-assisted laser desorption/ionization-time of flight mass spectrometry and liquid chromatography tandem mass spectrometry to increase coverage and accuracy. Machine learning was used to integrate the mass spectrometry scores and assign probabilities to the protein-protein interactions. Among 4,087 different proteins identified with high confidence by mass spectrometry from 2,357 successful purifications, our core data set (median precision of 0.69) comprises 7,123 protein-protein interactions involving 2,708 proteins. A Markov clustering algorithm organized these interactions into 547 protein complexes averaging 4.9 subunits per complex, about half of them absent from the MIPS database, as well as 429 additional interactions between pairs of complexes. The data (all of which are available online) will help future studies on individual proteins as well as functional genomics and systems biology.
|
71
|
Croes D, Couche F, Wodak SJ, van Helden J. Inferring meaningful pathways in weighted metabolic networks. J Mol Biol 2005; 356:222-36. [PMID: 16337962 DOI: 10.1016/j.jmb.2005.09.079] [Citation(s) in RCA: 68] [Impact Index Per Article: 3.6] [Received: 01/16/2005] [Revised: 09/06/2005] [Accepted: 09/27/2005] [Indexed: 10/25/2022]
Abstract
An approach is presented for computing meaningful pathways in the network of small molecule metabolism comprising the chemical reactions characterized in all organisms. The metabolic network is described as a weighted graph in which all the compounds are included, but each compound is assigned a weight equal to the number of reactions in which it participates. Path finding is performed in this graph by searching for one or more paths with lowest weight. Performance is evaluated systematically by computing paths between the first and last reactions in annotated metabolic pathways, and comparing the intermediate reactions in the computed pathways to those in the annotated ones. For the sake of comparison, paths are computed also in the un-weighted raw (all compounds and reactions) and filtered (highly connected pool metabolites removed) metabolic graphs, respectively. The correspondence between the computed and annotated pathways is very poor (<30%) in the raw graph; increasing to approximately 65% in the filtered graph; reaching approximately 85% in the weighted graph. Considering the best-matching path among the five lightest paths increases the correspondence to 92%, on average. We then show that the average distance between pairs of metabolites is significantly larger in the weighted graph than in the raw unfiltered graph, suggesting that the small-world properties previously reported for metabolic networks probably result from irrelevant shortcuts through pool metabolites. In addition, we provide evidence that the length of the shortest path in the weighted graph represents a valid measure of the "metabolic distance" between enzymes. We suggest that the success of our simplistic approach is rooted in the high degree of specificity of the reactions in metabolic pathways, presumably reflecting thermodynamic constraints operating in these pathways. We expect our approach to find useful applications in inferring metabolic pathways in newly sequenced genomes.
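The weighting idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy network, the function names, and the choice to give reaction nodes a weight of zero are all assumptions made for illustration. Each compound is weighted by the number of reactions it participates in, and a Dijkstra-style search returns the lightest path between two reactions.

```python
import heapq

# Hypothetical toy network: reaction -> compounds it consumes or produces.
# ATP and ADP play the role of highly connected pool metabolites.
TOY_REACTIONS = {
    "R1": ["A", "ATP", "B", "ADP"],
    "R2": ["B", "C"],
    "R3": ["C", "ATP", "D", "ADP"],
    "R4": ["E", "ATP", "F", "ADP"],
    "R5": ["G", "ATP", "H", "ADP"],
    "R6": ["I", "ATP", "J", "ADP"],
}


def build_adjacency(reactions):
    """Build an undirected bipartite graph linking reactions and compounds."""
    adj = {}
    for rxn, compounds in reactions.items():
        for compound in compounds:
            adj.setdefault(rxn, set()).add(compound)
            adj.setdefault(compound, set()).add(rxn)
    return adj


def lightest_path(reactions, source, target, weighted=True):
    """Return (total_weight, path) for the lightest path, or None if unreachable.

    Assumption: reaction nodes weigh 0. A compound weighs its degree
    (number of reactions it takes part in) when weighted=True, else 1.
    """
    adj = build_adjacency(reactions)

    def node_weight(node):
        if node in reactions:
            return 0
        return len(adj[node]) if weighted else 1

    heap = [(node_weight(source), source, [source])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt in sorted(adj.get(node, ())):
            if nxt not in settled:
                heapq.heappush(heap, (cost + node_weight(nxt), nxt, path + [nxt]))
    return None


if __name__ == "__main__":
    print(lightest_path(TOY_REACTIONS, "R1", "R3", weighted=False))
    print(lightest_path(TOY_REACTIONS, "R1", "R3", weighted=True))
```

In this toy case the unweighted search takes a shortcut through a pool metabolite, whereas the weighted search passes through the intermediates B and C, which is the qualitative behavior the abstract describes.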
|
72
|
Croes D, Couche F, Wodak SJ, van Helden J. Metabolic PathFinding: inferring relevant pathways in biochemical networks. Nucleic Acids Res 2005; 33:W326-30. [PMID: 15980483 PMCID: PMC1160198 DOI: 10.1093/nar/gki437] [Citation(s) in RCA: 72] [Impact Index Per Article: 3.8] [Indexed: 11/14/2022] Open
Abstract
Our knowledge of metabolism can be represented as a network comprising several thousands of nodes (compounds and reactions). Several groups applied graph theory to analyse the topological properties of this network and to infer metabolic pathways by path finding. This is, however, not straightforward, with a major problem caused by traversing irrelevant shortcuts through highly connected nodes, which correspond to pool metabolites and co-factors (e.g. H2O, NADP and H+). In this study, we present a web server implementing two simple approaches, which circumvent this problem, thereby improving the relevance of the inferred pathways. In the simplest approach, the shortest path is computed, while filtering out the selection of highly connected compounds. In the second approach, the shortest path is computed on the weighted metabolic graph where each compound is assigned a weight equal to its connectivity in the network. This approach significantly increases the accuracy of the inferred pathways, enabling the correct inference of relatively long pathways (e.g. with as many as eight intermediate reactions). Available options include the calculation of the k-shortest paths between two specified seed nodes (either compounds or reactions). Multiple requests can be submitted in a queue. Results are returned by email, in textual as well as graphical formats (available in http://www.scmbb.ulb.ac.be/pathfinding/).
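The first, filtering-based approach mentioned in this abstract can be sketched as a breadth-first search that ignores a user-selected set of highly connected compounds. This is an illustrative sketch only, not the web server's code: the toy network, the function name `shortest_path_filtered`, and the `excluded` parameter are hypothetical.

```python
from collections import deque

# Hypothetical toy network: reaction -> compounds it consumes or produces.
TOY_NETWORK = {
    "R1": ["A", "ATP", "B", "ADP"],
    "R2": ["B", "C"],
    "R3": ["C", "ATP", "D", "ADP"],
}


def shortest_path_filtered(reactions, source, target, excluded=frozenset()):
    """Breadth-first shortest path between two seed nodes (compounds or
    reactions), skipping every compound listed in `excluded`."""
    adj = {}
    for rxn, compounds in reactions.items():
        for compound in compounds:
            if compound in excluded:
                continue  # filtered compounds never enter the graph
            adj.setdefault(rxn, set()).add(compound)
            adj.setdefault(compound, set()).add(rxn)

    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        for nxt in sorted(adj.get(path[-1], ())):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


if __name__ == "__main__":
    print(shortest_path_filtered(TOY_NETWORK, "A", "D"))
    print(shortest_path_filtered(TOY_NETWORK, "A", "D", excluded={"ATP", "ADP"}))
```

Without filtering, the search in this toy example jumps from R1 to R3 through a co-factor; with ATP and ADP excluded it is forced through the intermediate reaction R2.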
|
73
|
Güldener U, Münsterkötter M, Kastenmüller G, Strack N, van Helden J, Lemer C, Richelles J, Wodak SJ, García-Martínez J, Pérez-Ortín JE, Michael H, Kaps A, Talla E, Dujon B, André B, Souciet JL, De Montigny J, Bon E, Gaillardin C, Mewes HW. CYGD: the Comprehensive Yeast Genome Database. Nucleic Acids Res 2005; 33:D364-8. [PMID: 15608217 PMCID: PMC540007 DOI: 10.1093/nar/gki053] [Citation(s) in RCA: 208] [Impact Index Per Article: 10.9] [Indexed: 11/15/2022] Open
Abstract
The Comprehensive Yeast Genome Database (CYGD) compiles a comprehensive data resource for information on the cellular functions of the yeast Saccharomyces cerevisiae and related species, chosen as the best understood model organism for eukaryotes. The database serves as a common resource generated by a European consortium, going beyond the provision of sequence information and functional annotations on individual genes and proteins. In addition, it provides information on the physical and functional interactions among proteins as well as other genetic elements. These cellular networks include metabolic and regulatory pathways, signal transduction and transport processes as well as co-regulated gene clusters. As more yeast genomes are published, their annotation becomes greatly facilitated using S.cerevisiae as a reference. CYGD provides a way of exploring related genomes with the aid of the S.cerevisiae genome as a backbone and SIMAP, the Similarity Matrix of Proteins. The comprehensive resource is available under http://mips.gsf.de/genre/proj/yeast/.
|
74
|
Jaramillo A, Wodak SJ. Computational protein design is a challenge for implicit solvation models. Biophys J 2005; 88:156-71. [PMID: 15377512 PMCID: PMC1304995 DOI: 10.1529/biophysj.104.042044] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.4] [Received: 03/02/2004] [Accepted: 09/07/2004] [Indexed: 11/18/2022] Open
Abstract
Increasingly complex schemes for representing solvent effects in an implicit fashion are being used in computational analyses of biological macromolecules. These schemes speed up the calculations by orders of magnitude and are assumed to compromise little on essential features of the solvation phenomenon. In this work we examine this assumption. Five implicit solvation models, a surface area-based empirical model, two models that approximate the generalized Born treatment and a finite difference Poisson-Boltzmann method are challenged in situations differing from those where these models were calibrated. These situations are encountered in automatic protein design procedures, whose job is to select sequences, which stabilize a given protein 3D structure, from a large number of alternatives. To this end we evaluate the energetic cost of burying amino acids in thousands of environments with different solvent exposures belonging, respectively, to decoys built with random sequences and to native protein crystal structures. In addition we perform actual sequence design calculations. Except for the crudest surface area-based procedure, all the tested models tend to favor the burial of polar amino acids in the protein interior over nonpolar ones, a behavior that leads to poor performance in protein design calculations. We show, on the other hand, that three of the examined models are nonetheless capable of discriminating between the native fold and many nonnative alternatives, a test commonly used to validate force fields. It is concluded that protein design is a particularly challenging test for implicit solvation models because it requires accurate estimates of the solvation contribution of individual residues. This contrasts with native recognition, which depends less on solvation and more on other nonbonded contributions.
|
75
|
Giraldo J, De Maria L, Wodak SJ. Shift in nucleotide conformational equilibrium contributes to increased rate of catalysis of GpAp versus GpA in barnase. Proteins 2004; 56:261-76. [PMID: 15211510 DOI: 10.1002/prot.20137] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
The microbial ribonuclease barnase exhibits low catalytic activity toward GpN dinucleotides, where G is guanosine, p is phosphate and N represents any nucleoside. When a phosphate is added to the 3'-end, as in GpNp, substrate affinity is enhanced by one order of magnitude, and the catalytic rate by two. In order to gain insight into this phenomenon, we analyzed the nucleotide conformations and protein-nucleotide interactions of 4 ns molecular dynamics (MD) trajectories of complexes of barnase with guanylyl(3'-5') adenosine (GpA) and guanylyl(3'-5') adenosine 3'-monophosphate (GpAp), respectively, in the presence of solvent and counter ions. We found that, in a majority of the bound GpA conformations, the guanine base was firmly bound to the recognition site. The phosphate and adenosine moieties pointed into the solvent, and interactions with key catalytic residues were absent. In contrast, the bound GpAp adopted conformations in which all of the nucleotide portions remained tightly bound to the enzyme and interactions with key catalytic residues were maintained. These observations indicate that, for GpA, a significant proportion of the bound nucleotide adopts non-productive conformations and that adding the terminal phosphate as in GpAp shifts the equilibrium of the bound conformations towards structures capable of undergoing catalysis. Incorporating this property into the kinetic equations yields an increase in both the apparent rate constant (kcat) and the apparent dissociation constant (K(M)) for GpAp versus GpA. The increase in K(M), caused by the presence of additional non-productive binding modes for GpA, should however be counterbalanced by the propensity of free GpA to adopt folded conformations in solution, which are unable to bind the enzyme and by the tighter binding of GpAp (Giraldo J, Wodak SJ, Van Belle D. Conformational analysis of GpA and GpAp in aqueous solution by molecular dynamics and statistical methods. J Mol Biol 1998; 283:863-882). 
Addition of the terminal phosphate is shown to significantly influence the collective motion of the enzyme in a manner that fosters interactions with key catalytic residues, representing a further likely contribution to the catalytic rate enhancement.
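The kinetic argument in this abstract is consistent with the textbook scheme for non-productive binding, sketched below. The symbols $K_S$, $K_N$ and $k_2$ are assumptions introduced here for illustration and are not necessarily the authors' notation: the substrate binds either productively ($ES$, dissociation constant $K_S$, turnover $k_2$) or non-productively ($ES'$, dissociation constant $K_N$).

```latex
% Enzyme conservation with one productive and one non-productive complex
[E]_0 = [E]\left(1 + \frac{[S]}{K_S} + \frac{[S]}{K_N}\right),
\qquad v = k_2\,[ES] = k_2\,\frac{[E][S]}{K_S}

% Rearranged into Michaelis-Menten form
v = \frac{k_{cat}^{app}\,[E]_0\,[S]}{K_M^{app} + [S]},
\qquad
k_{cat}^{app} = \frac{k_2}{1 + K_S/K_N},
\qquad
K_M^{app} = \frac{K_S}{1 + K_S/K_N}
```

Under this scheme, weakening non-productive binding (larger $K_N$) increases both $k_{cat}^{app}$ and $K_M^{app}$ while leaving $k_{cat}^{app}/K_M^{app} = k_2/K_S$ unchanged, which matches the direction of the changes reported for GpAp versus GpA.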
|