51
A high-accuracy consensus map of yeast protein complexes reveals modular nature of gene essentiality. BMC Bioinformatics 2007; 8:236. [PMID: 17605818 PMCID: PMC1940025 DOI: 10.1186/1471-2105-8-236] [Citation(s) in RCA: 171] [Impact Index Per Article: 10.1] [Received: 12/22/2006] [Accepted: 07/02/2007] [Indexed: 01/13/2023]
Abstract
BACKGROUND Identifying all protein complexes in an organism is a major goal of systems biology. In the past 18 months, the results of two genome-scale tandem affinity purification-mass spectrometry (TAP-MS) assays in yeast have been published, along with corresponding complex maps. For most complexes, the published data sets were surprisingly uncorrelated. It is therefore useful to consider the raw data from each study and generate an accurate complex map from a high-confidence data set that integrates the results of these and earlier assays. RESULTS Using an unsupervised probabilistic scoring scheme, we assigned a confidence score to each interaction in the matrix-model interpretation of the large-scale yeast mass-spectrometry data sets. The scoring metric proved more accurate than the filtering schemes used in the original data sets. We then took a high-confidence subset of these interactions and derived a set of complexes using MCL. The complexes show high correlation with existing annotations. Hierarchical organization of some protein complexes is evident from inter-complex interactions. CONCLUSION We demonstrate that our scoring method can generate an integrated high-confidence subset of observed matrix-model interactions, which we subsequently used to derive an accurate map of yeast complexes. Our results indicate that essentiality is a product of the protein complex rather than the individual protein, and that we have achieved near saturation of the yeast high-abundance, rich-media-expressed "complex-ome."
52
Myers CL, Troyanskaya OG. Context-sensitive data integration and prediction of biological networks. Bioinformatics 2007; 23:2322-30. [PMID: 17599939 DOI: 10.1093/bioinformatics/btm332] [Citation(s) in RCA: 91] [Impact Index Per Article: 5.4] [Indexed: 11/12/2022]
Abstract
MOTIVATION Several recent methods have addressed the problem of heterogeneous data integration and network prediction by modeling the noise inherent in high-throughput genomic datasets, which can dramatically improve specificity and sensitivity and allow the robust integration of datasets with heterogeneous properties. However, experimental technologies capture different biological processes with varying degrees of success, and thus, each source of genomic data can vary in relevance depending on the biological process one is interested in predicting. Accounting for this variation can significantly improve network prediction, but to our knowledge, no previous approaches have explicitly leveraged this critical information about biological context. RESULTS We confirm the presence of context-dependent variation in functional genomic data and propose a Bayesian approach for context-sensitive integration and query-based recovery of biological process-specific networks. By applying this method to Saccharomyces cerevisiae, we demonstrate that leveraging contextual information can significantly improve the precision of network predictions, including assignment for uncharacterized genes. We expect that this general context-sensitive approach can be applied to other organisms and prediction scenarios. AVAILABILITY A software implementation of our approach is available on request from the authors. SUPPLEMENTARY INFORMATION Supplementary data are available at http://avis.princeton.edu/contextPIXIE/
Affiliation(s)
- Chad L Myers
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ, USA
53
Yu H, Kim PM, Sprecher E, Trifonov V, Gerstein M. The importance of bottlenecks in protein networks: correlation with gene essentiality and expression dynamics. PLoS Comput Biol 2007; 3:e59. [PMID: 17447836 PMCID: PMC1853125 DOI: 10.1371/journal.pcbi.0030059] [Citation(s) in RCA: 650] [Impact Index Per Article: 38.2] [Received: 07/26/2006] [Accepted: 02/14/2007] [Indexed: 12/15/2022]
Abstract
It has been a long-standing goal in systems biology to find relations between the topological properties and functional features of protein networks. However, most of the focus in network studies has been on highly connected proteins ("hubs"). As a complementary notion, it is possible to define bottlenecks as proteins with a high betweenness centrality (i.e., network nodes that have many "shortest paths" going through them, analogous to major bridges and tunnels on a highway map). Bottlenecks are, in fact, key connector proteins with surprising functional and dynamic properties. In particular, they are more likely to be essential proteins. In fact, in regulatory and other directed networks, betweenness (i.e., "bottleneck-ness") is a much more significant indicator of essentiality than degree (i.e., "hub-ness"). Furthermore, bottlenecks correspond to the dynamic components of the interaction network: they are significantly less well coexpressed with their neighbors than non-bottlenecks, implying that expression dynamics is wired into the network topology.
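The distinction between "hub-ness" (degree) and "bottleneck-ness" (betweenness) in the abstract above can be illustrated with a small sketch. It is not the authors' code: it uses Brandes' algorithm on an unweighted, undirected toy graph (the abstract's strongest claim concerns directed networks), and the node names, the `clique` helper, and the network itself are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Betweenness centrality via Brandes' algorithm.

    adj maps each node to the set of its neighbours (undirected, unweighted).
    """
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths and record predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}

def clique(nodes):
    """Fully connected subgraph on the given nodes (hypothetical helper)."""
    return {u: {v for v in nodes if v != u} for u in nodes}

# Hypothetical toy network: two 4-protein cliques joined through one connector "B".
adj = {**clique(["p1", "p2", "p3", "p4"]), **clique(["q1", "q2", "q3", "q4"]), "B": set()}
for u, v in [("p4", "B"), ("B", "q1")]:
    adj[u].add(v)
    adj[v].add(u)

bc = betweenness(adj)
degree = {v: len(nbrs) for v, nbrs in adj.items()}
top = max(bc, key=bc.get)
print(top, degree[top], bc[top])
```

In this toy graph the connector B has the lowest degree (2) but the highest betweenness, because every shortest path between the two cliques passes through it; that is the sense in which a bottleneck need not be a hub.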
Affiliation(s)
- Haiyuan Yu
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
  - Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
- Philip M Kim
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
- Emmett Sprecher
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
- Valery Trifonov
  - Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
- Mark Gerstein
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
  - Department of Computer Science, Yale University, New Haven, Connecticut, United States of America
54
Hu X, Wu DD. Data mining and predictive modeling of biomolecular network from biomedical literature databases. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2007; 4:251-63. [PMID: 17473318 DOI: 10.1109/tcbb.2007.070211] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.8] [Indexed: 05/15/2023]
Abstract
In this paper, we present a novel approach Bio-IEDM (Biomedical Information Extraction and Data Mining) to integrate text mining and predictive modeling to analyze biomolecular network from biomedical literature databases. Our method consists of two phases. In phase 1, we discuss a semisupervised efficient learning approach to automatically extract biological relationships such as protein-protein interaction, protein-gene interaction from the biomedical literature databases to construct the biomolecular network. Our method automatically learns the patterns based on a few user seed tuples and then extracts new tuples from the biomedical literature based on the discovered patterns. The derived biomolecular network forms a large scale-free network graph. In phase 2, we present a novel clustering algorithm to analyze the biomolecular network graph to identify biologically meaningful subnetworks (communities). The clustering algorithm considers the characteristics of the scale-free network graphs and is based on the local density of the vertex and its neighborhood functions that can be used to find more meaningful clusters with different density level. The experimental results indicate our approach is very effective in extracting biological knowledge from a huge collection of biomedical literature. The integration of data mining and information extraction provides a promising direction for analyzing the biomolecular network.
Affiliation(s)
- Xiaohua Hu
  - The College of Information Science and Technology, Drexel University, Philadelphia, PA 19104, USA.
55
Sharan R, Ulitsky I, Shamir R. Network-based prediction of protein function. Mol Syst Biol 2007; 3:88. [PMID: 17353930 PMCID: PMC1847944 DOI: 10.1038/msb4100129] [Citation(s) in RCA: 620] [Impact Index Per Article: 36.5] [Received: 09/20/2006] [Accepted: 01/09/2007] [Indexed: 12/22/2022]
Abstract
Functional annotation of proteins is a fundamental problem in the post-genomic era. The recent availability of protein interaction networks for many model species has spurred on the development of computational methods for interpreting such data in order to elucidate protein function. In this review, we describe the current computational approaches for the task, including direct methods, which propagate functional information through the network, and module-assisted methods, which infer functional modules within the network and use those for the annotation task. Although a broad variety of interesting approaches has been developed, further progress in the field will depend on systematic evaluation of the methods and their dissemination in the biological community.
Affiliation(s)
- Roded Sharan
  - School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Igor Ulitsky
  - School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Ron Shamir
  - School of Computer Science, Tel Aviv University, Tel Aviv, Israel
  - School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel. Tel.: +972 3 6405383; Fax: +972 3 6405384
56
Leach S, Gabow A, Hunter L, Goldberg DS. Assessing and combining reliability of protein interaction sources. Pacific Symposium on Biocomputing 2007:433-44. [PMID: 17990508 PMCID: PMC2517251] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
Integrating diverse sources of interaction information to create protein networks requires strategies sensitive to differences in accuracy and coverage of each source. Previous integration approaches calculate reliabilities of protein interaction information sources based on congruity to a designated 'gold standard.' In this paper, we provide a comparison of the two most popular existing approaches and propose a novel alternative for assessing reliabilities which does not require a gold standard. We identify a new method for combining the resultant reliabilities and compare it against an existing method. Further, we propose an extrinsic approach to evaluation of reliability estimates, considering their influence on the downstream tasks of inferring protein function and learning regulatory networks from expression data. Results using this evaluation method show 1) our method for reliability estimation is an attractive alternative to those requiring a gold standard and 2) the new method for combining reliabilities is less sensitive to noise in reliability assignments than the similar existing technique.
Affiliation(s)
- Sonia Leach
  - University of Colorado at Denver, Health Sciences Center, Aurora, CO 80045, USA.
57
Stuart LM, Boulais J, Charriere GM, Hennessy EJ, Brunet S, Jutras I, Goyette G, Rondeau C, Letarte S, Huang H, Ye P, Morales F, Kocks C, Bader JS, Desjardins M, Ezekowitz RAB. A systems biology analysis of the Drosophila phagosome. Nature 2006; 445:95-101. [PMID: 17151602 DOI: 10.1038/nature05380] [Citation(s) in RCA: 192] [Impact Index Per Article: 10.7] [Received: 07/13/2006] [Accepted: 10/24/2006] [Indexed: 11/08/2022]
Abstract
Phagocytes have a critical function in remodelling tissues during embryogenesis and thereafter are central effectors of immune defence. During phagocytosis, particles are internalized into 'phagosomes', organelles from which immune processes such as microbial destruction and antigen presentation are initiated. Certain pathogens have evolved mechanisms to evade the immune system and persist undetected within phagocytes, and it is therefore evident that a detailed knowledge of this process is essential to an understanding of many aspects of innate and adaptive immunity. However, despite the crucial role of phagosomes in immunity, their components and organization are not fully defined. Here we present a systems biology analysis of phagosomes isolated from cells derived from the genetically tractable model organism Drosophila melanogaster and address the complex dynamic interactions between proteins within this organelle and their involvement in particle engulfment. Proteomic analysis identified 617 proteins potentially associated with Drosophila phagosomes; these were organized by protein-protein interactions to generate the 'phagosome interactome', a detailed protein-protein interaction network of this subcellular compartment. These networks predicted both the architecture of the phagosome and putative biomodules. The contribution of each protein and complex to bacterial internalization was tested by RNA-mediated interference and identified known components of the phagocytic machinery. In addition, the prediction and validation of regulators of phagocytosis such as the 'exocyst', a macromolecular complex required for exocytosis but not previously implicated in phagocytosis, validates this strategy. 
In generating this 'systems-based model', we show the power of applying this approach to the study of complex cellular processes and organelles and expect that this detailed model of the phagosome will provide a new framework for studying host-pathogen interactions and innate immunity.
Affiliation(s)
- L M Stuart
  - Laboratory of Developmental Immunology, Massachusetts General Hospital/Harvard Medical School, 55 Fruit Street, Boston, Massachusetts 02114, USA.
58
Sen TZ, Kloczkowski A, Jernigan RL. Functional clustering of yeast proteins from the protein-protein interaction network. BMC Bioinformatics 2006; 7:355. [PMID: 16863590 PMCID: PMC1557866 DOI: 10.1186/1471-2105-7-355] [Citation(s) in RCA: 38] [Impact Index Per Article: 2.1] [Received: 02/23/2006] [Accepted: 07/24/2006] [Indexed: 12/27/2022]
Abstract
BACKGROUND The abundant data available for protein interaction networks have not yet been fully understood. New types of analyses are needed to reveal organizational principles of these networks to investigate the details of functional and regulatory clusters of proteins. RESULTS In the present work, individual clusters identified by an eigenmode analysis of the connectivity matrix of the protein-protein interaction network in yeast are investigated for possible functional relationships among the members of the cluster. With our functional clustering we have successfully predicted several new protein-protein interactions that indeed have been reported recently. CONCLUSION Eigenmode analysis of the entire connectivity matrix yields both a global and a detailed view of the network. We have shown that the eigenmode clustering not only is guided by the number of proteins with which each protein interacts, but also leads to functional clustering that can be applied to predict new protein interactions.
Affiliation(s)
- Taner Z Sen
  - L.H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA
  - Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Andrzej Kloczkowski
  - Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA 50011, USA
- Robert L Jernigan
  - L.H. Baker Center for Bioinformatics and Biological Statistics, Iowa State University, Ames, IA 50011, USA
  - Department of Biochemistry, Biophysics, and Molecular Biology, Iowa State University, Ames, IA 50011, USA
59
D'haeseleer P, Church GM. Estimating and improving protein interaction error rates. Proceedings. IEEE Computational Systems Bioinformatics Conference 2006:216-23. [PMID: 16448015 DOI: 10.1109/csb.2004.1332435] [Citation(s) in RCA: 26] [Impact Index Per Article: 1.4] [Indexed: 11/06/2022]
Abstract
High throughput protein interaction data sets have proven to be notoriously noisy. Although it is possible to focus on interactions with higher reliability by using only those that are backed up by two or more lines of evidence, this approach invariably throws out the majority of available data. A more optimal use could be achieved by incorporating the probabilities associated with all available interactions into the analysis. We present a novel method for estimating error rates associated with specific protein interaction data sets, as well as with individual interactions given the data sets in which they appear. As a bonus, we also get an estimate for the total number of protein interactions in yeast. Certain types of false positive results can be identified and removed, resulting in a significant improvement in quality of the data set. For co-purification data sets, we show how we can reach a tradeoff between the "spoke" and "matrix" representation of interactions within co-purified groups of proteins to achieve an optimal false positive error rate.
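The "spoke" and "matrix" representations mentioned at the end of the abstract above can be made concrete with a short sketch. It only shows how the two models expand one co-purified group into pairwise interactions; it does not reproduce the authors' error-rate estimation, and the bait and prey names are hypothetical.

```python
from itertools import combinations

def spoke_pairs(bait, preys):
    """Spoke model: the bait is linked to each co-purified prey, and nothing else."""
    return {frozenset((bait, p)) for p in preys if p != bait}

def matrix_pairs(bait, preys):
    """Matrix model: every pair of proteins in the purification is linked."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}

# Hypothetical purification: one bait pulling down three preys.
bait, preys = "BAIT", ["P1", "P2", "P3"]
spoke = spoke_pairs(bait, preys)
matrix = matrix_pairs(bait, preys)
print(len(spoke), len(matrix))  # 3 6
```

The spoke set is always a subset of the matrix set. For a purification of n proteins the spoke model yields n - 1 pairs and the matrix model n(n - 1)/2, so the matrix model covers more true prey-prey interactions at the cost of many more candidate false positives, which is the tradeoff the abstract refers to.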
60
Bao L, Wei L, Peirce JL, Homayouni R, Li H, Zhou M, Chen H, Lu L, Williams RW, Pfeffer LM, Goldowitz D, Cui Y. Combining gene expression QTL mapping and phenotypic spectrum analysis to uncover gene regulatory relationships. Mamm Genome 2006; 17:575-83. [PMID: 16783639 DOI: 10.1007/s00335-005-0172-2] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.7] [Received: 11/30/2005] [Accepted: 02/21/2006] [Indexed: 01/17/2023]
Abstract
Gene expression QTL (eQTL) mapping can suggest candidate regulatory relationships between genes. Recent advances in mammalian phenotype annotation such as mammalian phenotype ontology (MPO) enable systematic analysis of the phenotypic spectrum subserved by many genes. In this study we combined eQTL mapping and phenotypic spectrum analysis to predict gene regulatory relationships. Five pairs of genes with similar phenotypic effects and potential regulatory relationships suggested by eQTL mapping were identified. Lines of evidence supporting some of the predicted regulatory relationships were obtained from biological literature. A particularly notable example is that promoter sequence analysis and real-time PCR assays support the predicted regulation of protein kinase C epsilon (Prkce) by cAMP responsive element binding protein 1 (Creb1). Our results show that the combination of gene eQTL mapping and phenotypic spectrum analysis may provide a valuable approach to uncovering gene regulatory relations underlying mammalian phenotypes.
Affiliation(s)
- Lei Bao
  - Department of Molecular Sciences, University of Tennessee Health Science Center, 858 Madison Avenue, Memphis, TN 38163, USA
61
Kharchenko P, Chen L, Freund Y, Vitkup D, Church GM. Identifying metabolic enzymes with multiple types of association evidence. BMC Bioinformatics 2006; 7:177. [PMID: 16571130 PMCID: PMC1450304 DOI: 10.1186/1471-2105-7-177] [Citation(s) in RCA: 77] [Impact Index Per Article: 4.3] [Received: 10/03/2005] [Accepted: 03/29/2006] [Indexed: 02/03/2023]
Abstract
Background Existing large-scale metabolic models of sequenced organisms commonly include enzymatic functions which can not be attributed to any gene in that organism. Existing computational strategies for identifying such missing genes rely primarily on sequence homology to known enzyme-encoding genes. Results We present a novel method for identifying genes encoding for a specific metabolic function based on a local structure of metabolic network and multiple types of functional association evidence, including clustering of genes on the chromosome, similarity of phylogenetic profiles, gene expression, protein fusion events and others. Using E. coli and S. cerevisiae metabolic networks, we illustrate predictive ability of each individual type of association evidence and show that significantly better predictions can be obtained based on the combination of all data. In this way our method is able to predict 60% of enzyme-encoding genes of E. coli metabolism within the top 10 (out of 3551) candidates for their enzymatic function, and as a top candidate within 43% of the cases. Conclusion We illustrate that a combination of genome context and other functional association evidence is effective in predicting genes encoding metabolic enzymes. Our approach does not rely on direct sequence homology to known enzyme-encoding genes, and can be used in conjunction with traditional homology-based metabolic reconstruction methods. The method can also be used to target orphan metabolic activities.
Affiliation(s)
- Peter Kharchenko
  - Department of Genetics, New Research Building (NRB) Room 238, 77 Ave. Louis Pasteur, Harvard Medical School, Boston, MA 02115, USA
- Lifeng Chen
  - Center for Computational Biology and Bioinformatics, Department of Biomedical Informatics, Columbia University, 1150 St. Nicholas Ave., New York, NY 10032, USA
- Yoav Freund
  - Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Drive 0404, Room 4126, La Jolla, CA 92093, USA
- Dennis Vitkup
  - Center for Computational Biology and Bioinformatics, Department of Biomedical Informatics, Columbia University, 1150 St. Nicholas Ave., New York, NY 10032, USA
- George M Church
  - Department of Genetics, New Research Building (NRB) Room 238, 77 Ave. Louis Pasteur, Harvard Medical School, Boston, MA 02115, USA
62
Koyutürk M, Kim Y, Topkara U, Subramaniam S, Szpankowski W, Grama A. Pairwise Alignment of Protein Interaction Networks. J Comput Biol 2006; 13:182-99. [PMID: 16597234 DOI: 10.1089/cmb.2006.13.182] [Citation(s) in RCA: 117] [Impact Index Per Article: 6.5] [Indexed: 11/12/2022]
Abstract
With an ever-increasing amount of available data on protein-protein interaction (PPI) networks and research revealing that these networks evolve at a modular level, discovery of conserved patterns in these networks becomes an important problem. Although available data on protein-protein interactions is currently limited, recently developed algorithms have been shown to convey novel biological insights through employment of elegant mathematical models. The main challenge in aligning PPI networks is to define a graph theoretical measure of similarity between graph structures that captures underlying biological phenomena accurately. In this respect, modeling of conservation and divergence of interactions, as well as the interpretation of resulting alignments, are important design parameters. In this paper, we develop a framework for comprehensive alignment of PPI networks, which is inspired by duplication/divergence models that focus on understanding the evolution of protein interactions. We propose a mathematical model that extends the concepts of match, mismatch, and gap in sequence alignment to that of match, mismatch, and duplication in network alignment and evaluates similarity between graph structures through a scoring function that accounts for evolutionary events. By relying on evolutionary models, the proposed framework facilitates interpretation of resulting alignments in terms of not only conservation but also divergence of modularity in PPI networks. Furthermore, as in the case of sequence alignment, our model allows flexibility in adjusting parameters to quantify underlying evolutionary relationships. Based on the proposed model, we formulate PPI network alignment as an optimization problem and present fast algorithms to solve this problem. 
Detailed experimental results from an implementation of the proposed framework show that our algorithm is able to discover conserved interaction patterns very effectively, in terms of both accuracies and computational cost.
Affiliation(s)
- Mehmet Koyutürk
  - Department of Computer Sciences, Purdue University, West Lafayette, IN 47907, USA.
63
Sevon P, Eronen L, Hintsanen P, Kulovesi K, Toivonen H. Link Discovery in Graphs Derived from Biological Databases. Lecture Notes in Computer Science 2006. [DOI: 10.1007/11799511_5] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.4] [Indexed: 12/29/2022]
64
Myers CL, Robson D, Wible A, Hibbs MA, Chiriac C, Theesfeld CL, Dolinski K, Troyanskaya OG. Discovery of biological networks from diverse functional genomic data. Genome Biol 2005; 6:R114. [PMID: 16420673 PMCID: PMC1414113 DOI: 10.1186/gb-2005-6-13-r114] [Citation(s) in RCA: 170] [Impact Index Per Article: 8.9] [Received: 07/01/2005] [Revised: 08/31/2005] [Accepted: 11/21/2005] [Indexed: 01/31/2023]
Abstract
BioPIXIE is a probabilistic system for query-based discovery of pathway-specific networks through integration of diverse genome-wide data. We have developed a general probabilistic system for query-based discovery of pathway-specific networks through integration of diverse genome-wide data. This framework was validated by accurately recovering known networks for 31 biological processes in Saccharomyces cerevisiae and experimentally verifying predictions for the process of chromosomal segregation. Our system, bioPIXIE, a public, comprehensive system for integration, analysis, and visualization of biological network predictions for S. cerevisiae, is freely accessible over the worldwide web.
Affiliation(s)
- Chad L Myers
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
  - Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, USA
- Drew Robson
  - Department of Mathematics, Princeton University, Washington Road, Princeton, NJ 08540, USA
- Adam Wible
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
- Matthew A Hibbs
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
  - Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, USA
- Camelia Chiriac
  - Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, USA
- Chandra L Theesfeld
  - Department of Genetics, School of Medicine, Mailstop-S120, Stanford University, Stanford, CA 94305-5120, USA
- Kara Dolinski
  - Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, USA
- Olga G Troyanskaya
  - Department of Computer Science, Princeton University, 35 Olden Street, Princeton, NJ 08544, USA
  - Lewis-Sigler Institute for Integrative Genomics, Carl Icahn Laboratory, Princeton University, Princeton, NJ 08544, USA
65
Abstract
The two-hybrid method detects the interaction of two proteins by their ability to reconstitute the activity of a split transcription factor, thus allowing the use of a simple growth selection in yeast to identify new interactions. Since its introduction about 15 years ago, the assay largely has been applied to single proteins, successfully uncovering thousands of novel protein partners. In the last few years, however, two-hybrid experiments have been scaled up to focus on the entire complement of proteins found in an organism. Although a single such effort can itself result in thousands of interactions, the validity of these high-throughput approaches has been questioned as a result of the prevalence of numerous false positives in these large data sets. Such artifacts may not be an obstacle to continued scale-up of the method, because the classification of true and false positives has proven to be a computational challenge that can be met by a growing number of creative strategies. Two examples are provided of this combination of high-throughput experimentation and computational analysis, focused on the interaction of Plasmodium falciparum proteins and of Saccharomyces cerevisiae membrane proteins.
Affiliation(s)
- Stanley Fields
  - Howard Hughes Medical Institute, Department of Genome Sciences, University of Washington, Seattle, WA 98195, USA.
66
Patil A, Nakamura H. Filtering high-throughput protein-protein interaction data using a combination of genomic features. BMC Bioinformatics 2005; 6:100. [PMID: 15833142 PMCID: PMC1127019 DOI: 10.1186/1471-2105-6-100] [Citation(s) in RCA: 103] [Impact Index Per Article: 5.4] [Received: 12/21/2004] [Accepted: 04/18/2005] [Indexed: 11/30/2022]
Abstract
Background Protein-protein interaction data used in the creation or prediction of molecular networks is usually obtained from large scale or high-throughput experiments. This experimental data is liable to contain a large number of spurious interactions. Hence, there is a need to validate the interactions and filter out the incorrect data before using them in prediction studies. Results In this study, we use a combination of 3 genomic features – structurally known interacting Pfam domains, Gene Ontology annotations and sequence homology – as a means to assign reliability to the protein-protein interactions in Saccharomyces cerevisiae determined by high-throughput experiments. Using Bayesian network approaches, we show that protein-protein interactions from high-throughput data supported by one or more genomic features have a higher likelihood ratio and hence are more likely to be real interactions. Our method has a high sensitivity (90%) and good specificity (63%). We show that 56% of the interactions from high-throughput experiments in Saccharomyces cerevisiae have high reliability. We use the method to estimate the number of true interactions in the high-throughput protein-protein interaction data sets in Caenorhabditis elegans, Drosophila melanogaster and Homo sapiens to be 27%, 18% and 68% respectively. Our results are available for searching and downloading at . Conclusion A combination of genomic features that include sequence, structure and annotation information is a good predictor of true interactions in large and noisy high-throughput data sets. The method has a very high sensitivity and good specificity and can be used to assign a likelihood ratio, corresponding to the reliability, to each interaction.
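As a rough aid to reading the figures in the abstract above, the sketch below shows how a likelihood ratio relates to sensitivity and specificity, and how independent likelihood ratios update prior odds. This is the textbook positive likelihood ratio and a naive-Bayes style update, assumed here for illustration only; the paper's Bayesian network computes likelihood ratios per genomic feature and may differ. The prior of 0.5 is a hypothetical value, not one taken from the source.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """Textbook LR+ = P(evidence | true) / P(evidence | false)."""
    return sensitivity / (1.0 - specificity)

def posterior_probability(prior, likelihood_ratios):
    """Multiply prior odds by each likelihood ratio (assumes independent evidence)."""
    odds = prior / (1.0 - prior)
    for lr in likelihood_ratios:
        odds *= lr
    return odds / (1.0 + odds)

# Sensitivity and specificity as quoted in the abstract; the prior is hypothetical.
lr = positive_likelihood_ratio(0.90, 0.63)
post = posterior_probability(0.5, [lr])
print(round(lr, 2), round(post, 2))
```

Under these assumptions a likelihood ratio above 1 raises the probability that an interaction is real, and support from several independent features multiplies the odds further.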
Affiliation(s)
- Ashwini Patil
- Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan
- Department of Biology, Graduate School of Science, Osaka University, 1-1 Machikaneyama-cho, Toyonaka, Osaka 560-0043, Japan
- Haruki Nakamura
- Institute for Protein Research, Osaka University, 3-2 Yamadaoka, Suita, Osaka 565-0871, Japan
|
67
|
Pairwise Local Alignment of Protein Interaction Networks Guided by Models of Evolution. LECTURE NOTES IN COMPUTER SCIENCE 2005. [DOI: 10.1007/11415770_4] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 12/04/2022]
|
68
|
Chen Y, Xu D. Global protein function annotation through mining genome-scale data in yeast Saccharomyces cerevisiae. Nucleic Acids Res 2004; 32:6414-24. [PMID: 15585665 PMCID: PMC535686 DOI: 10.1093/nar/gkh978] [Citation(s) in RCA: 94] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2004] [Revised: 11/15/2004] [Accepted: 11/15/2004] [Indexed: 11/14/2022] Open
Abstract
As we are moving into the post genome-sequencing era, various high-throughput experimental techniques have been developed to characterize biological systems on the genomic scale. Discovering new biological knowledge from the high-throughput biological data is a major challenge to bioinformatics today. To address this challenge, we developed a Bayesian statistical method together with Boltzmann machine and simulated annealing for protein functional annotation in the yeast Saccharomyces cerevisiae through integrating various high-throughput biological data, including yeast two-hybrid data, protein complexes and microarray gene expression profiles. In our approach, we quantified the relationship between functional similarity and high-throughput data, and coded the relationship into 'functional linkage graph', where each node represents one protein and the weight of each edge is characterized by the Bayesian probability of function similarity between two proteins. We also integrated the evolution information and protein subcellular localization information into the prediction. Based on our method, 1802 out of 2280 unannotated proteins in yeast were assigned functions systematically.
Affiliation(s)
- Yu Chen
- UT-ORNL Graduate School of Genome Science and Technology, Oak Ridge, TN, USA
|
69
|
Kemmeren P, Kockelkorn TTJP, Bijma T, Donders R, Holstege FCP. Predicting gene function through systematic analysis and quality assessment of high-throughput data. Bioinformatics 2004; 21:1644-52. [PMID: 15531615 DOI: 10.1093/bioinformatics/bti103] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/17/2022] Open
Abstract
MOTIVATION Determining gene function is an important challenge arising from the availability of whole genome sequences. Until recently, approaches based on sequence homology were the only high-throughput method for predicting gene function. Use of high-throughput generated experimental data sets for determining gene function has been limited for several reasons. RESULTS Here a new approach is presented for integration of high-throughput data sets, leading to prediction of function based on relationships supported by multiple types and sources of data. This is achieved with a database containing 125 different high-throughput data sets describing phenotypes, cellular localizations, protein interactions and mRNA expression levels from Saccharomyces cerevisiae, using a bit-vector representation and information content-based ranking. The approach takes characteristic and qualitative differences between the data sets into account, is highly flexible, efficient and scalable. Database queries result in predictions for 543 uncharacterized genes, based on multiple functional relationships each supported by at least three types of experimental data. Some of these are experimentally verified, further demonstrating their reliability. The results also generate insights into the relative merits of different data types and provide a coherent framework for functional genomic datamining. AVAILABILITY Free availability over the Internet. CONTACT f.c.p.holstege@med.uu.nl SUPPLEMENTARY INFORMATION http://www.genomics.med.uu.nl/pub/pk/comb_gen_network.
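The bit-vector representation and information content-based ranking mentioned in this abstract could look roughly like the sketch below. This is an assumption-laden illustration, not the authors' implementation: the data set names, the gene pairs, the bit patterns, and the scoring rule (summing -log2 of each supporting data set's coverage) are all hypothetical.

```python
from math import log2

# Hypothetical data: each data set is a bit vector (a Python int) over gene
# pairs; bit i is set when the data set reports a relationship for pairs[i].
pairs = ["A-B", "A-C", "B-C", "C-D"]
datasets = {
    "expression": 0b0111,
    "interaction": 0b0001,
    "localization": 0b1101,
}

def information_content(bits, n_pairs):
    """-log2 of the fraction of pairs a data set reports.

    Data sets that report few relationships contribute more per report.
    """
    return -log2(bin(bits).count("1") / n_pairs)

def rank_pairs(pairs, datasets):
    """Score each pair by the summed information content of its supporters."""
    n = len(pairs)
    scores = []
    for i, pair in enumerate(pairs):
        score = sum(
            information_content(bits, n)
            for bits in datasets.values()
            if (bits >> i) & 1
        )
        scores.append((pair, score))
    return sorted(scores, key=lambda item: item[1], reverse=True)

ranking = rank_pairs(pairs, datasets)
for pair, score in ranking:
    print(pair, round(score, 3))
```

Storing each data set as an integer keeps membership tests to a shift and a mask, which is one plausible reason a bit-vector layout scales to many data sets.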
Affiliation(s)
- Patrick Kemmeren
- Department of Physiological Chemistry, University Medical Center Utrecht, PO Box 85060, 3508 AB Utrecht, The Netherlands
|
70
|
Wodak SJ, Castura J, Orsi C. Integrative bioinformatics: making sense of the networks. DRUG DISCOVERY TODAY. TECHNOLOGIES 2004; 1:179-187. [PMID: 24981389 DOI: 10.1016/j.ddtec.2004.10.001] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/03/2023]
Abstract
The focus of biology has shifted from the investigation of individual genes and proteins to the study of large complex networks featuring interactions between tens of thousands of molecular and cellular components. Information on these networks is obtained from genome-scale experimental and theoretical analyses, which yield valuable but noisy data on biological processes that are still poorly understood. The new exciting developments in bioinformatics show great promise in meeting the challenge of extracting biological insight from these data.
Affiliation(s)
- Shoshana J Wodak
- Biochemistry Department, University of Toronto, Medical Sciences Building, 1 King's College Circle, Toronto, Ont., Canada M5S 1A8.
- Jeffrey Castura
- School of Information Technology and Engineering, University of Ottawa, 800 King Edward Avenue, P.O. Box 450, Stn. A, Ottawa, Ont., Canada K1N 6N5. http://www.site.uottawa.ca
- Chris Orsi
- Department of Structural Biology and Biochemistry, Hospital for Sick Children, 555 University Avenue, Toronto, Ont., Canada M5G 1X8
|
71
|
Zhang LV, Wong SL, King OD, Roth FP. Predicting co-complexed protein pairs using genomic and proteomic data integration. BMC Bioinformatics 2004; 5:38. [PMID: 15090078 PMCID: PMC419405 DOI: 10.1186/1471-2105-5-38] [Citation(s) in RCA: 93] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/03/2003] [Accepted: 04/16/2004] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Identifying all protein-protein interactions in an organism is a major objective of proteomics. A related goal is to know which protein pairs are present in the same protein complex. High-throughput methods such as yeast two-hybrid (Y2H) and affinity purification coupled with mass spectrometry (APMS) have been used to detect interacting proteins on a genomic scale. However, both Y2H and APMS methods have substantial false-positive rates. Aside from high-throughput interaction screens, other gene- or protein-pair characteristics may also be informative of physical interaction. Therefore it is desirable to integrate multiple datasets and utilize their different predictive values for more accurate prediction of co-complexed relationships. RESULTS Using a supervised machine learning approach, the probabilistic decision tree, we integrated high-throughput protein interaction datasets and other gene- and protein-pair characteristics to predict co-complexed pairs (CCP) of proteins. Our predictions proved more sensitive and specific than predictions based on Y2H or APMS methods alone or in combination. Among the top predictions not annotated as CCPs in our reference set (obtained from the MIPS complex catalogue), a significant fraction was found to physically interact according to a separate database (YPD, Yeast Proteome Database), and the remaining predictions may potentially represent unknown CCPs. CONCLUSIONS We demonstrated that the probabilistic decision tree approach can be successfully used to predict co-complexed protein (CCP) pairs from other characteristics. Our top-scoring CCP predictions provide testable hypotheses for experimental validation.
Affiliation(s)
- Lan V Zhang
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Sharyl L Wong
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Oliver D King
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
- Frederick P Roth
- Department of Biological Chemistry and Molecular Pharmacology, Harvard Medical School, Boston, MA 02115, USA
|