1
|
Munikoti S, Agarwal D, Das L, Natarajan B. A General Framework for quantifying Aleatoric and Epistemic uncertainty in Graph Neural Networks. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2022.11.049] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/22/2022]
|
2
|
Stable structural clustering in uncertain graphs. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.11.078] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/20/2022]
|
3
|
OUP accepted manuscript. Brief Funct Genomics 2022; 21:243-269. [DOI: 10.1093/bfgp/elac007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Revised: 03/17/2022] [Accepted: 03/18/2022] [Indexed: 11/14/2022] Open
|
4
|
Redhu N, Thakur Z. Network biology and applications. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00024-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/20/2022] Open
|
5
|
Defining and measuring probabilistic ego networks. Social Network Analysis and Mining 2021. [DOI: 10.1007/s13278-020-00708-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/22/2022]
Abstract
Analyzing ego networks to investigate local properties and behaviors of individuals is a fundamental task in social network research. In this paper we show that there is no unique way of defining ego networks when the existence of edges is uncertain, since there are two different ways of defining the neighborhood of a node in such network models. Therefore, we introduce two definitions of probabilistic ego networks, called V-Alters-Ego and F-Alters-Ego, both rooted in the literature. Following that, we investigate three fundamental measures (degree, betweenness and closeness) for each definition. We also propose a method to approximate the betweenness of an ego node among the neighbors that are connected via shortest paths of length 2. We show that this approximation method is faster to compute and that it correlates highly with ego betweenness under the V-Alters-Ego definition in many datasets. Therefore, it can be a reasonable alternative for representing the extent to which a node plays the role of an intermediate node among its neighbors.
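As a small illustration of what a degree measure means when edge existence is uncertain, the following Python sketch computes the expected degree of an ego node, assuming each edge exists independently with a given probability. This independence assumption, the function names, and the toy data are illustrative only; the sketch does not implement the V-Alters-Ego or F-Alters-Ego definitions, which the abstract does not spell out.

```python
import random


def expected_degree(edge_probs, node):
    """Closed form: by linearity of expectation, the expected degree is
    the sum of the existence probabilities of the incident edges."""
    return sum(p for (u, v), p in edge_probs.items() if node in (u, v))


def sampled_degree(edge_probs, node, trials=20000, seed=0):
    """Monte Carlo estimate: sample possible worlds and average the degree."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += sum(
            1
            for (u, v), p in edge_probs.items()
            if node in (u, v) and rng.random() < p
        )
    return total / trials


# Hypothetical uncertain graph: (u, v) -> probability that the edge exists.
edge_probs = {
    ("ego", "a"): 0.9,
    ("ego", "b"): 0.5,
    ("a", "b"): 0.7,
    ("b", "c"): 0.2,
}

exact = expected_degree(edge_probs, "ego")
estimate = sampled_degree(edge_probs, "ego")
```

The closed form and the sampled estimate should agree closely; sampling becomes useful for measures such as betweenness, for which no simple closed form is available.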
|
6
|
Reliable Route Selection for Wireless Sensor Networks with Connection Failure Uncertainties. Sensors 2021; 21:s21217254. [PMID: 34770561 PMCID: PMC8588549 DOI: 10.3390/s21217254] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/03/2021] [Revised: 10/27/2021] [Accepted: 10/29/2021] [Indexed: 11/17/2022]
Abstract
For wireless sensor networks (WSN) with connection failure uncertainties, traditional minimum spanning trees are no longer a feasible option for selecting routes. Reliability should come before cost, since no one wants a network that fails to work most of the time. First, reliable route selection for WSNs with connection failure uncertainties is formulated by considering the top-k most reliable spanning trees (RST) from graphs with structural uncertainties. The reliable spanning trees are defined as a set of spanning trees with top reliabilities and limited tree weights based on the possible world model. Second, two tree-filtering algorithms are proposed: the k minimum spanning tree (KMST) based tree-filtering algorithm and the depth-first search (DFS) based tree-filtering algorithm. The tree-filtering strategy filters the candidate RSTs generated by tree enumeration with explicit weight thresholds and implicit reliability thresholds. Third, an innovative edge-filtering method is presented in which edge combinations that act as upper bounds for RST reliabilities are used to filter the RST candidates and to prune search spaces. Optimization strategies are also proposed to further improve pruning capabilities and speed up computation. Extensive experiments are conducted to show the effectiveness and efficiency of the proposed algorithms.
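The problem statement above can be made concrete with a brute-force Python sketch: enumerate every spanning tree of a tiny graph, discard trees whose weight exceeds a threshold, and keep the k most reliable ones. It assumes that edges fail independently, so a tree's reliability is the product of its edge probabilities. It is not the KMST-based, DFS-based, or edge-filtering algorithm from the paper, and the function names and toy network are hypothetical.

```python
from itertools import combinations
from math import prod


def is_spanning_tree(nodes, tree_edges):
    """True if the given n-1 edges connect all n nodes without a cycle."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in tree_edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # adding this edge would close a cycle
        parent[ru] = rv
    return True  # n-1 acyclic edges on n nodes form a spanning tree


def top_k_reliable_trees(nodes, edges, k, max_weight):
    """edges maps (u, v) -> (weight, existence probability).
    Returns up to k tuples (reliability, weight, tree_edges)."""
    candidates = []
    for subset in combinations(edges, len(nodes) - 1):
        if not is_spanning_tree(nodes, subset):
            continue
        weight = sum(edges[e][0] for e in subset)
        if weight > max_weight:
            continue  # explicit weight threshold
        reliability = prod(edges[e][1] for e in subset)
        candidates.append((reliability, weight, subset))
    # Most reliable first; break ties by lower weight.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return candidates[:k]


# Hypothetical three-node sensor network.
nodes = ["a", "b", "c"]
edges = {("a", "b"): (1, 0.9), ("b", "c"): (2, 0.8), ("a", "c"): (4, 0.95)}
best = top_k_reliable_trees(nodes, edges, k=2, max_weight=5)
```

Exhaustive enumeration is exponential in the number of edges, which is why filtering and pruning strategies such as those described in the abstract are needed on realistic networks.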
|
7
|
Li F. An efficient mining algorithm for maximal frequent patterns in uncertain graph database. Journal of Intelligent & Fuzzy Systems 2020. [DOI: 10.3233/jifs-200237] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/15/2022]
Abstract
Mining maximal frequent patterns is significant in many fields, but the mining efficiency is often low. The bottleneck lies in too many candidate subgraphs and extensive subgraph isomorphism tests. In this paper we propose an efficient mining algorithm. There are two key ideas behind the proposed method. The first is to divide each edge of every certain graph (converted from the equivalent uncertain graph) and build a search tree, avoiding too many candidate subgraphs. The second is to search the tree built in the first step in order, avoiding extensive subgraph isomorphism tests. The evaluation of our approach demonstrates significant cost savings with respect to the state-of-the-art approach, not only on real-world datasets but also on synthetic uncertain graph databases.
Affiliation(s)
- Feng Li
- School of Computer and Communication, Hunan Institute of Engineering, Hunan, China
| |
|
8
|
Abstract
We present a detailed survey of results and two new results on graphical models of uncertainty and associated optimization problems. We focus on two well-studied models, namely, the Random Failure (RF) model and the Linear Reliability Ordering (LRO) model. We present an FPT algorithm parameterized by the product of treewidth and max-degree for maximizing expected coverage in an uncertain graph under the RF model. We then consider the problem of finding the maximal core in a graph, which is known to be polynomial time solvable. We show that the Probabilistic-Core problem is polynomial time solvable in uncertain graphs under the LRO model. On the other hand, under the RF model, we show that the Probabilistic-Core problem is W[1]-hard for the parameter d, where d is the minimum degree of the core. We then design an FPT algorithm for the parameter treewidth.
|
9
|
Peng X, Wang J, Peng W, Wu FX, Pan Y. Protein-protein interactions: detection, reliability assessment and applications. Brief Bioinform 2017; 18:798-819. [PMID: 27444371 DOI: 10.1093/bib/bbw066] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 03/22/2016] [Indexed: 01/06/2023] Open
Abstract
Protein-protein interactions (PPIs) participate in all important biological processes in living organisms, such as catalyzing metabolic reactions, DNA replication, DNA transcription, responding to stimuli and transporting molecules from one location to another. To reveal the functional mechanisms in cells, it is important to identify PPIs that take place in the living organism. A large number of PPIs have been discovered by high-throughput experiments and computational methods. However, false-positive PPIs have been introduced too. Therefore, to obtain reliable PPIs, many computational methods have been proposed. Generally, these methods can be classified into two categories. One category includes the methods that are designed to determine new reliable PPIs. The other is designed to assess the reliability of existing PPIs and filter out the unreliable ones. In this article, we review the two kinds of methods for detecting reliable PPIs, and then focus on evaluating the performance of some typical methods. We also enumerate several PPI network-based applications that take a reliability assessment of the PPI data into consideration. Finally, we discuss the challenges in obtaining reliable PPIs and future directions for the construction of reliable PPI networks. Our research will provide readers with some guidance for choosing appropriate methods and features for obtaining reliable PPIs.
|
10
|
Meysman P, Titeca K, Eyckerman S, Tavernier J, Goethals B, Martens L, Valkenborg D, Laukens K. Protein complex analysis: From raw protein lists to protein interaction networks. Mass Spectrometry Reviews 2017; 36:600-614. [PMID: 26709718 DOI: 10.1002/mas.21485] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 04/04/2015] [Accepted: 11/17/2015] [Indexed: 06/05/2023]
Abstract
The elucidation of molecular interaction networks is one of the pivotal challenges in the study of biology. Affinity purification-mass spectrometry and other co-complex methods have become widely employed experimental techniques to identify protein complexes. These techniques typically suffer from a high number of false negatives and false positive contaminants due to technical shortcomings and purification biases. To support a diverse range of experimental designs and approaches, a large number of computational methods have been proposed to filter, infer and validate protein interaction networks from experimental pull-down MS data. Nevertheless, this expansion of available methods complicates the selection of the most optimal ones to support systems biology-driven knowledge extraction. In this review, we give an overview of the most commonly used computational methods to process and interpret co-complex results, and we discuss the issues and unsolved problems that still exist within the field. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:600-614, 2017.
Affiliation(s)
- Pieter Meysman
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
| | - Kevin Titeca
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Sven Eyckerman
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Jan Tavernier
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Bart Goethals
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
| | - Lennart Martens
- Department of Medical Protein Research, VIB, B-9000 Ghent, Belgium
- Department of Biochemistry, Ghent University, B-9000 Ghent, Belgium
| | - Dirk Valkenborg
- Flemish Institute for Technological Research (VITO), Mol, Belgium
- IBioStat, Hasselt University, Hasselt, Belgium
- CFP-CeProMa, University of Antwerp, Antwerp, Belgium
| | - Kris Laukens
- Advanced Database Research and Modelling (ADReM), Department of Mathematics and Computer Science, University of Antwerp, Antwerp, Belgium
- Biomedical Informatics Research Center Antwerp (biomina), University of Antwerp/Antwerp University Hospital, Edegem, Belgium
| |
|
11
|
|
12
|
Abstract
Molecular profiling of proteins and phosphoproteins using a reverse phase protein array (RPPA) platform, with a panel of target-specific antibodies, enables the parallel, quantitative proteomic analysis of many biological samples in a microarray format. Hence, RPPA analysis can generate a high volume of multidimensional data that must be effectively interrogated and interpreted. A range of computational techniques for data mining can be applied to detect and explore data structure and to form functional predictions from large datasets. Here, two approaches for the computational analysis of RPPA data are detailed: the identification of similar patterns of protein expression by hierarchical cluster analysis and the modeling of protein interactions and signaling relationships by network analysis. The protocols use freely available, cross-platform software, are easy to implement, and do not require any programming expertise. Serving as data-driven starting points for further in-depth analysis, validation, and biological experimentation, these and related bioinformatic approaches can accelerate the functional interpretation of RPPA data.
Affiliation(s)
- Adam Byron
- Cancer Research UK Edinburgh Centre, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, EH4 2XR, UK.
| |
|
13
|
Ramadan E, Naef A, Ahmed M. Protein complexes predictions within protein interaction networks using genetic algorithms. BMC Bioinformatics 2016; 17 Suppl 7:269. [PMID: 27454228 PMCID: PMC4965715 DOI: 10.1186/s12859-016-1096-4] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Indexed: 11/12/2022] Open
Abstract
Background: Protein–protein interaction networks are receiving increased attention due to their importance in understanding life at the cellular level. A major challenge in systems biology is to understand the modular structure of such biological networks. Although clustering techniques have been proposed for clustering protein–protein interaction networks, those techniques suffer from some drawbacks. The application of earlier clustering techniques to protein–protein interaction networks in order to predict protein complexes within the networks does not yield good results due to the small-world and power-law properties of these networks. Results: In this paper, we construct a new clustering algorithm for predicting protein complexes through the use of genetic algorithms. We design an objective function for exclusive clustering and overlapping clustering. We assess the quality of our proposed clustering algorithm using two gold-standard data sets. Conclusions: Our algorithm can identify protein complexes that are significantly enriched in the gold-standard data sets. Furthermore, our method surpasses three competing methods (MCL, ClusterOne, and MCODE) in terms of the quality of the predicted complexes. The source code and accompanying examples are freely available at http://faculty.kfupm.edu.sa/ics/eramadan/GACluster.zip.
Affiliation(s)
- Emad Ramadan
- Department of Information and Computer Science, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia.
| | - Ahmed Naef
- Department of Information and Computer Science, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
| | - Moataz Ahmed
- Department of Information and Computer Science, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
| |
|
14
|
Ahmed NM, Chen L. An efficient algorithm for link prediction in temporal uncertain social networks. Inf Sci (N Y) 2016. [DOI: 10.1016/j.ins.2015.10.036] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.9] [Indexed: 11/30/2022]
|
15
|
Akerman M, Fregoso OI, Das S, Ruse C, Jensen MA, Pappin DJ, Zhang MQ, Krainer AR. Differential connectivity of splicing activators and repressors to the human spliceosome. Genome Biol 2015; 16:119. [PMID: 26047612 PMCID: PMC4502471 DOI: 10.1186/s13059-015-0682-5] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Received: 02/14/2015] [Accepted: 05/22/2015] [Indexed: 12/29/2022] Open
Abstract
Background: During spliceosome assembly, protein-protein interactions (PPI) are sequentially formed and disrupted to accommodate the spatial requirements of pre-mRNA substrate recognition and catalysis. Splicing activators and repressors, such as SR proteins and hnRNPs, modulate spliceosome assembly and regulate alternative splicing. However, it remains unclear how they differentially interact with the core spliceosome to perform their functions. Results: Here, we investigate the protein connectivity of SR and hnRNP proteins to the core spliceosome using probabilistic network reconstruction based on the integration of interactome and gene expression data. We validate our model by immunoprecipitation and mass spectrometry of the prototypical splicing factors SRSF1 and hnRNPA1. Network analysis reveals that a factor’s properties as an activator or repressor can be predicted from its overall connectivity to the rest of the spliceosome. In addition, we discover and experimentally validate PPIs between the oncoprotein SRSF1 and members of the anti-tumor drug target SF3 complex. Our findings suggest that activators promote the formation of PPIs between spliceosomal sub-complexes, whereas repressors mostly operate through protein-RNA interactions. Conclusions: This study demonstrates that combining in-silico modeling with biochemistry can significantly advance the understanding of structure and function relationships in the human spliceosome. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0682-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Martin Akerman
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Present address: Envisagenics, Inc, 315 Main St., 2nd floor, Huntington, NY, 11743, USA
| | - Oliver I Fregoso
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Watson School of Biological Sciences, Cold Spring Harbor, NY, 11724, USA; Present address: Fred Hutchinson Cancer Research Center, Division of Human Biology, 1100 Fairview Ave N, Seattle, WA, 98109, USA
| | - Shipra Das
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA
| | - Cristian Ruse
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Present address: New England Biolabs, 240 County Road, Ipswich, MA, 01938, USA
| | - Mads A Jensen
- Cold Spring Harbor Laboratory, Cold Spring Harbor, NY, USA; Present address: Santaris Pharma A/S, Horsholm, Denmark
| | | | - Michael Q Zhang
- Department of Molecular and Cell Biology, Center for Systems Biology, The University of Texas at Dallas, Richardson, TX, 75080, USA; Bioinformatics Division, TNLIST, Tsinghua University, Beijing, 100084, China
| | | |
|
16
|
|
17
|
|
18
|
Saha S, Chatterjee P, Basu S, Kundu M, Nasipuri M. FunPred-1: protein function prediction from a protein interaction network using neighborhood analysis. Cell Mol Biol Lett 2014; 19:675-91. [PMID: 25424913 PMCID: PMC6275854 DOI: 10.2478/s11658-014-0221-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 07/25/2014] [Accepted: 11/20/2014] [Indexed: 01/05/2023] Open
Abstract
Proteins are responsible for all biological activities in living organisms. Thanks to genome sequencing projects, large amounts of DNA and protein sequence data are now available, but the biological functions of many proteins are still not annotated in most cases. The unknown function of such non-annotated proteins may be inferred or deduced from their neighbors in a protein interaction network. In this paper, we propose two new methods to predict protein functions based on network neighborhood properties. FunPred 1.1 uses a combination of three simple-yet-effective scoring techniques: the neighborhood ratio, the protein path connectivity and the relative functional similarity. FunPred 1.2 applies a heuristic approach using the edge clustering coefficient to reduce the search space by identifying densely connected neighborhood regions. The overall accuracy achieved in FunPred 1.2 over 8 functional groups involving hetero-interactions in 650 yeast proteins is around 87%, which is higher than the accuracy with FunPred 1.1. It is also higher than the accuracy of many of the state-of-the-art protein function prediction methods described in the literature. The test datasets and the complete source code of the developed software are now freely available at http://code.google.com/p/cmaterbioinfo/.
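To illustrate the edge clustering coefficient mentioned above, here is a minimal Python sketch. It assumes one common variant of the definition: the number of triangles containing the edge divided by the largest number of triangles the edge could belong to, min(k_u - 1, k_v - 1). Whether FunPred 1.2 uses exactly this variant is an assumption, and the adjacency data and function name are hypothetical.

```python
def edge_clustering_coefficient(adj, u, v):
    """Edge clustering coefficient of the edge (u, v).

    adj maps each node to the set of its neighbors (undirected graph).
    Assumed variant: triangles through the edge divided by
    min(deg(u) - 1, deg(v) - 1); defined as 0.0 when that bound is 0.
    """
    triangles = len(adj[u] & adj[v])  # common neighbors close triangles
    max_triangles = min(len(adj[u]) - 1, len(adj[v]) - 1)
    return triangles / max_triangles if max_triangles > 0 else 0.0


# Hypothetical protein interaction neighborhood.
adj = {
    "p1": {"p2", "p3", "p4"},
    "p2": {"p1", "p3"},
    "p3": {"p1", "p2"},
    "p4": {"p1"},
}

dense_edge = edge_clustering_coefficient(adj, "p1", "p2")   # inside a triangle
sparse_edge = edge_clustering_coefficient(adj, "p1", "p4")  # pendant edge
```

Edges with a high coefficient lie in densely connected regions, so thresholding on this value is one way to restrict a search to such neighborhoods.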
Affiliation(s)
- Sovan Saha
- Department of Computer Science and Engineering, Dr. Sudhir Chandra Sur Degree Engineering College, Dumdum, Kolkata 700074 India
| | - Piyali Chatterjee
- Department of Computer Science and Engineering, Netaji Subhash Engineering College, Garia, Kolkata 700152 India
| | - Subhadip Basu
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032 India
| | - Mahantapas Kundu
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032 India
| | - Mita Nasipuri
- Department of Computer Science and Engineering, Jadavpur University, Kolkata, 700032 India
| |
|
19
|
Rhee SY, Mutwil M. Towards revealing the functions of all genes in plants. Trends in Plant Science 2014; 19:212-21. [PMID: 24231067 DOI: 10.1016/j.tplants.2013.10.006] [Citation(s) in RCA: 146] [Impact Index Per Article: 14.6] [Received: 08/06/2013] [Revised: 10/10/2013] [Accepted: 10/16/2013] [Indexed: 05/19/2023]
Abstract
The great recent progress made in identifying the molecular parts lists of organisms revealed the paucity of our understanding of what most of the parts do. In this review, we introduce computational and statistical approaches and omics data used for inferring gene function in plants, with an emphasis on network-based inference. We also discuss caveats associated with network-based function predictions such as performance assessment, annotation propagation, the guilt-by-association concept, and the meaning of hubs. Finally, we note the current limitations and possible future directions such as the need for gold standard data from several species, unified access to data and tools, quantitative comparison of data and tool quality, and high-throughput experimental validation platforms for systematic gene function elucidation in plants.
Affiliation(s)
- Seung Yon Rhee
- Carnegie Institution for Science, Department of Plant Biology, 260 Panama St, Stanford, CA 94305, USA.
| | - Marek Mutwil
- Max Planck Institute for Molecular Plant Physiology, 14476 Potsdam, Germany.
| |
|
20
|
Protein-protein interaction detection: methods and analysis. International Journal of Proteomics 2014; 2014:147648. [PMID: 24693427 PMCID: PMC3947875 DOI: 10.1155/2014/147648] [Citation(s) in RCA: 375] [Impact Index Per Article: 37.5] [Received: 10/26/2013] [Revised: 12/05/2013] [Accepted: 12/20/2013] [Indexed: 12/24/2022]
Abstract
Protein-protein interaction plays a key role in predicting the function of a target protein and the druggability of molecules. The majority of genes and proteins realize their resulting phenotype functions as a set of interactions. In vitro and in vivo methods such as affinity purification, Y2H (yeast 2 hybrid), and TAP (tandem affinity purification) have their own limitations, such as cost and time, and the resulting data sets are noisy and contain many false positives, which hampers annotating the function of drug molecules. Thus, in silico methods, which include sequence-based approaches, structure-based approaches, chromosome proximity, gene fusion, in silico 2 hybrid, phylogenetic tree, phylogenetic profile, and gene expression-based approaches, were developed. Elucidation of protein interaction networks also contributes greatly to the analysis of signal transduction pathways. Recent developments have also led to the construction of networks containing all the protein-protein interactions, using computational methods, for signaling pathways and protein complex identification in specific diseases.
|
21
|
Detecting protein complexes based on relevancy from protein interaction networks. Interdiscip Sci 2013; 5:167-74. [PMID: 24307408 DOI: 10.1007/s12539-013-0171-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/07/2013] [Revised: 03/30/2013] [Accepted: 06/12/2013] [Indexed: 10/26/2022]
Abstract
In protein-protein interaction networks, proteins combine into macromolecular complexes to execute essential functions in the cell, such as replication, transcription, and protein transport. To solve the problem of detecting protein complexes from protein interaction networks, we used a relevant graph and an irrelevant graph to represent the connection between a node and a core graph. We defined a variable, Relevancy, to represent whether a node has a dense or loose connection to a core graph. We then proposed the Relevancy Judgment algorithm to detect protein complexes from protein interaction networks. Our algorithm decides whether a node belongs to a protein complex by judging the relevancy between the core graph and nodes outside the core graph. Experimental results show that our algorithm performs well in both accuracy and hit rate.
|
22
|
Lei C, Tamim S, Bishop AJ, Ruan J. Fully automated protein complex prediction based on topological similarity and community structure. Proteome Sci 2013; 11:S9. [PMID: 24564887 PMCID: PMC3908383 DOI: 10.1186/1477-5956-11-s1-s9] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/24/2022] Open
Abstract
To understand the function of protein complexes and their association with biological processes, many studies have analyzed protein-protein interaction (PPI) networks. However, the advancement in high-throughput technology has resulted in a humongous amount of data for analysis. Moreover, the high level of noise, sparseness, and skewness in the degree distribution of PPI networks limit the performance of many clustering algorithms and further analysis of their interactions. To address these problems, we present a novel random walk based algorithm that converts the incomplete and binary PPI network into a protein-protein topological similarity matrix (PP-TS matrix). We believe that if two proteins share some high-order topological similarities they are likely to be interacting with each other. Using the obtained PP-TS matrix, we constructed and used weighted networks to further study and analyze the interaction among proteins. Specifically, we applied a fully automated community structure finding algorithm (Auto-HQcut) on the obtained weighted network to cluster protein complexes. We then analyzed the protein complexes for significance in biological processes. To help visualize and analyze these protein complexes we also developed an interface that displays the resulting complexes as well as the characteristics associated with each complex. Applying our approach to a yeast protein-protein interaction network, we found that the predicted protein-protein interaction pairs with high topological similarities have more significant biological relevance than the original protein-protein interaction pairs. When we compared our PPI network reconstruction algorithm with other existing algorithms using gene ontology and gene co-expression, our algorithm produced the highest similarity scores. Also, our predicted protein complexes showed higher accuracy than the other protein complex predictions.
|
23
|
Hu P, Jiang H, Emili A. Incorporating Correlations among Gene Ontology Terms into Predicting Protein Functions. Bioinformatics 2013. [DOI: 10.4018/978-1-4666-3604-0.ch045] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/08/2022] Open
Abstract
The authors describe a new strategy that has better prediction performance than previous methods, which gives additional insights about the importance of the dependence between functional terms when inferring protein function.
Affiliation(s)
- Pingzhao Hu
- York University, Canada & University of Toronto, Canada
| | | | | |
|
24
|
Abstract
Complex diseases are caused by a combination of genetic and environmental factors. Uncovering the molecular pathways through which genetic factors affect a phenotype is always difficult, but in the case of complex diseases this is further complicated since genetic factors in affected individuals might be different. In recent years, systems biology approaches and, more specifically, network based approaches emerged as powerful tools for studying complex diseases. These approaches are often built on the knowledge of physical or functional interactions between molecules which are usually represented as an interaction network. An interaction network not only reports the binary relationships between individual nodes but also encodes hidden higher level organization of cellular communication. Computational biologists were challenged with the task of uncovering this organization and utilizing it for the understanding of disease complexity, which prompted rich and diverse algorithmic approaches to be proposed. We start this chapter with a description of the general characteristics of complex diseases followed by a brief introduction to physical and functional networks. Next we will show how these networks are used to leverage genotype, gene expression, and other types of data to identify dysregulated pathways, infer the relationships between genotype and phenotype, and explain disease heterogeneity. We group the methods by common underlying principles and first provide a high level description of the principles followed by more specific examples. We hope that this chapter will give readers an appreciation for the wealth of algorithmic techniques that have been developed for the purpose of studying complex diseases as well as insight into their strengths and limitations.
Affiliation(s)
- Dong-Yeon Cho
- National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America
| | - Yoo-Ah Kim
- National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America
| | - Teresa M. Przytycka
- National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America
| |
|
25
|
Lei C, Ruan J. A novel link prediction algorithm for reconstructing protein-protein interaction networks by topological similarity. Bioinformatics 2012; 29:355-64. [PMID: 23235927 DOI: 10.1093/bioinformatics/bts688] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.3] [Indexed: 12/24/2022]
Abstract
Motivation: Recent advances in technology have dramatically increased the availability of protein-protein interaction (PPI) data and stimulated the development of many methods for improving the systems-level understanding of the cell. However, those efforts have been significantly hindered by the high level of noise, sparseness and highly skewed degree distribution of PPI networks. Here, we present a novel algorithm to reduce the noise present in PPI networks. The key idea of our algorithm is that two proteins sharing some higher-order topological similarities, measured by a novel random walk-based procedure, are likely interacting with each other and may belong to the same protein complex. Results: Applying our algorithm to a yeast PPI network, we found that the edges in the reconstructed network have higher biological relevance than in the original network, assessed by multiple types of information, including gene ontology, gene expression, essentiality, conservation between species and known protein complexes. Comparison with existing methods shows that the network reconstructed by our method has the highest quality. Using two independent graph clustering algorithms, we found that the reconstructed network resulted in significantly improved prediction accuracy of protein complexes. Furthermore, our method is applicable to PPI networks obtained with different experimental systems, such as affinity purification, yeast two-hybrid (Y2H) and protein-fragment complementation assay (PCA), and evidence shows that the predicted edges are likely bona fide physical interactions. Finally, an application to a human PPI network increased the coverage of the network by at least 100%. Availability: www.cs.utsa.edu/~jruan/RWS/.
Affiliation(s)
- Chengwei Lei
- Department of Computer Science, The University of Texas at San Antonio, San Antonio, TX 78249, USA
|
26
|
Eronen L, Toivonen H. Biomine: predicting links between biological entities using network models of heterogeneous databases. BMC Bioinformatics 2012; 13:119. [PMID: 22672646 PMCID: PMC3505483 DOI: 10.1186/1471-2105-13-119] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Received: 08/31/2011] [Accepted: 04/17/2012] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Biological databases contain large amounts of data concerning the functions and associations of genes and proteins. Integration of data from several such databases into a single repository can aid the discovery of previously unknown connections spanning multiple types of relationships and databases. RESULTS Biomine is a system that integrates cross-references from several biological databases into a graph model with multiple types of edges, such as protein interactions, gene-disease associations and gene ontology annotations. Edges are weighted based on their type, reliability, and informativeness. We present Biomine and evaluate its performance in link prediction, where the goal is to predict pairs of nodes that will be connected in the future, based on current data. In particular, we formulate protein interaction prediction and disease gene prioritization tasks as instances of link prediction. The predictions are based on a proximity measure computed on the integrated graph. We consider and experiment with several such measures, and perform a parameter optimization procedure where different edge types are weighted to optimize link prediction accuracy. We also propose a novel method for disease-gene prioritization, defined as finding a subset of candidate genes that cluster together in the graph. We experimentally evaluate Biomine by predicting future annotations in the source databases and prioritizing lists of putative disease genes. CONCLUSIONS The experimental results show that Biomine has strong potential for predicting links when a set of selected candidate links is available. The predictions obtained using the entire Biomine dataset are shown to clearly outperform ones obtained using any single source of data alone, when different types of links are suitably weighted. 
In the gene prioritization task, an established reference set of disease-associated genes is useful, but the results show that under favorable conditions, Biomine can also perform well when no such information is available. The Biomine system is a proof of concept. Its current version contains 1.1 million entities and 8.1 million relations between them, with focus on human genetics. Some of its functionalities are available in a public query interface at http://biomine.cs.helsinki.fi, allowing searching for and visualizing connections between given biological entities.
Affiliation(s)
- Lauri Eronen
- Biocomputing Platforms Ltd, Innopoli 2, Tekniikantie 14, FI-02150 Espoo, Finland.
|
27
|
Hallinan JS, James K, Wipat A. Network approaches to the functional analysis of microbial proteins. Adv Microb Physiol 2011; 59:101-33. [PMID: 22114841 DOI: 10.1016/b978-0-12-387661-4.00005-7] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/21/2022]
Abstract
Large amounts of detailed biological data have been generated over the past few decades. Much of this data is freely available in over 1000 online databases: an enticing but frustrating resource for microbiologists interested in a systems-level view of the structure and function of microbial cells. The frustration engendered by the need to trawl manually through hundreds of databases in order to accumulate information about a gene, protein, pathway, or organism of interest can be alleviated by the use of computational data integration to generate network views of the system of interest. Biological networks can be constructed from a single type of data, such as protein-protein binding information, or from data generated by multiple experimental approaches. In an integrated network, nodes usually represent genes or gene products, while edges represent some form of interaction between the nodes. Edges between nodes may be weighted to represent the probability that the edge exists in vivo. Networks may also be enriched with ontological annotations, facilitating both visual browsing and computational analysis via web service interfaces. In this review, we describe the construction and analysis of both single-data source and integrated networks, and their application to the inference of protein function in microbes.
Affiliation(s)
- J S Hallinan
- School of Computing Science, Newcastle University, Newcastle, UK
|
28
|
Chua HN, Ning K, Sung WK, Leong HW, Wong L. Using indirect protein–protein interactions for protein complex prediction. J Bioinform Comput Biol 2011; 6:435-66. [DOI: 10.1142/s0219720008003497] [Citation(s) in RCA: 109] [Impact Index Per Article: 8.4] [Received: 08/01/2007] [Revised: 12/01/2007] [Accepted: 01/03/2008] [Indexed: 11/18/2022]
Abstract
Protein complexes are fundamental for understanding principles of cellular organization. As the sizes of protein–protein interaction (PPI) networks are increasing, accurate and fast protein complex prediction from these PPI networks can serve as a guide for biological experiments to discover novel protein complexes. However, it is not easy to predict protein complexes from PPI networks, especially in situations where the PPI network is noisy and still incomplete. Here, we study the use of indirect interactions between level-2 neighbors (level-2 interactions) for protein complex prediction. We know from previous work that proteins which do not interact but share interaction partners (level-2 neighbors) often share biological functions. We have proposed a method in which all direct and indirect interactions are first weighted using topological weight (FS-Weight), which estimates the strength of functional association. Interactions with low weight are removed from the network, while level-2 interactions with high weight are introduced into the interaction network. Existing clustering algorithms can then be applied to this modified network. We have also proposed a novel algorithm that searches for cliques in the modified network and merges cliques to form clusters using a "partial clique merging" method. Experiments show that (1) the use of indirect interactions and topological weight to augment protein–protein interactions can be used to improve the precision of clusters predicted by various existing clustering algorithms; and (2) our complex-finding algorithm performs very well on interaction networks modified in this way. Since no other information except the original PPI network is used, our approach would be very useful for protein complex prediction, especially for prediction of novel protein complexes.
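The network modification step described in this abstract (weight direct and level-2 pairs, drop weak direct edges, add strong level-2 edges) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the FS-Weight formula, so a Jaccard index over closed neighbourhoods is used as a stand-in, and the function names, the toy network and the 0.35 cut-off are hypothetical.

```python
from itertools import combinations


def jaccard_weight(adj, u, v):
    """Stand-in topological weight (NOT FS-Weight): Jaccard index of the
    closed neighbourhoods of u and v."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def reweight_network(edges, threshold=0.35):
    """Return the modified edge set: direct and level-2 pairs whose weight
    reaches the (assumed) threshold."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    kept = set()
    for u, v in combinations(sorted(adj), 2):
        direct = v in adj[u]
        # Level-2 neighbours: no direct edge, but at least one shared partner.
        level2 = not direct and bool(adj[u] & adj[v])
        if (direct or level2) and jaccard_weight(adj, u, v) >= threshold:
            kept.add((u, v))
    return kept


# Toy network (hypothetical): C and D do not interact but share A and B.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"),
             ("B", "D"), ("D", "E"), ("E", "F"), ("E", "G")]
modified = reweight_network(toy_edges)
print(sorted(modified))
```

In this toy network the level-2 pair C, D is introduced as a new edge, while the weakly supported direct edge D, E is dropped. Any clustering algorithm could then be run on the modified edge set.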
Affiliation(s)
- Hon Nian Chua
- Graduate School of Integrated Sciences, National University of Singapore, Singapore
- Kang Ning
- Department of Computer Science, National University of Singapore, Singapore
- Wing-Kin Sung
- Department of Computer Science, National University of Singapore, Singapore
- Hon Wai Leong
- Department of Computer Science, National University of Singapore, Singapore
- Limsoon Wong
- Department of Computer Science, National University of Singapore, Singapore
|
29
|
Kim EDH, Sabharwal A, Vetta AR, Blanchette M. Predicting direct protein interactions from affinity purification mass spectrometry data. Algorithms Mol Biol 2010; 5:34. [PMID: 21034440 PMCID: PMC2991326 DOI: 10.1186/1748-7188-5-34] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Received: 02/18/2010] [Accepted: 10/29/2010] [Indexed: 01/02/2023] Open
Abstract
Background Affinity purification followed by mass spectrometry identification (AP-MS) is an increasingly popular approach to observe protein-protein interactions (PPI) in vivo. One drawback of AP-MS, however, is that it is prone to detecting indirect interactions mixed with direct physical interactions. Therefore, the ability to distinguish direct interactions from indirect ones is of much interest. Results We first propose a simple probabilistic model for the interactions captured by AP-MS experiments, under which the problem of separating direct interactions from indirect ones is formulated. Then, given idealized quantitative AP-MS data, we study the problem of identifying the most likely set of direct interactions that produced the observed data. We address this challenging graph theoretical problem by first characterizing signatures that can identify weakly connected nodes as well as dense regions of the network. The rest of the direct PPI network is then inferred using a genetic algorithm. Our algorithm shows good performance on both simulated and biological networks with very high sensitivity and specificity. Then the algorithm is used to predict direct interactions from a set of AP-MS PPI data from yeast, and its performance is measured against a high-quality interaction dataset. Conclusions As the sensitivity of AP-MS pipeline improves, the fraction of indirect interactions detected will also increase, thereby making the ability to distinguish them even more desirable. Despite the simplicity of our model for indirect interactions, our method provides a good performance on the test networks.
|
30
|
Kaake RM, Wang X, Huang L. Profiling of protein interaction networks of protein complexes using affinity purification and quantitative mass spectrometry. Mol Cell Proteomics 2010; 9:1650-65. [PMID: 20445003 DOI: 10.1074/mcp.r110.000265] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.9] [Indexed: 11/06/2022] Open
Abstract
Protein-protein interactions are important for nearly all biological processes, and it is known that aberrant protein-protein interactions can lead to human disease and cancer. Recent evidence has suggested that protein interaction interfaces describe a new class of attractive targets for drug development. Full characterization of protein interaction networks of protein complexes and their dynamics in response to various cellular cues will provide essential information for us to understand how protein complexes work together in cells to maintain cell viability and normal homeostasis. Affinity purification coupled with quantitative mass spectrometry has become the primary method for studying in vivo protein interactions of protein complexes and whole organism proteomes. Recent developments in sample preparation and affinity purification strategies allow the capture, identification, and quantification of protein interactions of protein complexes that are stable, dynamic, transient, and/or weak. Current efforts have mainly focused on generating reliable, reproducible, and high confidence protein interaction data sets for functional characterization. The availability of increasing amounts of information on protein interactions in eukaryotic systems and new bioinformatics tools allow functional analysis of quantitative protein interaction data to unravel the biological significance of the identified protein interactions. Existing studies in this area have laid a solid foundation toward generating a complete map of in vivo protein interaction networks of protein complexes in cells or tissues.
Affiliation(s)
- Robyn M Kaake
- Department of Physiology and Biophysics, University of California, Irvine, California 92697-4560, USA
|
31
|
From Experimental Approaches to Computational Techniques: A Review on the Prediction of Protein-Protein Interactions. ACTA ACUST UNITED AC 2010. [DOI: 10.1155/2010/924529] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.3] [Indexed: 01/05/2023]
Abstract
A crucial step towards understanding the properties of cellular systems in organisms is to map their network of protein-protein interactions (PPIs) on a proteomic-wide scale completely and as accurately as possible. Uncovering the diverse function of proteins and their interactions within the cell may improve our understanding of disease and provide a basis for the development of novel therapeutic approaches. The development of large-scale high-throughput experiments has resulted in the production of a large volume of data which has aided in the uncovering of PPIs. However, these data are often erroneous and limited in interactome coverage. Therefore, additional experimental and computational methods are required to accelerate the discovery of PPIs. This paper provides a review on the prediction of PPIs addressing key prediction principles and highlighting the common experimental and computational techniques currently employed to infer PPI networks along with relevant studies in the area.
|
32
|
Berwick DC, Diss JKJ, Budhram-Mahadeo VS, Latchman DS. A simple technique for the prediction of interacting proteins reveals a direct Brn-3a-androgen receptor interaction. J Biol Chem 2010; 285:15286-15295. [PMID: 20228055 DOI: 10.1074/jbc.m109.071456] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.6] [Indexed: 11/06/2022] Open
Abstract
The formation of multiprotein complexes constitutes a key step in determining the function of any translated gene product. Thus, the elucidation of interacting partners for a protein of interest is of fundamental importance to cell biology. Here we describe a simple methodology for the prediction of novel interactors. We have applied this to the developmental transcription factor Brn-3a to predict and verify a novel interaction between Brn-3a and the androgen receptor (AR). We demonstrate that these transcription factors form complexes within the nucleus of ND7 neuroblastoma cells, while in vitro pull-down assays show direct association. As a functional consequence of the Brn-3a-AR interaction, the factors bind cooperatively to multiple elements within the promoter of the voltage-gated sodium channel, Nav1.7, leading to a synergistic increase in its expression. Thus, these data define AR as a direct Brn-3a interactor and verify a simple interacting protein prediction methodology that is likely to be useful for many other proteins.
Affiliation(s)
- Daniel C Berwick
- Medical Molecular Biology Unit, University College London Institute of Child Health, 30 Guilford Street, London WC1N 1EH, United Kingdom
- James K J Diss
- Medical Molecular Biology Unit, University College London Institute of Child Health, 30 Guilford Street, London WC1N 1EH, United Kingdom
- Vishwanie S Budhram-Mahadeo
- Medical Molecular Biology Unit, University College London Institute of Child Health, 30 Guilford Street, London WC1N 1EH, United Kingdom
- David S Latchman
- Medical Molecular Biology Unit, University College London Institute of Child Health, 30 Guilford Street, London WC1N 1EH, United Kingdom; Birkbeck, University of London, Malet Street, London WC1E 7HX, United Kingdom
|
33
|
Voevodski K, Teng SH, Xia Y. Spectral affinity in protein networks. BMC Syst Biol 2009; 3:112. [PMID: 19943959 PMCID: PMC2797010 DOI: 10.1186/1752-0509-3-112] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.5] [Received: 06/12/2009] [Accepted: 11/29/2009] [Indexed: 01/15/2023]
Abstract
Background Protein-protein interaction (PPI) networks enable us to better understand the functional organization of the proteome. We can learn a lot about a particular protein by querying its neighborhood in a PPI network to find proteins with similar function. A spectral approach that considers random walks between nodes of interest is particularly useful in evaluating closeness in PPI networks. Spectral measures of closeness are more robust to noise in the data and are more precise than simpler methods based on edge density and shortest path length. Results We develop a novel affinity measure for pairs of proteins in PPI networks, which uses personalized PageRank, a random walk based method used in context-sensitive search on the Web. Our measure of closeness, which we call PageRank Affinity, is proportional to the number of times the smaller-degree protein is visited in a random walk that restarts at the larger-degree protein. PageRank considers paths of all lengths in a network, therefore PageRank Affinity is a precise measure that is robust to noise in the data. PageRank Affinity is also provably related to cluster co-membership, making it a meaningful measure. In our experiments on protein networks we find that our measure is better at predicting co-complex membership and finding functionally related proteins than other commonly used measures of closeness. Moreover, our experiments indicate that PageRank Affinity is very resilient to noise in the network. In addition, based on our method we build a tool that quickly finds nodes closest to a queried protein in any protein network, and easily scales to much larger biological networks. Conclusion We define a meaningful way to assess the closeness of two proteins in a PPI network, and show that our closeness measure is more biologically significant than other commonly used methods. 
We also develop a tool, accessible at http://xialab.bu.edu/resources/pnns, that allows the user to quickly find nodes closest to a queried vertex in any protein network available from BioGRID or specified by the user.
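The affinity measure described in this abstract can be illustrated with a small personalized PageRank computation. The sketch below follows only the wording of the abstract (the score of the smaller-degree protein in a walk that restarts at the larger-degree protein); the restart probability of 0.15, the fixed iteration count, the function names and the toy graph are assumptions, and this is not the authors' tool.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list (every node has degree >= 1)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def personalized_pagerank(adj, source, alpha=0.15, iters=200):
    """Power iteration for a random walk that restarts at `source`
    with probability `alpha` at every step."""
    rank = {node: 0.0 for node in adj}
    rank[source] = 1.0
    for _ in range(iters):
        nxt = {node: 0.0 for node in adj}
        for node, mass in rank.items():
            if mass == 0.0:
                continue
            # Spread the non-restarting share of the mass evenly over neighbours.
            share = (1.0 - alpha) * mass / len(adj[node])
            for nb in adj[node]:
                nxt[nb] += share
        nxt[source] += alpha  # restart mass returns to the source
        rank = nxt
    return rank


def pagerank_affinity(adj, u, v, alpha=0.15):
    """Score of the smaller-degree node in a walk restarting at the larger-degree node."""
    big, small = (u, v) if len(adj[u]) >= len(adj[v]) else (v, u)
    return personalized_pagerank(adj, big, alpha)[small]


# Toy graph (hypothetical): two triangles joined by the bridge C-D.
toy = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"),
                       ("C", "D"), ("D", "E"), ("E", "F"), ("D", "F")])
print(pagerank_affinity(toy, "A", "B"), pagerank_affinity(toy, "A", "F"))
```

On this toy graph the pair inside one triangle scores higher than the pair split across the bridge, which is the behaviour expected of a closeness measure related to cluster co-membership.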
|
34
|
|
35
|
Song J, Singh M. How and when should interactome-derived clusters be used to predict functional modules and protein function? Bioinformatics 2009; 25:3143-50. [PMID: 19770263 PMCID: PMC3167697 DOI: 10.1093/bioinformatics/btp551] [Citation(s) in RCA: 83] [Impact Index Per Article: 5.5] [Indexed: 11/14/2022]
Abstract
Motivation: Clustering of protein–protein interaction networks is one of the most common approaches for predicting functional modules, protein complexes and protein functions. But how well does clustering perform at these tasks? Results: We develop a general framework to assess how well computationally derived clusters in physical interactomes overlap functional modules derived via the Gene Ontology (GO). Using this framework, we evaluate six diverse network clustering algorithms using Saccharomyces cerevisiae and show that (i) the performances of these algorithms can differ substantially when run on the same network and (ii) their relative performances change depending upon the topological characteristics of the network under consideration. For the specific task of function prediction in S. cerevisiae, we demonstrate that, surprisingly, a simple non-clustering guilt-by-association approach outperforms widely used clustering-based approaches that annotate a protein with the overrepresented biological process and cellular component terms in its cluster; this is true over the range of clustering algorithms considered. Further analysis parameterizes performance based on the number of annotated proteins, and suggests when clustering approaches should be used for interactome functional analyses. Overall, our results suggest a re-examination of when and how clustering approaches should be applied to physical interactomes, and establish guidelines by which novel clustering approaches for biological networks should be justified and evaluated with respect to functional analysis. Contact: msingh@cs.princeton.edu Supplementary information: Supplementary data are available at Bioinformatics online.
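The "simple non-clustering guilt-by-association approach" mentioned in this abstract can be illustrated with a neighbour vote. The sketch below is one common reading of guilt by association (annotate a protein with the terms most frequent among its direct interaction partners); the abstract does not specify the exact scoring, so the function name, the `top_k` parameter and the example proteins and terms are assumptions.

```python
from collections import Counter


def guilt_by_association(adj, annotations, protein, top_k=1):
    """Predict terms for `protein` from the annotations of its direct neighbours.

    adj         -- dict: protein -> set of interaction partners
    annotations -- dict: protein -> set of known terms (may lack some proteins)
    """
    votes = Counter()
    for neighbour in adj.get(protein, ()):
        # Each annotated neighbour casts one vote per term it carries.
        votes.update(annotations.get(neighbour, ()))
    return [term for term, _ in votes.most_common(top_k)]


# Hypothetical example: P1 is unannotated, its three partners are annotated.
toy_adj = {"P1": {"P2", "P3", "P4"}}
toy_annotations = {
    "P2": {"ribosome biogenesis"},
    "P3": {"ribosome biogenesis", "translation"},
    "P4": {"DNA repair"},
}
print(guilt_by_association(toy_adj, toy_annotations, "P1"))
```

No clustering step is involved: the prediction depends only on the immediate neighbourhood, which is why the approach serves as a natural baseline for cluster-based annotation.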
Affiliation(s)
- Jimin Song
- Department of Computer Science & Lewis-Sigler Institute for Integrative Genomics Princeton University, Princeton, NJ 08544, USA
|
36
|
Macropol K, Can T, Singh AK. RRW: repeated random walks on genome-scale protein networks for local cluster discovery. BMC Bioinformatics 2009; 10:283. [PMID: 19740439 PMCID: PMC2748087 DOI: 10.1186/1471-2105-10-283] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Received: 01/23/2009] [Accepted: 09/09/2009] [Indexed: 03/24/2023] Open
Abstract
BACKGROUND We propose an efficient and biologically sensitive algorithm based on repeated random walks (RRW) for discovering functional modules, e.g., complexes and pathways, within large-scale protein networks. Compared to existing cluster identification techniques, RRW implicitly makes use of network topology, edge weights, and long range interactions between proteins. RESULTS We apply the proposed technique on a functional network of yeast genes and accurately identify statistically significant clusters of proteins. We validate the biological significance of the results using known complexes in the MIPS complex catalogue database and well-characterized biological processes. We find that 90% of the created clusters have the majority of their catalogued proteins belonging to the same MIPS complex, and about 80% have the majority of their proteins involved in the same biological process. We compare our method to various other clustering techniques, such as the Markov Clustering Algorithm (MCL), and find a significant improvement in the RRW clusters' precision and accuracy values. CONCLUSION RRW, which is a technique that exploits the topology of the network, is more precise and robust in finding local clusters. In addition, it has the added flexibility of being able to find multi-functional proteins by allowing overlapping clusters.
Affiliation(s)
- Kathy Macropol
- Department of Computer Science, University of California, Santa Barbara, CA 93106, USA.
|
37
|
Hu X, Ng M, Wu FX, Sokhansanj BA. Mining, modeling, and evaluation of subnetworks from large biomolecular networks and its comparison study. IEEE Trans Inf Technol Biomed 2009; 13:184-94. [PMID: 19272861 DOI: 10.1109/titb.2008.2007649] [Citation(s) in RCA: 14] [Impact Index Per Article: 0.9] [Indexed: 11/10/2022]
Abstract
In this paper, we present a novel method to mine, model, and evaluate a regulatory system executing cellular functions that can be represented as a biomolecular network. Our method consists of two steps. First, a novel scale-free network clustering approach is applied to such a biomolecular network to obtain various subnetworks. Second, computational models are generated for the subnetworks and simulated to predict their behavior in the cellular context. We discuss and evaluate some of the advanced computational modeling approaches, in particular, state-space modeling, probabilistic Boolean network modeling, and fuzzy logic modeling. The modeling and simulation results represent hypotheses that are tested against high-throughput biological datasets (microarrays and/or genetic screens) under normal and perturbation conditions. Experimental results on time-series gene expression data for the human cell cycle indicate that our approach is promising for subnetwork mining and simulation from large biomolecular networks.
Affiliation(s)
- Xiaohua Hu
- College of Information Science and Technology, Drexel University, Philadelphia, PA 19104, USA.
|
38
|
Chen JY, Mamidipalli S, Huan T. HAPPI: an online database of comprehensive human annotated and predicted protein interactions. BMC Genomics 2009; 10 Suppl 1:S16. [PMID: 19594875 PMCID: PMC2709259 DOI: 10.1186/1471-2164-10-s1-s16] [Citation(s) in RCA: 103] [Impact Index Per Article: 6.9] [Indexed: 01/01/2023] Open
Abstract
Background Human protein-protein interaction (PPI) data are the foundation for understanding molecular signalling networks and the functional roles of biomolecules. Several human PPI databases have become available; however, comparisons of these datasets have suggested limited data coverage and poor data quality. Ongoing collection and integration of human PPIs from different sources, both experimentally and computationally, can enable disease-specific network biology modelling in translational bioinformatics studies. Results We developed a new web-based resource, the Human Annotated and Predicted Protein Interaction (HAPPI) database, located at . The HAPPI database was created by extracting and integrating publicly available protein interaction databases, including HPRD, BIND, MINT, STRING, and OPHID, using database integration techniques. We designed a unified entity-relationship data model to resolve semantic level differences of diverse concepts involved in PPI data integration. We applied a unified scoring model to give each PPI a measure of its reliability that can place each PPI at one of the five star rank levels from 1 to 5. We assessed the quality of PPIs contained in the new HAPPI database, using evolutionarily conserved co-expression pairs called "MetaGene" pairs to measure the extent of MetaGene pair and PPI pair overlaps. While the overall quality of the HAPPI database across all star ranks is comparable to the overall qualities of HPRD or IntNetDB, the subset of the HAPPI database with star ranks between 3 and 5 has a much higher average quality than all other human PPI databases. As of summer 2008, the database contains 142,956 non-redundant, medium to high-confidence level human protein interaction pairs among 10,592 human proteins.
The HAPPI database web application also provides hyperlinked information of genes, pathways, protein domains, protein structure displays, and sequence feature maps for interactive exploration of PPI data in the database. Conclusion HAPPI is by far the most comprehensive public compilation of human protein interaction information. It enables its users to fully explore PPI data with quality measures and annotated information necessary for emerging network biology studies.
Affiliation(s)
- Jake Yue Chen
- School of Informatics, Indiana University - Purdue University, Indianapolis, IN, USA.
|
39
|
Gao L, Sun PG, Song J. Clustering algorithms for detecting functional modules in protein interaction networks. J Bioinform Comput Biol 2009; 7:217-42. [PMID: 19226668 DOI: 10.1142/s0219720009004023] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Received: 09/12/2008] [Revised: 10/21/2008] [Accepted: 10/21/2008] [Indexed: 01/21/2023]
Abstract
Protein-Protein Interaction (PPI) networks are believed to be important sources of information related to biological processes and complex metabolic functions of the cell. When studying the workings of a biological cell, it is useful to be able to detect known and predict still undiscovered protein complexes within the cell's PPI networks. Such predictions may be used as an inexpensive tool to direct biological experiments. The increasing amount of available PPI data necessitates a fast, accurate approach to biological complex identification. Because of its importance in the study of protein interaction networks, there are different models and algorithms for identifying functional modules in PPI networks. In this paper, we review some representative algorithms, focusing on the algorithms underlying the approaches and how the algorithms relate to each other. In particular, a comparison is given based on the properties of the algorithms. Since the PPI network is noisy and still incomplete, some methods that consider additional properties for preprocessing and purifying PPI data are presented. We also discuss the functional annotation and validation of protein complexes. Finally, new progress and future research directions are discussed from the computational viewpoint.
Affiliation(s)
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an 710071, China.
|
40
|
Leach SM, Tipney H, Feng W, Baumgartner WA, Kasliwal P, Schuyler RP, Williams T, Spritz RA, Hunter L. Biomedical discovery acceleration, with applications to craniofacial development. PLoS Comput Biol 2009; 5:e1000215. [PMID: 19325874 PMCID: PMC2653649 DOI: 10.1371/journal.pcbi.1000215] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.7] [Received: 09/25/2008] [Accepted: 02/12/2009] [Indexed: 01/17/2023] Open
Abstract
The profusion of high-throughput instruments and the explosion of new results in the scientific literature, particularly in molecular biomedicine, is both a blessing and a curse to the bench researcher. Even knowledgeable and experienced scientists can benefit from computational tools that help navigate this vast and rapidly evolving terrain. In this paper, we describe a novel computational approach to this challenge, a knowledge-based system that combines reading, reasoning, and reporting methods to facilitate analysis of experimental data. Reading methods extract information from external resources, either by parsing structured data or using biomedical language processing to extract information from unstructured data, and track knowledge provenance. Reasoning methods enrich the knowledge that results from reading by, for example, noting two genes that are annotated to the same ontology term or database entry. Reasoning is also used to combine all sources into a knowledge network that represents the integration of all sorts of relationships between a pair of genes, and to calculate a combined reliability score. Reporting methods combine the knowledge network with a congruent network constructed from experimental data and visualize the combined network in a tool that facilitates the knowledge-based analysis of that data. An implementation of this approach, called the Hanalyzer, is demonstrated on a large-scale gene expression array dataset relevant to craniofacial development. The use of the tool was critical in the creation of hypotheses regarding the roles of four genes never previously characterized as involved in craniofacial development; each of these hypotheses was validated by further experimental work.
Affiliation(s)
- Sonia M. Leach
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Hannah Tipney
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Weiguo Feng
- Department of Craniofacial Biology, University of Colorado at Denver, Denver, Colorado, United States of America
- William A. Baumgartner
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Priyanka Kasliwal
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Ronald P. Schuyler
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
- Trevor Williams
- Department of Craniofacial Biology, University of Colorado at Denver, Denver, Colorado, United States of America
- Richard A. Spritz
- Human Medical Genetics Program, University of Colorado at Denver, Denver, Colorado, United States of America
- Lawrence Hunter
- Center for Computational Pharmacology, University of Colorado at Denver, Denver, Colorado, United States of America
|
41
|
Accurate and Scalable Techniques for the Complex/Pathway Membership Problem in Protein Networks. Adv Bioinformatics 2009:787128. [PMID: 20182643 PMCID: PMC2826754 DOI: 10.1155/2009/787128] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/01/2009] [Accepted: 12/02/2009] [Indexed: 11/17/2022] Open
Abstract
A protein network shows physical interactions as well as functional associations. An important usage of such networks is to discover unknown members of partially known complexes and pathways. A number of methods exist for such analyses, and they can be divided into two main categories based on their treatment of highly connected proteins. In this paper, we show that methods that are not affected by the degree (number of linkages) of a protein give more accurate predictions for certain complexes and pathways. We propose a network flow-based technique to compute the association probability of a pair of proteins. We extend the proposed technique using hierarchical clustering in order to scale well with the size of proteome. We also show that top-k queries are not suitable for a large number of cases, and threshold queries are more meaningful in these cases. Network flow technique with clustering is able to optimize meaningful threshold queries and answer them with high efficiency compared to a similar method that uses Monte Carlo simulation.
|
42
|
Marín I, Hoyas S. Basic networks: definition and applications. J Theor Biol 2009; 258:53-9. [PMID: 19490867 DOI: 10.1016/j.jtbi.2009.01.022] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/18/2008] [Revised: 01/21/2009] [Accepted: 01/21/2009] [Indexed: 10/21/2022]
Abstract
We define basic networks as the undirected subgraphs with minimal number of units in which the distances (geodesics, minimal path lengths) among a set of selected nodes, which we call seeds, in the original graph are conserved. The additional nodes required to draw the basic network are called connectors. We describe a heuristic strategy to find the basic networks of complex graphs. We also show how the characterization of these networks may help to obtain relevant biological information from highly complex protein-protein interaction data.
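The defining property above (seed-to-seed geodesic distances are conserved in the subgraph) can be checked with a few breadth-first searches. The sketch below is only an illustration of that property, not the authors' heuristic: it assumes an unweighted graph stored as an adjacency dictionary, treats the candidate as an induced subgraph, and does not test minimality. The function names and the toy graph are hypothetical.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node (undirected, unweighted)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def induced_subgraph(adj, nodes):
    """Keep only `nodes` and the edges running between them (assumption: induced subgraph)."""
    keep = set(nodes)
    return {u: {v for v in adj.get(u, ()) if v in keep} for u in keep}


def conserves_seed_distances(adj, seeds, candidate_nodes):
    """True if every seed pair is as close in the candidate subgraph as in the full graph."""
    sub = induced_subgraph(adj, candidate_nodes)
    for a, b in combinations(seeds, 2):
        if bfs_distances(adj, a).get(b) != bfs_distances(sub, a).get(b):
            return False
    return True


# Hypothetical toy graph: two routes between seeds A and B (A-C-B and A-D-E-B).
toy = {
    "A": {"C", "D"},
    "B": {"C", "E"},
    "C": {"A", "B"},
    "D": {"A", "E"},
    "E": {"D", "B"},
}
short_route = conserves_seed_distances(toy, ["A", "B"], {"A", "C", "B"})
long_route = conserves_seed_distances(toy, ["A", "B"], {"A", "D", "E", "B"})
```

In this toy case node C plays the role of a connector: keeping it conserves the A-B distance of 2, whereas the detour through D and E lengthens it to 3.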
Affiliation(s)
- Ignacio Marín
- Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas (IBV-CSIC), Calle Jaime Roig, 11, Valencia 46010, Spain.
|
43
|
Ma'ayan A. Network integration and graph analysis in mammalian molecular systems biology. IET Syst Biol 2009; 2:206-21. [PMID: 19045817 DOI: 10.1049/iet-syb:20070075] [Citation(s) in RCA: 36] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022] Open
Abstract
Abstraction of intracellular biomolecular interactions into networks is useful for data integration and graph analysis. Network analysis tools facilitate predictions of novel functions for proteins, prediction of functional interactions and identification of intracellular modules. These efforts are linked with drug and phenotype data to accelerate drug-target and biomarker discovery. This review highlights the currently available varieties of mammalian biomolecular networks, and surveys methods and tools to construct, compare, integrate, visualise and analyse such networks.
Affiliation(s)
- A Ma'ayan
- Mount Sinai School of Medicine, Department of Pharmacology and Systems Therapeutics, New York, NY 10029-6574, USA.
|
44
|
Yu J, Finley RL. Combining multiple positive training sets to generate confidence scores for protein-protein interactions. Bioinformatics 2008; 25:105-11. [PMID: 19010802 PMCID: PMC2638943 DOI: 10.1093/bioinformatics/btn597] [Citation(s) in RCA: 44] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]
Abstract
Motivation: High-throughput experimental and computational methods are generating a wealth of protein–protein interaction data for a variety of organisms. However, data produced by current state-of-the-art methods include many false positives, which can hinder the analyses needed to derive biological insights. One way to address this problem is to assign confidence scores that reflect the reliability and biological significance of each interaction. Most previously described scoring methods use a set of likely true positives to train a model to score all interactions in a dataset. A single positive training set, however, may be biased and not representative of true interaction space. Results: We demonstrate a method to score protein interactions by utilizing multiple independent sets of training positives to reduce the potential bias inherent in using a single training set. We used a set of benchmark yeast protein interactions to show that our approach outperforms other scoring methods. Our approach can also score interactions across data types, which makes it more widely applicable than many previously proposed methods. We applied the method to protein interaction data from both Drosophila melanogaster and Homo sapiens. Independent evaluations show that the resulting confidence scores accurately reflect the biological significance of the interactions. Contact: rfinley@wayne.edu Supplementary information: Supplementary data are available at Bioinformatics Online.
Affiliation(s)
- Jingkai Yu
- Center for Molecular Medicine and Genetics and Department of Biochemistry and Molecular Biology, School of Medicine, Wayne State University, 540 East Canfield, Detroit, MI 48201, USA
|
45
|
Qi Y, Suhail Y, Lin YY, Boeke JD, Bader JS. Finding friends and enemies in an enemies-only network: a graph diffusion kernel for predicting novel genetic interactions and co-complex membership from yeast genetic interactions. Genome Res 2008; 18:1991-2004. [PMID: 18832443 DOI: 10.1101/gr.077693.108] [Citation(s) in RCA: 76] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022]
Abstract
The yeast synthetic lethal genetic interaction network contains rich information about underlying pathways and protein complexes as well as new genetic interactions yet to be discovered. We have developed a graph diffusion kernel as a unified framework for inferring complex/pathway membership analogous to "friends" and genetic interactions analogous to "enemies" from the genetic interaction network. When applied to the Saccharomyces cerevisiae synthetic lethal genetic interaction network, we can achieve a precision around 50% with 20% to 50% recall in the genome-wide prediction of new genetic interactions, supported by experimental validation. The kernels show significant improvement over previous best methods for predicting genetic interactions and protein co-complex membership from genetic interaction data.
Affiliation(s)
- Yan Qi
- Department of Biomedical Engineering, Johns Hopkins University, Baltimore, Maryland 21218, USA
|
46
|
Identifying components of complexes. Methods Mol Biol 2008. [PMID: 18712308 DOI: 10.1007/978-1-60327-429-6_13] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register]
Abstract
Identifying and analyzing components of complexes is essential to understand the activities and organization of the cell. Moreover, it provides additional information on the possible function of proteins involved in these complexes. Two bioinformatics approaches are usually used for this purpose. The first is based on the identification, by clustering algorithms, of full or densely connected sub-graphs in protein-protein interaction networks derived from experimental sources that might represent complexes. The second approach consists of the integration of genomic and proteomic data by using Bayesian networks or decision trees. This approach is based on the hypothesis that proteins involved in a complex usually share common properties.
|
47
|
Turanalp ME, Can T. Discovering functional interaction patterns in protein-protein interaction networks. BMC Bioinformatics 2008; 9:276. [PMID: 18547430 PMCID: PMC2442100 DOI: 10.1186/1471-2105-9-276] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/27/2007] [Accepted: 06/11/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND In recent years, a considerable amount of research effort has been directed to the analysis of biological networks with the availability of genome-scale networks of genes and/or proteins of an increasing number of organisms. A protein-protein interaction (PPI) network is a particular biological network which represents physical interactions between pairs of proteins of an organism. Major research on PPI networks has focused on understanding the topological organization of PPI networks, evolution of PPI networks and identification of conserved subnetworks across different species, discovery of modules of interaction, use of PPI networks for functional annotation of uncharacterized proteins, and improvement of the accuracy of currently available networks. RESULTS In this article, we map known functional annotations of proteins onto a PPI network in order to identify frequently occurring interaction patterns in the functional space. We propose a new frequent pattern identification technique, PPISpan, adapted specifically for PPI networks from a well-known frequent subgraph identification method, gSpan. Existing module discovery techniques either look for specific clique-like highly interacting protein clusters or linear paths of interaction. However, our goal is different; instead of single clusters or pathways, we look for recurring functional interaction patterns in arbitrary topologies. We have applied PPISpan on PPI networks of Saccharomyces cerevisiae and identified a number of frequently occurring functional interaction patterns. CONCLUSION With the help of PPISpan, recurring functional interaction patterns in an organism's PPI network can be identified. Such an analysis offers a new perspective on the modular organization of PPI networks. The complete list of identified functional interaction patterns is available at http://bioserver.ceng.metu.edu.tr/PPISpan/.
Affiliation(s)
- Mehmet E Turanalp
- Department of Computer Engineering, Selcuk University, Alaaddin Keykubat Kampusu, 42075 Selcuklu, Konya, Turkey.
|
48
|
Linghu B, Snitkin ES, Holloway DT, Gustafson AM, Xia Y, DeLisi C. High-precision high-coverage functional inference from integrated data sources. BMC Bioinformatics 2008; 9:119. [PMID: 18298847 PMCID: PMC2292694 DOI: 10.1186/1471-2105-9-119] [Citation(s) in RCA: 25] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/15/2007] [Accepted: 02/25/2008] [Indexed: 11/15/2022] Open
Abstract
Background Information obtained from diverse data sources can be combined in a principled manner using various machine learning methods to increase the reliability and range of knowledge about protein function. The result is a weighted functional linkage network (FLN) in which linked neighbors share at least one function with high probability. Precision is, however, low. Aiming to provide precise functional annotation for as many proteins as possible, we explore and propose a two-step framework for functional annotation: (1) construction of a high-coverage and reliable FLN via machine learning techniques and (2) development of a decision rule for the constructed FLN to optimize functional annotation. Results We first apply this framework to Saccharomyces cerevisiae. In the first step, we demonstrate that four commonly used machine learning methods, Linear SVM, Linear Discriminant Analysis, Naïve Bayes, and Neural Network, all combine heterogeneous data to produce reliable and high-coverage FLNs, in which the linkage weight more accurately estimates functional coupling of linked proteins than individual data sources alone. In the second step, empirical tuning of an adjustable decision rule on the constructed FLN reveals that basing annotation on maximum edge weight results in the most precise annotation at high coverages. In particular, at low coverage all rules evaluated perform comparably. At coverage above approximately 50%, however, they diverge rapidly. At full coverage, the maximum weight decision rule still has a precision of approximately 70%, whereas for other methods, precision ranges from a high of slightly more than 30%, down to 3%. In addition, a scoring scheme to estimate the precisions of individual predictions is also provided. Finally, tests of the robustness of the framework indicate that our framework can be successfully applied to less studied organisms. Conclusion We provide a general two-step function-annotation framework, and show that high coverage, high precision annotations can be achieved by constructing a high-coverage and reliable FLN via data integration followed by applying a maximum weight decision rule.
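As a rough illustration of what a maximum weight decision rule on a weighted FLN could look like, the sketch below assigns each unannotated protein the function of its annotated neighbor with the heaviest edge. It is a minimal reading of the abstract, not the authors' implementation: the data layout, the `min_weight` cutoff used to trade coverage for precision, and all names and values are assumptions made for the example.

```python
def annotate_by_max_weight(edges, known_functions, min_weight=0.0):
    """Transfer a function to each unannotated protein from its heaviest annotated neighbor.

    edges: {(protein_a, protein_b): linkage_weight} for an undirected FLN (assumed layout).
    known_functions: {protein: function_label} for already annotated proteins.
    min_weight: hypothetical cutoff; raising it lowers coverage.
    Returns {protein: (predicted_function, supporting_weight)}.
    """
    best = {}  # unannotated protein -> (weight, annotated neighbor)
    for (u, v), weight in edges.items():
        # An undirected edge can support a prediction in either direction.
        for target, source in ((u, v), (v, u)):
            if target in known_functions or source not in known_functions:
                continue
            if weight < min_weight:
                continue
            if target not in best or weight > best[target][0]:
                best[target] = (weight, source)
    return {p: (known_functions[src], w) for p, (w, src) in best.items()}


# Hypothetical toy network: P2 and P3 are annotated, P1 and P4 are not.
toy_edges = {("P1", "P2"): 0.9, ("P1", "P3"): 0.4, ("P4", "P3"): 0.2}
toy_known = {"P2": "ribosome biogenesis", "P3": "DNA repair"}

full_coverage = annotate_by_max_weight(toy_edges, toy_known)
high_confidence = annotate_by_max_weight(toy_edges, toy_known, min_weight=0.5)
```

With no cutoff both P1 and P4 receive a prediction; with the 0.5 cutoff only P1 does, which mirrors the coverage-versus-precision trade-off the abstract describes.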
Affiliation(s)
- Bolan Linghu
- Bioinformatics Graduate Program, Boston University, Boston, MA, 02215, USA.
|
49
|
Probabilistic assembly of human protein interaction networks from label-free quantitative proteomics. Proc Natl Acad Sci U S A 2008; 105:1454-9. [PMID: 18218781 DOI: 10.1073/pnas.0706983105] [Citation(s) in RCA: 196] [Impact Index Per Article: 12.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/07/2023] Open
Abstract
Large-scale affinity purification and mass spectrometry studies have played important roles in the assembly and analysis of comprehensive protein interaction networks for lower eukaryotes. However, the development of such networks for human proteins has been slowed by the high cost and significant technical challenges associated with systematic studies of protein interactions. To address this challenge, we have developed a method for building local and focused networks. This approach couples vector algebra and statistical methods with normalized spectral counting (NSAF) derived from the analysis of affinity purifications via chromatography-based proteomics. After mathematical removal of contaminant proteins, the core components of multiprotein complexes are determined by singular value decomposition analysis and clustering. The probability of interactions within and between complexes is computed solely based upon NSAFs using Bayes' approach. To demonstrate the application of this method to small-scale datasets, we analyzed an expanded human TIP49a and TIP49b dataset. This dataset contained proteins affinity-purified with 27 different epitope-tagged components of the chromatin remodeling SRCAP, hINO80, and TRRAP/TIP60 complexes, and the nutrient sensing complex Uri/Prefoldin. Within a core network of 65 unique proteins, we captured all known components of these complexes and novel protein associations, especially in the Uri/Prefoldin complex. Finally, we constructed a probabilistic human interaction network composed of 557 protein pairs.
|
50
|
Hu X, Wu FX. Mining and state-space modeling and verification of sub-networks from large-scale biomolecular networks. BMC Bioinformatics 2007; 8:324. [PMID: 17764552 PMCID: PMC2213691 DOI: 10.1186/1471-2105-8-324] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/09/2007] [Accepted: 08/31/2007] [Indexed: 11/13/2022] Open
Abstract
Background Biomolecular networks dynamically respond to stimuli and implement cellular function. Understanding these dynamic changes is the key challenge for cell biologists. As biomolecular networks grow in size and complexity, the model of a biomolecular network must become more rigorous to keep track of all the components and their interactions. In general this presents the need for computer simulation to manipulate and understand the biomolecular network model. Results In this paper, we present a novel method to model the regulatory system which executes a cellular function and can be represented as a biomolecular network. Our method consists of two steps. First, a novel scale-free network clustering approach is applied to the large-scale biomolecular network to obtain various sub-networks. Second, a state-space model is generated for the sub-networks and simulated to predict their behavior in the cellular context. The modeling results represent hypotheses that are tested against high-throughput data sets (microarrays and/or genetic screens) for both the natural system and perturbations. Notably, the dynamic modeling component of this method depends on the automated network structure generation of the first component and the sub-network clustering, which are both essential to make the solution tractable. Conclusion Experimental results on time series gene expression data for the human cell cycle indicate our approach is promising for sub-network mining and simulation from large-scale biomolecular networks.
Affiliation(s)
- Xiaohua Hu
- College of Information Science & Technology, Drexel University, Philadelphia, PA 19104, USA
- Fang-Xiang Wu
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada
|