1
|
Pan Y, Wang Y, Guan J, Zhou S. PCGAN: a generative approach for protein complex identification from protein interaction networks. Bioinformatics 2023; 39:btad473. [PMID: 37531266 PMCID: PMC10457665 DOI: 10.1093/bioinformatics/btad473] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 11/20/2022] [Revised: 07/23/2023] [Accepted: 08/01/2023] [Indexed: 08/04/2023] Open
Abstract
MOTIVATION Protein complexes are groups of polypeptide chains linked by non-covalent protein-protein interactions, which play important roles in biological systems and perform numerous functions, including DNA transcription, mRNA translation, and signal transduction. In the past decade, a number of computational methods have been developed to identify protein complexes from protein interaction networks by mining dense subnetworks or subgraphs.
RESULTS In this article, different from the existing works, we propose a novel approach for this task based on generative adversarial networks, which is called PCGAN, meaning identifying Protein Complexes by GAN. With the help of some real complexes as training samples, our method can learn a model to generate new complexes from a protein interaction network. To effectively support model training and testing, we construct two more comprehensive and reliable protein interaction networks and a larger gold standard complex set by merging existing ones of the same organism (including human and yeast). Extensive comparison studies indicate that our method is superior to existing protein complex identification methods in terms of various performance metrics. Furthermore, functional enrichment analysis shows that the identified complexes are of high biological significance, which indicates that these generated protein complexes are very possibly real complexes.
AVAILABILITY AND IMPLEMENTATION https://github.com/yul-pan/PCGAN.
Affiliation(s)
- Yuliang Pan
  - Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Yang Wang
  - Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Jihong Guan
  - Department of Computer Science and Technology, Tongji University, Shanghai 201804, China
- Shuigeng Zhou
  - Shanghai Key Laboratory of Intelligent Information Processing, and School of Computer Science, Fudan University, Shanghai 200438, China
|
2
|
Viljanen M, Airola A, Pahikkala T. Generalized vec trick for fast learning of pairwise kernel models. Mach Learn 2022. [DOI: 10.1007/s10994-021-06127-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/28/2022]
Abstract
Pairwise learning corresponds to the supervised learning setting where the goal is to make predictions for pairs of objects. Prominent applications include predicting drug-target or protein-protein interactions, or customer-product preferences. In this work, we present a comprehensive review of pairwise kernels, that have been proposed for incorporating prior knowledge about the relationship between the objects. Specifically, we consider the standard, symmetric and anti-symmetric Kronecker product kernels, metric-learning, Cartesian, ranking, as well as linear, polynomial and Gaussian kernels. Recently, a $O(nm+nq)$ time generalized vec trick algorithm, where $n$, $m$, and $q$ denote the number of pairs, drugs and targets, was introduced for training kernel methods with the Kronecker product kernel. This was a significant improvement over previous $O(n^2)$ training methods, since in most real-world applications $m,q \ll n$. In this work we show how all the reviewed kernels can be expressed as sums of Kronecker products, allowing the use of generalized vec trick for speeding up their computation. In the experiments, we demonstrate how the introduced approach allows scaling pairwise kernels to much larger data sets than previously feasible, and provide an extensive comparison of the kernels on a number of biological interaction prediction tasks.
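The complexity claim in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the index-list representation of pairs, and the toy kernel matrices are all assumptions made for illustration. The idea shown is that a matrix-vector product with the Kronecker product pairwise kernel can be computed through an m-by-q intermediate table, in O(nq + nm) time, without ever forming the n-by-n pairwise kernel matrix.

```python
def pairwise_kronecker_matvec(Kd, Kt, drugs, targets, v):
    """Compute y_i = sum_j Kd[d_i][d_j] * Kt[t_i][t_j] * v_j without the n x n matrix.

    Kd, Kt  : drug and target kernel matrices (lists of lists), hypothetical inputs
    drugs   : drug index of each training pair
    targets : target index of each training pair
    v       : one coefficient per pair
    """
    m, q = len(Kd), len(Kt)
    # Step 1, O(n*q): Z[d][b] = sum over pairs j with drug d of v_j * Kt[b][t_j]
    Z = [[0.0] * q for _ in range(m)]
    for d_j, t_j, v_j in zip(drugs, targets, v):
        row = Z[d_j]
        for b in range(q):
            row[b] += v_j * Kt[b][t_j]
    # Step 2, O(n*m): y_i = sum over drugs d of Kd[d_i][d] * Z[d][t_i]
    return [sum(Kd[d_i][d] * Z[d][t_i] for d in range(m))
            for d_i, t_i in zip(drugs, targets)]


def naive_matvec(Kd, Kt, drugs, targets, v):
    """Reference O(n^2) computation over the explicit pairwise kernel."""
    n = len(v)
    return [sum(Kd[drugs[i]][drugs[j]] * Kt[targets[i]][targets[j]] * v[j]
                for j in range(n))
            for i in range(n)]


# Toy data: 2 drugs, 3 targets, 4 observed pairs (all values are made up).
Kd = [[1.0, 0.5], [0.5, 1.0]]
Kt = [[1.0, 0.2, 0.0], [0.2, 1.0, 0.3], [0.0, 0.3, 1.0]]
drugs = [0, 0, 1, 1]
targets = [0, 2, 1, 2]
v = [1.0, -2.0, 0.5, 3.0]

fast = pairwise_kronecker_matvec(Kd, Kt, drugs, targets, v)
slow = naive_matvec(Kd, Kt, drugs, targets, v)
```

Both functions return the same vector; the fast version only pays off when the numbers of drugs and targets are much smaller than the number of pairs, which is the regime the abstract describes.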
|
3
|
OUP accepted manuscript. Brief Funct Genomics 2022; 21:243-269. [DOI: 10.1093/bfgp/elac007] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Revised: 03/17/2022] [Accepted: 03/18/2022] [Indexed: 11/14/2022] Open
|
4
|
Wang R, Ma H, Wang C. An Improved Memetic Algorithm for Detecting Protein Complexes in Protein Interaction Networks. Front Genet 2022; 12:794354. [PMID: 34970305 PMCID: PMC8712950 DOI: 10.3389/fgene.2021.794354] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 10/13/2021] [Accepted: 11/22/2021] [Indexed: 11/13/2022] Open
Abstract
Identifying the protein complexes in protein-protein interaction (PPI) networks is essential for understanding cellular organization and biological processes. To address the high false positive/negative rates of PPI networks and detect protein complexes with multiple topological structures, we developed a novel improved memetic algorithm (IMA). IMA first combines the topological and biological properties to obtain a weighted PPI network with reduced noise. Next, it integrates various clustering results to construct the initial populations. Furthermore, a fitness function is designed based on the five topological properties of the protein complexes. Finally, we describe the rest of our IMA method, which primarily consists of four steps: selection operator, recombination operator, local optimization strategy, and updating the population operator. In particular, IMA is a combination of genetic algorithm and a local optimization strategy, which has a strong global search ability, and searches for local optimal solutions effectively. The experimental results demonstrate that IMA performs much better than the base methods and existing state-of-the-art techniques. The source code and datasets of the IMA can be found at https://github.com/RongquanWang/IMA.
Affiliation(s)
- Rongquan Wang
  - School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Huimin Ma
  - School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
- Caixia Wang
  - School of International Economics, China Foreign Affairs University, Beijing, China
|
5
|
Yao H, Guan J, Liu T. Denoising Protein-Protein interaction network via variational graph auto-encoder for protein complex detection. J Bioinform Comput Biol 2021; 18:2040010. [PMID: 32698725 DOI: 10.1142/s0219720020400107] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 12/16/2022]
Abstract
Identifying protein complexes is an important issue in computational biology, as it benefits the understanding of cellular functions and the design of drugs. In the past decades, many computational methods have been proposed by mining dense subgraphs in Protein-Protein Interaction Networks (PINs). However, the high rate of false positive/negative interactions in PINs prevents accurately detecting complexes directly from the raw PINs. In this paper, we propose a denoising approach for protein complex detection by using variational graph auto-encoder. First, we embed a PIN to vector space by a stacked graph convolutional network (GCN), then decide which interactions in the PIN are credible. If the probability of an interaction being credible is less than a threshold, we delete the interaction. In such a way, we reconstruct a reliable PIN. Following that, we detect protein complexes in the reconstructed PIN by using several typical detection methods, including CPM, Coach, DPClus, GraphEntropy, IPCA and MCODE, and compare the results with those obtained directly from the original PIN. We conduct the empirical evaluation on four yeast PPI datasets (Gavin, Krogan, DIP and Wiphi) and two human PPI datasets (Reactome and Reactomekb), against two yeast complex benchmarks (CYC2008 and MIPS) and three human complex benchmarks (REACT, REACT_uniprotkb and CORE_COMPLEX_human), respectively. Experimental results show that with the reconstructed PINs obtained by our denoising approach, complex detection performance can get obviously boosted, in most cases by over 5%, sometimes even by 200%. Furthermore, we compare our approach with two existing denoising methods (RWS and RedNemo) while varying different matching rates on separate complex distributions. Our results show that in most cases (over 2/3), the proposed approach outperforms the existing methods.
Affiliation(s)
- Heng Yao
  - Department of Computer Science and Technology, Tongji University, 4800 Cao'an Road, Shanghai 201804, P. R. China; Key Laboratory of Embedded System and Service Computing (Tongji University), Ministry of Education, Shanghai, P. R. China; Shanghai Electronic Transactions and Information Service, Collaborative Innovation Center, Shanghai, P. R. China
- Jihong Guan
  - Department of Computer Science and Technology, Tongji University, 4800 Cao'an Road, Shanghai 201804, P. R. China; Key Laboratory of Embedded System and Service Computing (Tongji University), Ministry of Education, Shanghai, P. R. China; Shanghai Electronic Transactions and Information Service, Collaborative Innovation Center, Shanghai, P. R. China
- Tianying Liu
  - Department of Computer Science and Technology, Tongji University, 4800 Cao'an Road, Shanghai 201804, P. R. China; Key Laboratory of Embedded System and Service Computing (Tongji University), Ministry of Education, Shanghai, P. R. China; Shanghai Electronic Transactions and Information Service, Collaborative Innovation Center, Shanghai, P. R. China
|
6
|
Wang R, Wang C, Liu G. A novel graph clustering method with a greedy heuristic search algorithm for mining protein complexes from dynamic and static PPI networks. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2020.02.063] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Indexed: 01/01/2023]
|
7
|
Yao H, Shi Y, Guan J, Zhou S. Accurately Detecting Protein Complexes by Graph Embedding and Combining Functions with Interactions. IEEE/ACM Trans Comput Biol Bioinform 2020; 17:777-787. [PMID: 30736004 DOI: 10.1109/tcbb.2019.2897769] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/09/2023]
Abstract
Identifying protein complexes is helpful for understanding cellular functions and designing drugs. In the last decades, many computational methods have been proposed based on detecting dense subgraphs or subnetworks in Protein-Protein Interaction Networks (PINs). However, the high rate of false positive/negative interactions in PINs prevents satisfactory detection results from being achieved directly from PINs, because most existing methods rely mainly on topological information for network partitioning. In this paper, we propose a new approach for protein complex detection by merging topological information of PINs and functional information of proteins. We first split proteins into a number of protein groups from the perspective of protein functions by using FunCat data. Then, for each of the resulting protein groups, we calculate two protein-protein similarity matrices, one computed by graph embedding over a PIN and the other from GO terms, and combine these two matrices to get an integrated similarity matrix. Following that, we cluster the proteins in each group based on the corresponding integrated similarity matrix, and obtain a number of small protein clusters. We map these clusters of proteins onto the PIN, and get a number of connected subgraphs. After a round of merging of overlapping subgraphs, we finally get the detected complexes. We conduct empirical evaluation on four PPI datasets (Collins, Gavin, Krogan, and Wiphi) with two complex benchmarks (CYC2008 and MIPS). Experimental results show that our method performs better than the state-of-the-art methods.
|
8
|
Grbić M, Matić D, Kartelj A, Vračević S, Filipović V. A three-phase method for identifying functionally related protein groups in weighted PPI networks. Comput Biol Chem 2020; 86:107246. [PMID: 32339914 DOI: 10.1016/j.compbiolchem.2020.107246] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/11/2019] [Revised: 01/27/2020] [Accepted: 03/03/2020] [Indexed: 01/17/2023]
Abstract
Identifying significant protein groups is of great importance for further understanding protein functions. This paper introduces a novel three-phase heuristic method for identifying such groups in weighted PPI networks. In the first phase a variable neighborhood search (VNS) algorithm is applied on a weighted PPI network, in order to support protein complexes by adding a minimum number of new PPIs. In the second phase proteins from different complexes are merged into larger protein groups. In the third phase these groups are expanded by a number of 2-level neighbor proteins, favoring proteins that have higher average gene co-expression with the base group proteins. Experimental results show that: (i) the proposed VNS algorithm outperforms the existing approach described in literature and (ii) the above-mentioned three-phase method identifies protein groups with very high statistical significance.
Affiliation(s)
- Milana Grbić
  - University of Banjaluka, Faculty of Natural Sciences and Mathematics, Mladena Stojanovića 2, 78000 Banjaluka, Bosnia and Herzegovina.
- Dragan Matić
  - University of Banjaluka, Faculty of Natural Sciences and Mathematics, Mladena Stojanovića 2, 78000 Banjaluka, Bosnia and Herzegovina.
- Aleksandar Kartelj
  - University of Belgrade, Faculty of Mathematics, Studentski trg 16/IV 11 000, Belgrade, Serbia.
- Savka Vračević
  - University of Banjaluka, Faculty of Natural Sciences and Mathematics, Mladena Stojanovića 2, 78000 Banjaluka, Bosnia and Herzegovina.
- Vladimir Filipović
  - University of Belgrade, Faculty of Mathematics, Studentski trg 16/IV 11 000, Belgrade, Serbia.
|
9
|
Wu Z, Liao Q, Liu B. A comprehensive review and evaluation of computational methods for identifying protein complexes from protein–protein interaction networks. Brief Bioinform 2019; 21:1531-1548. [DOI: 10.1093/bib/bbz085] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 04/08/2019] [Revised: 06/17/2019] [Accepted: 06/17/2019] [Indexed: 02/04/2023] Open
Abstract
Protein complexes are the fundamental units for many cellular processes. Identifying protein complexes accurately is critical for understanding the functions and organizations of cells. With the increment of genome-scale protein–protein interaction (PPI) data for different species, various computational methods focus on identifying protein complexes from PPI networks. In this article, we give a comprehensive and updated review on the state-of-the-art computational methods in the field of protein complex identification, especially focusing on the newly developed approaches. The computational methods are organized into three categories, including cluster-quality-based methods, node-affinity-based methods and ensemble clustering methods. Furthermore, the advantages and disadvantages of different methods are discussed, and then, the performance of 17 state-of-the-art methods is evaluated on two widely used benchmark data sets. Finally, the bottleneck problems and their potential solutions in this important field are discussed.
Affiliation(s)
- Zhourun Wu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Qing Liao
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
|
10
|
Sarkar D, Saha S. Machine-learning techniques for the prediction of protein–protein interactions. J Biosci 2019. [DOI: 10.1007/s12038-019-9909-z] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 02/06/2023]
|
11
|
Rani RR, Ramyachitra D, Brindhadevi A. Detection of dynamic protein complexes through Markov Clustering based on Elephant Herd Optimization Approach. Sci Rep 2019; 9:11106. [PMID: 31366992 PMCID: PMC6668483 DOI: 10.1038/s41598-019-47468-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/17/2018] [Accepted: 07/11/2019] [Indexed: 11/19/2022] Open
Abstract
The accessibility of a huge amount of protein-protein interaction (PPI) data has enabled research on biological networks that reveal the structure of a protein complex, pathways and its cellular organization. A key demand in computational biology is to recognize the modular structure of such biological networks. The detection of protein complexes from the PPI network is one of the most challenging and significant problems in the post-genomic era. In bioinformatics, the frequently employed approach for clustering the networks is Markov Clustering (MCL). Much of the research on protein complex detection has been done on the static PPI network, which suffers from a few drawbacks. To resolve this problem, this paper proposes an approach to detect dynamic protein complexes through Markov Clustering based on Elephant Herd Optimization Approach (DMCL-EHO). First, the proposed method divides the PPI network into a set of dynamic subnetworks at various time points by incorporating gene expression data; second, it applies clustering analysis to every subnetwork using MCL along with the Elephant Herd Optimization approach. The experimental analysis was carried out on different PPI network datasets, and the proposed method surpasses various existing approaches in terms of accuracy measures. This paper identifies the common protein complexes that are significantly enriched in gold-standard datasets and also the pathway annotations of the detected protein complexes using the KEGG database.
Affiliation(s)
- R Ranjani Rani
  - Department of Computer Science, Bharathiar University, Tamilnadu, India
- D Ramyachitra
  - Department of Computer Science, Bharathiar University, Tamilnadu, India.
- A Brindhadevi
  - Department of Computer Science, Bharathiar University, Tamilnadu, India
|
12
|
Nakajima N, Hayashida M, Jansson J, Maruyama O, Akutsu T. Determining the minimum number of protein-protein interactions required to support known protein complexes. PLoS One 2018; 13:e0195545. [PMID: 29698482 PMCID: PMC5919440 DOI: 10.1371/journal.pone.0195545] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/11/2017] [Accepted: 03/23/2018] [Indexed: 11/18/2022] Open
Abstract
The prediction of protein complexes from protein-protein interactions (PPIs) is a well-studied problem in bioinformatics. However, the currently available PPI data is not enough to describe all known protein complexes. In this paper, we express the problem of determining the minimum number of (additional) required protein-protein interactions as a graph theoretic problem under the constraint that each complex constitutes a connected component in a PPI network. For this problem, we develop two computational methods: one is based on integer linear programming (ILPMinPPI) and the other one is based on an existing greedy-type approximation algorithm (GreedyMinPPI) originally developed in the context of communication and social networks. Since the former method is only applicable to datasets of small size, we apply the latter method to a combination of the CYC2008 protein complex dataset and each of eight PPI datasets (STRING, MINT, BioGRID, IntAct, DIP, BIND, WI-PHI, iRefIndex). The results show that the minimum number of additional required PPIs ranges from 51 (STRING) to 964 (BIND), and that even the four best PPI databases, STRING (51), BioGRID (67), WI-PHI (93) and iRefIndex (85), do not include enough PPIs to form all CYC2008 protein complexes. We also demonstrate that the proposed problem framework and our solutions can enhance the prediction accuracy of existing PPI prediction methods. ILPMinPPI can be freely downloaded from http://sunflower.kuicr.kyoto-u.ac.jp/~nakajima/.
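The connectivity constraint in the abstract above can be made concrete with a small sketch. This is neither ILPMinPPI nor GreedyMinPPI: it is a simple per-complex count, with hypothetical function names and toy data, of how many interactions each complex would need on its own to become connected. Summed over complexes, the count is only an upper bound on the true minimum, because a single added interaction can serve several overlapping complexes at once.

```python
def components_within(complex_proteins, ppi_edges):
    """Count connected components of the PPI subgraph induced by one complex."""
    parent = {p: p for p in complex_proteins}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in ppi_edges:
        # Only interactions with both endpoints inside the complex count.
        if a in parent and b in parent:
            parent[find(a)] = find(b)
    return len({find(p) for p in parent})


def additional_ppis_upper_bound(complexes, ppi_edges):
    """Sum of (components - 1) over complexes: enough edges to connect each one."""
    return sum(components_within(c, ppi_edges) - 1 for c in complexes if c)


# Toy example (made-up proteins): one known interaction, two overlapping complexes.
ppi_edges = [("A", "B")]
complexes = [{"A", "B", "C"}, {"C", "D"}]
bound = additional_ppis_upper_bound(complexes, ppi_edges)
```

In the toy data each complex splits into two components, so the bound is 2.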
Affiliation(s)
- Natsu Nakajima
  - Institute of Molecular and Cellular Biosciences, The University of Tokyo, 1-1-1, Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
- Morihiro Hayashida
  - Department of Electrical Engineering and Computer Science, National Institute of Technology, Matsue College, 14-4, Nishiikumacho, Matsue, Shimane 690-8518, Japan
- Jesper Jansson
  - Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
- Osamu Maruyama
  - Institute of Mathematics for Industry, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan
|
13
|
Ruan P, Hayashida M, Akutsu T, Vert JP. Improving prediction of heterodimeric protein complexes using combination with pairwise kernel. BMC Bioinformatics 2018; 19:39. [PMID: 29504897 PMCID: PMC5836830 DOI: 10.1186/s12859-018-2017-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022] Open
Abstract
Background Since many proteins become functional only after they interact with their partner proteins and form protein complexes, it is essential to identify the sets of proteins that form complexes. Therefore, several computational methods have been proposed to predict complexes from the topology and structure of experimental protein-protein interaction (PPI) networks. These methods work well to predict complexes involving at least three proteins, but generally fail at identifying complexes involving only two different proteins, called heterodimeric complexes or heterodimers. There is however an urgent need for efficient methods to predict heterodimers, since the majority of known protein complexes are precisely heterodimers.
Results In this paper, we use three promising kernel functions, Min kernel and two pairwise kernels, which are Metric Learning Pairwise Kernel (MLPK) and Tensor Product Pairwise Kernel (TPPK). We also consider the normalization forms of Min kernel. Then, we combine Min kernel or its normalization form and one of the pairwise kernels by plugging. We applied kernels based on PPI, domain, phylogenetic profile, and subcellular localization properties to predicting heterodimers. Then, we evaluate our method by employing C-Support Vector Classification (C-SVC), carrying out 10-fold cross-validation, and calculating the average F-measures. The results suggest that the combination of normalized-Min-kernel and MLPK leads to the best F-measure and improves on the performance of our previous work, which had been the best existing method so far.
Conclusions We propose new methods to predict heterodimers, using a machine learning-based approach. We train a support vector machine (SVM) to discriminate interacting vs non-interacting protein pairs, based on information extracted from PPI, domain, phylogenetic profiles and subcellular localization. We evaluate in detail new kernel functions to encode these data, and report prediction performance that outperforms the state-of-the-art.
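The kernels named in the abstract above can be sketched in a few lines. The definitions below are the commonly used forms of the Min kernel, TPPK and MLPK, not code from the paper; the function names and the toy feature vectors are assumptions, and the features are assumed to be non-negative. "Plugging" is read here as using the (normalized) Min kernel as the base protein kernel inside a pairwise kernel.

```python
import math


def min_kernel(x, y):
    """Min kernel on non-negative feature vectors: sum of element-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))


def normalized_min_kernel(x, y):
    """Min kernel scaled so that every vector has unit self-similarity."""
    denom = math.sqrt(min_kernel(x, x) * min_kernel(y, y))
    return min_kernel(x, y) / denom if denom > 0 else 0.0


def tppk(k, pair1, pair2):
    """Tensor Product Pairwise Kernel built from a base protein kernel k."""
    (a, b), (c, d) = pair1, pair2
    return k(a, c) * k(b, d) + k(a, d) * k(b, c)


def mlpk(k, pair1, pair2):
    """Metric Learning Pairwise Kernel built from a base protein kernel k."""
    (a, b), (c, d) = pair1, pair2
    return (k(a, c) - k(a, d) - k(b, c) + k(b, d)) ** 2


# Hypothetical feature vectors for four proteins (values are made up).
a, b, c, d = (1, 0, 2), (0, 1, 1), (1, 1, 0), (2, 0, 1)

tppk_value = tppk(min_kernel, (a, b), (c, d))
mlpk_value = mlpk(min_kernel, (a, b), (c, d))
plugged = mlpk(normalized_min_kernel, (a, b), (c, d))
```

Both pairwise kernels are invariant to the order of proteins within a pair, which is the property that makes them suitable for unordered protein pairs such as candidate heterodimers.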
Affiliation(s)
- Peiying Ruan
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Morihiro Hayashida
  - Department of Electrical Engineering and Computer Science, National Institute of Technology, Matsue College, 14-4, Nishiikumacho, Matsue, 690-8518, Japan
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, 6110011, Japan
- Jean-Philippe Vert
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, 75006, France; Institut Curie, Paris, 75005, France; INSERM U900, Paris, 75005, France; Ecole Normale Supérieure, Department of Mathematics and Applications, Paris, 75005, France.
|
14
|
CPredictor3.0: detecting protein complexes from PPI networks with expression data and functional annotations. BMC Syst Biol 2017; 11:135. [PMID: 29322927 PMCID: PMC5763309 DOI: 10.1186/s12918-017-0504-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
BACKGROUND Effectively predicting protein complexes not only helps to understand the structures and functions of proteins and their complexes, but also is useful for diagnosing disease and developing new drugs. Up to now, many methods have been developed to detect complexes by mining dense subgraphs from static protein-protein interaction (PPI) networks, while ignoring the value of other biological information and the dynamic properties of cellular systems.
RESULTS In this paper, based on our previous works CPredictor and CPredictor2.0, we present a new method for predicting complexes from PPI networks with both gene expression data and protein functional annotations, which is called CPredictor3.0. This new method follows the viewpoint that proteins in the same complex should roughly have similar functions and are active at the same time and place in cellular systems. We first detect active proteins by using gene expression data of different time points and cluster proteins by using gene ontology (GO) functional annotations, respectively. Then, for each time point, we do set intersections with one set corresponding to active proteins generated from expression data and the other set corresponding to a protein cluster generated from functional annotations. Each resulting unique set indicates a cluster of proteins that have similar function(s) and are active at that time point. Following that, we map each cluster of active proteins of similar function onto a static PPI network, and get a series of induced connected subgraphs. We treat these subgraphs as candidate complexes. Finally, by expanding and merging these candidate complexes, the predicted complexes are obtained. We evaluate CPredictor3.0 and compare it with a number of existing methods on several PPI networks and benchmarking complex datasets. The experimental results show that CPredictor3.0 achieves the highest F1-measure, which indicates that CPredictor3.0 outperforms these existing methods overall.
CONCLUSION CPredictor3.0 can serve as a promising tool for protein complex prediction.
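The candidate-generation steps described in the abstract above (set intersection per time point, then mapping onto the static PPI network) can be sketched as follows. This is an illustration under stated assumptions, not the CPredictor3.0 code: the function names, the adjacency-dictionary representation, the toy data and the `min_size` cutoff are hypothetical, and the final expanding and merging step is omitted.

```python
def induced_components(nodes, adjacency):
    """Connected components of the PPI subgraph induced by `nodes`."""
    nodes = set(nodes)
    seen, components = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adjacency.get(u, ()):
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        components.append(comp)
    return components


def candidate_complexes(active_by_time, functional_clusters, adjacency, min_size=2):
    """Intersect active sets with functional clusters, then split by connectivity."""
    # Each unique intersection is a group of co-active proteins with similar function.
    unique_sets = {frozenset(active & cluster)
                   for active in active_by_time
                   for cluster in functional_clusters}
    candidates = set()
    for proteins in unique_sets:
        for comp in induced_components(proteins, adjacency):
            if len(comp) >= min_size:  # assumed cutoff: ignore isolated proteins
                candidates.add(frozenset(comp))
    return candidates


# Toy static PPI network and inputs (all names are made up).
adjacency = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}, "D": {"E"}, "E": {"D"}}
active_by_time = [{"A", "B", "D"}, {"B", "C", "E"}]
functional_clusters = [{"A", "B", "C"}, {"D", "E"}]

candidates = candidate_complexes(active_by_time, functional_clusters, adjacency)
```

With the toy inputs, D and E are never active at the same time point, so their interaction does not yield a candidate, while {A, B} and {B, C} do.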
|
15
|
Peng X, Wang J, Peng W, Wu FX, Pan Y. Protein-protein interactions: detection, reliability assessment and applications. Brief Bioinform 2017; 18:798-819. [PMID: 27444371 DOI: 10.1093/bib/bbw066] [Citation(s) in RCA: 38] [Impact Index Per Article: 5.4] [Received: 03/22/2016] [Indexed: 01/06/2023] Open
Abstract
Protein-protein interactions (PPIs) participate in all important biological processes in living organisms, such as catalyzing metabolic reactions, DNA replication, DNA transcription, responding to stimuli and transporting molecules from one location to another. To reveal the function mechanisms in cells, it is important to identify PPIs that take place in the living organism. A large number of PPIs have been discovered by high-throughput experiments and computational methods. However, false-positive PPIs have been introduced too. Therefore, to obtain reliable PPIs, many computational methods have been proposed. Generally, these methods can be classified into two categories. One category includes the methods that are designed to determine new reliable PPIs. The other one is designed to assess the reliability of existing PPIs and filter out the unreliable ones. In this article, we review the two kinds of methods for detecting reliable PPIs, and then focus on evaluating the performance of some of these typical methods. Later on, we also enumerate several PPI network-based applications with taking a reliability assessment of the PPI data into consideration. Finally, we will discuss the challenges for obtaining reliable PPIs and future directions of the construction of reliable PPI networks. Our research will provide readers some guidance for choosing appropriate methods and features for obtaining reliable PPIs.
|
16
|
Maruyama O, Kuwahara Y. RocSampler: regularizing overlapping protein complexes in protein-protein interaction networks. BMC Bioinformatics 2017; 18:491. [PMID: 29244010 PMCID: PMC5731504 DOI: 10.1186/s12859-017-1920-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022] Open
Abstract
Background In recent years, protein-protein interaction (PPI) networks have been well recognized as important resources to elucidate various biological processes and cellular mechanisms. In this paper, we address the problem of predicting protein complexes from a PPI network. This problem has two difficulties. One is related to small complexes, which contain two or three components. It is relatively difficult to identify them due to their simpler internal structure, but unfortunately complexes of such sizes are dominant in major protein complex databases, such as CYC2008. Another difficulty is how to model overlaps between predicted complexes, that is, how to evaluate different predicted complexes sharing common proteins because CYC2008 and other databases include such protein complexes. Thus, modeling overlaps between predicted complexes is critical for identifying them simultaneously.
Results In this paper, we propose a sampling-based protein complex prediction method, RocSampler (Regularizing Overlapping Complexes), which exploits, as part of the whole scoring function, a regularization term for the overlaps of predicted complexes and that for the distribution of sizes of predicted complexes. We have implemented RocSampler in MATLAB and its executable file for Windows is available at the site, http://imi.kyushu-u.ac.jp/~om/software/RocSampler/.
Conclusions We have applied RocSampler to five yeast PPI networks and shown that it is superior to other existing methods. This implies that the design of scoring functions including regularization terms is an effective approach for protein complex prediction.
Affiliation(s)
- Osamu Maruyama: Institute of Mathematics for Industry, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka, 819-0395, Japan
- Yuki Kuwahara: Graduate School of Mathematics, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka, 819-0395, Japan
|
17
|
Identifying protein complexes in PPI network using non-cooperative sequential game. Sci Rep 2017; 7:8410. [PMID: 28827597 PMCID: PMC5566343 DOI: 10.1038/s41598-017-08760-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 02/27/2017] [Accepted: 07/13/2017] [Indexed: 11/14/2022] Open
Abstract
Identifying protein complexes from a protein-protein interaction (PPI) network is an important and challenging task in computational biology, as it helps in better understanding cellular mechanisms in various organisms. In this paper we propose a noncooperative sequential game based model for protein complex detection from a PPI network. The key hypothesis is that protein complex formation is driven by a mechanism that eventually optimizes the number of interactions within the complex, leading to a dense subgraph. The hypothesis is drawn from the observed network property named small world. The proposed multi-player game model translates the hypothesis into the game strategies. The Nash equilibrium of the game corresponds to a network partition where each protein either belongs to a complex or forms a singleton cluster. We further propose an algorithm to find the Nash equilibrium of the sequential game. Exhaustive experiments on a synthetic benchmark and real-life yeast networks evaluate the structural as well as biological significance of the network partitions.
|
18
|
Protein Network Interacting with BK Channels. INTERNATIONAL REVIEW OF NEUROBIOLOGY 2016; 128:127-61. [DOI: 10.1016/bs.irn.2016.03.003] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Indexed: 01/16/2023]
|
19
|
Savol AJ, Chennubhotla CS. Approximating frustration scores in complex networks via perturbed Laplacian spectra. PHYSICAL REVIEW. E, STATISTICAL, NONLINEAR, AND SOFT MATTER PHYSICS 2015; 92:062806. [PMID: 26764743 PMCID: PMC4769078 DOI: 10.1103/physreve.92.062806] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/11/2015] [Indexed: 06/05/2023]
Abstract
Systems of many interacting components, as found in physics, biology, infrastructure, and the social sciences, are often modeled by simple networks of nodes and edges. The real-world systems frequently confront outside intervention or internal damage whose impact must be predicted or minimized, and such perturbations are then mimicked in the models by altering nodes or edges. This leads to the broad issue of how best to quantify changes in a model network after some type of perturbation. In the case of node removal there are many centrality metrics which associate a scalar quantity with the removed node, but it can be difficult to associate the quantities with some intuitive aspect of physical behavior in the network. This presents a serious hurdle to the application of network theory: real-world utility networks are rarely altered according to theoretic principles unless the kinetic impact on the network's users is fully appreciated beforehand. In pursuit of a kinetically interpretable centrality score, we discuss the f-score, or frustration score. Each f-score quantifies whether a selected node accelerates or inhibits global mean first passage times to a second, independently selected target node. We show that this is a natural way of revealing the dynamical importance of a node in some networks. After discussing merits of the f-score metric, we combine spectral and Laplacian matrix theory in order to quickly approximate the exact f-score values, which can otherwise be expensive to compute. Following tests on both synthetic and real medium-sized networks, we report f-score runtime improvements over exact brute force approaches in the range of 0 to 400% with low error (<3%).
Affiliation(s)
- Andrej J Savol: Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15260, USA
- Chakra S Chennubhotla: Department of Computational and Systems Biology, University of Pittsburgh School of Medicine, Pittsburgh, Pennsylvania 15260, USA
|
20
|
CAMWI: Detecting protein complexes using weighted clustering coefficient and weighted density. Comput Biol Chem 2015; 58:231-40. [DOI: 10.1016/j.compbiolchem.2015.07.012] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.9] [Received: 12/16/2014] [Revised: 06/16/2015] [Accepted: 07/25/2015] [Indexed: 02/02/2023]
|
21
|
The hierarchical organization of natural protein interaction networks confers self-organization properties on pseudocells. BMC SYSTEMS BIOLOGY 2015; 9 Suppl 3:S3. [PMID: 26050708 PMCID: PMC4464023 DOI: 10.1186/1752-0509-9-s3-s3] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/10/2022]
Abstract
Background Cell organization is governed and maintained via specific interactions among its constituent macromolecules. Comparison of the experimentally determined protein interaction networks in different model organisms has revealed little conservation of the specific edges linking ortholog proteins. Nevertheless, some topological characteristics of the graphs representing the networks, namely non-random degree distribution and high clustering coefficient, are shared by networks of distantly related organisms. Here we investigate the role of the topological features of the protein interaction network in promoting cell organization. Methods We have used a stochastic model, dubbed ProtNet, representing a computer stylized cell, to answer questions about the dynamic consequences of the topological properties of the static graphs representing protein interaction networks. Results By using a novel metric of cell organization, we show that natural networks, unlike random networks, can promote cell self-organization. Furthermore, the ensemble of protein complexes that forms in pseudocells, which self-organize according to the interaction rules of natural networks, is more robust to perturbations. Conclusions The analysis of the dynamic properties of networks with a variety of topological characteristics leads us to conclude that self-organization is a consequence of the high clustering coefficient, whereas the scale-free degree distribution has little influence on this property.
|
22
|
Kobiki S, Maruyama O. ReSAPP: Predicting overlapping protein complexes by merging multiple-sampled partitions of proteins. J Bioinform Comput Biol 2015; 12:1442004. [DOI: 10.1142/s0219720014420049] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/29/2022]
Abstract
Many proteins are known to perform their own functions when they form particular groups of proteins, called protein complexes. With the advent of large-scale protein–protein interaction (PPI) studies, it has been a challenging problem in systems biology to predict protein complexes from PPIs. In this paper, we propose a novel method, called Repeated Simulated Annealing of Partitions of Proteins (ReSAPP), which predicts protein complexes from weighted PPIs. In the first stage, ReSAPP generates multiple (possibly different) partitions of all proteins of the given PPIs by repeatedly applying a simulated annealing based optimization algorithm to the PPIs. In the second stage, all distinct clusters of size two or more in those multiple partitions are merged into a single collection of clusters, which is output as the set of predicted protein complexes. In a performance comparison of ReSAPP with our previous algorithm, PPSampler2, as well as various other tools (MCL, MCODE, DPClus, CMC, COACH, RRW, NWE, and PPSampler1), ReSAPP is shown to outperform the other methods. Furthermore, the F-measure of ReSAPP is higher than that of the variant of ReSAPP without merging partitions. Thus, we empirically conclude that the combination of sampling multiple partitions and merging them is effective for predicting protein complexes.
Affiliation(s)
- So Kobiki: Graduate School of Mathematics, Kyushu University, Motooka, Nishi-ku 819-0395, Fukuoka, Japan
- Osamu Maruyama: Institute of Mathematics for Industry, Kyushu University, Motooka, Nishi-ku 819-0395, Fukuoka, Japan
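The second stage described in the ReSAPP abstract (merging clusters from several sampled partitions) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `merge_partitions`, the toy protein names, and the hard-coded partitions are hypothetical, and the simulated-annealing first stage is not reproduced here.

```python
from typing import FrozenSet, Iterable, List, Set

Partition = List[Set[str]]


def merge_partitions(partitions: Iterable[Partition],
                     min_size: int = 2) -> List[FrozenSet[str]]:
    """Collect every distinct cluster with at least `min_size` proteins.

    Each partition is a list of disjoint protein sets. Because clusters
    from different partitions may share proteins, the merged collection
    can contain overlapping predicted complexes.
    """
    seen = set()
    merged = []
    for partition in partitions:
        for cluster in partition:
            key = frozenset(cluster)
            if len(key) >= min_size and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


# Hypothetical partitions standing in for three sampling runs.
run_a = [{"P1", "P2", "P3"}, {"P4", "P5"}, {"P6"}]
run_b = [{"P1", "P2"}, {"P3", "P4", "P5"}, {"P6"}]
run_c = [{"P1", "P2", "P3"}, {"P4", "P5"}, {"P6"}]

predicted = merge_partitions([run_a, run_b, run_c])
```

In this toy input, the singleton `{"P6"}` is dropped and the duplicate clusters from `run_c` are counted once, while `P1` ends up in two different predicted clusters, which is how a union of hard partitions can yield overlapping complexes.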
|
23
|
Keilhauer EC, Hein MY, Mann M. Accurate protein complex retrieval by affinity enrichment mass spectrometry (AE-MS) rather than affinity purification mass spectrometry (AP-MS). Mol Cell Proteomics 2014; 14:120-35. [PMID: 25363814 PMCID: PMC4288248 DOI: 10.1074/mcp.m114.041012] [Citation(s) in RCA: 187] [Impact Index Per Article: 18.7] [Indexed: 01/19/2023] Open
Abstract
Protein–protein interactions are fundamental to the understanding of biological processes. Affinity purification coupled to mass spectrometry (AP-MS) is one of the most promising methods for their investigation. Previously, complexes were purified as much as possible, frequently followed by identification of individual gel bands. However, today's mass spectrometers are highly sensitive, and powerful quantitative proteomics strategies are available to distinguish true interactors from background binders. Here we describe a high performance affinity enrichment-mass spectrometry method for investigating protein–protein interactions, in which no attempt at purifying complexes to homogeneity is made. Instead, we developed analysis methods that take advantage of specific enrichment of interactors in the context of a large amount of unspecific background binders. We perform single-step affinity enrichment of endogenously expressed GFP-tagged proteins and their interactors in budding yeast, followed by single-run, intensity-based label-free quantitative LC-MS/MS analysis. Each pull-down contains around 2000 background binders, which are reinterpreted from troubling contaminants to crucial elements in a novel data analysis strategy. First, the background serves for accurate normalization. Second, interacting proteins are not identified by comparison to a single untagged control strain, but instead to the other tagged strains. Third, potential interactors are further validated by their intensity profiles across all samples. We demonstrate the power of our AE-MS method using several well-known and challenging yeast complexes of various abundances. AE-MS is not only highly efficient and robust, but also cost effective, broadly applicable, and can be performed in any laboratory with access to high-resolution mass spectrometers.
Affiliation(s)
- Eva C Keilhauer: Department Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Marco Y Hein: Department Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
- Matthias Mann: Department Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried, Germany
|
24
|
Prediction of protein-protein interaction strength using domain features with supervised regression. ScientificWorldJournal 2014; 2014:240673. [PMID: 25093200 PMCID: PMC4095743 DOI: 10.1155/2014/240673] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 04/03/2014] [Accepted: 05/30/2014] [Indexed: 11/18/2022] Open
Abstract
Proteins in living organisms express various important functions by interacting with other proteins and molecules. Therefore, many efforts have been made to investigate and predict protein-protein interactions (PPIs). Analysis of the strengths of PPIs is also important because such strengths are involved in the functionality of proteins. In this paper, we propose several feature space mappings from protein pairs using protein domain information to predict strengths of PPIs. Moreover, we perform computational experiments employing two machine learning methods, support vector regression (SVR) and relevance vector machine (RVM), on a dataset obtained from biological experiments. The prediction results showed that both SVR and RVM with our proposed features outperformed the best existing method.
|
25
|
Ruan P, Hayashida M, Maruyama O, Akutsu T. Prediction of heterotrimeric protein complexes by two-phase learning using neighboring kernels. BMC Bioinformatics 2014; 15 Suppl 2:S6. [PMID: 24564744 PMCID: PMC4016531 DOI: 10.1186/1471-2105-15-s2-s6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/02/2022] Open
Abstract
Background Protein complexes play important roles in biological systems such as gene regulatory networks and metabolic pathways. Most methods for predicting protein complexes try to find protein complexes with size more than three. However, it is known that protein complexes with smaller sizes account for a large part of all complexes for several species. In our previous work, we developed a method with several feature space mappings and the domain composition kernel for prediction of heterodimeric protein complexes, which outperforms existing methods. Results We propose methods for prediction of heterotrimeric protein complexes by extending techniques in the previous work on the basis of the idea that most heterotrimeric protein complexes are not likely to share the same protein with each other. We make use of the discriminant function in support vector machines (SVMs), and design novel feature space mappings for the second phase. As the second classifier, we examine SVMs and relevance vector machines (RVMs). We perform 10-fold cross-validation computational experiments. The results suggest that our proposed two-phase methods and SVM with the extended features outperform the existing method NWE, which was reported to outperform other existing methods such as MCL, MCODE, DPClus, CMC, COACH, RRW, and PPSampler for prediction of heterotrimeric protein complexes. Conclusions We propose two-phase prediction methods with the extended features, the domain composition kernel, SVMs and RVMs. The two-phase method with the extended features and the domain composition kernel using SVM as the second classifier is particularly useful for prediction of heterotrimeric protein complexes.
|
26
|
Widita CK, Maruyama O. PPSampler2: predicting protein complexes more accurately and efficiently by sampling. BMC SYSTEMS BIOLOGY 2013; 7 Suppl 6:S14. [PMID: 24565288 PMCID: PMC4029527 DOI: 10.1186/1752-0509-7-s6-s14] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
Predicting the sets of components of heteromeric protein complexes is a challenging problem in systems biology. Many tools have been proposed to predict those complexes. Among them, PPSampler, a protein complex prediction algorithm based on the Metropolis-Hastings algorithm, is reported to outperform other tools. In this work, we improve PPSampler by refining the scoring functions and the proposal distribution used inside the algorithm, so that predicted clusters are more accurate and the resulting algorithm runs faster. The new version is called PPSampler2. In computational experiments, PPSampler2 is shown to outperform other tools including PPSampler. The F-measure score of PPSampler2 is 0.67, which is at least 26% higher than those of the other tools. In addition, about 82% of the predicted clusters that are unmatched with any known complexes are statistically significant on the biological process aspect of Gene Ontology. Furthermore, the running time is reduced to twenty minutes, which is 1/24 of that of PPSampler.
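The F-measure reported in the PPSampler2 abstract is the harmonic mean of precision and recall over predicted clusters and known complexes. The following Python sketch shows one way such a score can be computed. The matching rule used here (the overlap ratio |A∩B|² / (|A|·|B|) with a threshold of 0.5) is an assumption chosen for illustration; the abstract does not state which matching criterion the paper uses, and the function names and toy data are hypothetical.

```python
def overlap_score(a, b):
    """Overlap ratio |a & b|^2 / (|a| * |b|) between two protein sets."""
    common = len(a & b)
    return common * common / (len(a) * len(b))


def f_measure(predicted, known, threshold=0.5):
    """Harmonic mean of precision and recall under an assumed matching rule.

    precision: fraction of predicted clusters matching some known complex
    recall:    fraction of known complexes matched by some predicted cluster
    """
    if not predicted or not known:
        return 0.0
    matched_pred = sum(
        1 for p in predicted
        if any(overlap_score(p, k) >= threshold for k in known)
    )
    matched_known = sum(
        1 for k in known
        if any(overlap_score(p, k) >= threshold for p in predicted)
    )
    precision = matched_pred / len(predicted)
    recall = matched_known / len(known)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: two of three predictions match, and both known complexes are found.
toy_predicted = [{"A", "B", "C"}, {"D", "E"}, {"X", "Y"}]
toy_known = [{"A", "B", "C", "D"}, {"D", "E"}]
toy_f = f_measure(toy_predicted, toy_known)
```

For the toy data, precision is 2/3 and recall is 1, so the F-measure is 0.8.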
|
27
|
Maruyama O. Heterodimeric protein complex identification by naïve Bayes classifiers. BMC Bioinformatics 2013; 14:347. [PMID: 24299017 PMCID: PMC4219333 DOI: 10.1186/1471-2105-14-347] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/05/2013] [Accepted: 11/19/2013] [Indexed: 11/25/2022] Open
Abstract
Background Protein complexes are basic cellular entities that carry out the functions of their components. In databases of yeast protein complexes such as CYC2008, the major type of known protein complex is the heterodimeric complex. Although a number of methods have been proposed that try to predict sets of proteins forming arbitrary types of protein complexes simultaneously, they often fail to predict heterodimeric complexes. Results In this paper, we have designed several features characterizing heterodimeric protein complexes based on genomic data sets, and proposed a supervised-learning method for the prediction of heterodimeric protein complexes. This method learns the parameters of the features, which are embedded in the naïve Bayes classifier. The log-likelihood ratio derived from the naïve Bayes classifier with the parameter values obtained by maximum likelihood estimation gives the score of a given pair of proteins to predict whether the pair is a heterodimeric complex or not. A five-fold cross-validation shows good performance on yeast. The trained classifiers also show higher predictability than various existing algorithms on yeast data sets with approximate and exact matching criteria. Conclusions Heterodimeric protein complex prediction is a rather harder problem than heteromeric protein complex prediction because a heterodimeric protein complex is topologically simpler. However, it turns out that by designing features specialized for heterodimeric protein complexes, their predictability can be improved. Thus, the design of more sophisticated features for heterodimeric protein complexes as well as the accumulation of more accurate and useful genome-wide data sets will lead to higher predictability of heterodimeric protein complexes. Our tool can be downloaded from http://imi.kyushu-u.ac.jp/~om/.
Affiliation(s)
- Osamu Maruyama: Institute of Mathematics for Industry, Kyushu University, Fukuoka, Japan
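The scoring idea in the abstract above, a log-likelihood ratio from a naïve Bayes classifier, can be sketched in Python as follows. This is only an illustration under stated assumptions: the features are taken to be binary, the parameters are estimated with add-one smoothing (pure maximum likelihood corresponds to `alpha=0`, which can produce zero probabilities on tiny samples), and the function names and toy training pairs are hypothetical rather than the features designed in the paper.

```python
import math


def fit_bernoulli(samples, alpha=1.0):
    """Estimate P(feature_j = 1) for each feature from binary vectors.

    alpha is a smoothing constant (an assumption of this sketch);
    alpha=0 gives the plain maximum likelihood estimate.
    """
    n = len(samples)
    n_features = len(samples[0])
    return [
        (sum(s[j] for s in samples) + alpha) / (n + 2 * alpha)
        for j in range(n_features)
    ]


def log_likelihood_ratio(x, p_pos, p_neg):
    """Sum of per-feature log ratios; positive values favour 'complex'."""
    score = 0.0
    for xj, pp, pn in zip(x, p_pos, p_neg):
        if xj:
            score += math.log(pp / pn)
        else:
            score += math.log((1 - pp) / (1 - pn))
    return score


# Hypothetical training pairs described by two binary features.
positives = [(1, 1), (1, 0), (1, 1)]   # known heterodimeric pairs
negatives = [(0, 0), (0, 1), (0, 0)]   # pairs that are not complexes

p_pos = fit_bernoulli(positives)
p_neg = fit_bernoulli(negatives)
score_like_complex = log_likelihood_ratio((1, 1), p_pos, p_neg)
score_unlike_complex = log_likelihood_ratio((0, 0), p_pos, p_neg)
```

With these toy samples the smoothed estimates are 0.8 and 0.6 for the positive class and 0.2 and 0.4 for the negative class, so a pair with both features present scores log 6 and a pair with neither scores -log 6.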
|
28
|
Ruan P, Hayashida M, Maruyama O, Akutsu T. Prediction of heterodimeric protein complexes from weighted protein-protein interaction networks using novel features and kernel functions. PLoS One 2013; 8:e65265. [PMID: 23776458 PMCID: PMC3679142 DOI: 10.1371/journal.pone.0065265] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.0] [Received: 02/12/2013] [Accepted: 04/23/2013] [Indexed: 12/30/2022] Open
Abstract
Since many proteins express their functional activity by interacting with other proteins and forming protein complexes, it is very useful to identify sets of proteins that form complexes. For that purpose, many methods for predicting protein complexes from protein-protein interactions have been developed, such as MCL, MCODE, RNSC, PCP, RRW, and NWE. These methods have dealt only with complexes of size more than three because they are often based on some measure of subgraph density. However, heterodimeric protein complexes, which consist of two distinct proteins, account for a large part of the known complexes in several comprehensive databases. In this paper, we propose several feature space mappings from protein-protein interaction data, in which each interaction is weighted based on reliability. Furthermore, we make use of prior knowledge on protein domains to develop feature space mappings, the domain composition kernel, and its combination kernel with our proposed features. We perform ten-fold cross-validation computational experiments. The results suggest that our proposed kernel considerably outperforms the naive Bayes-based method, which is the best existing method for predicting heterodimeric protein complexes.
Affiliation(s)
- Peiying Ruan: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, Japan
- Morihiro Hayashida: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, Japan
- Osamu Maruyama: Institute of Mathematics for Industry, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka, Japan
- Tatsuya Akutsu: Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, Japan
|
29
|
Shih YK, Parthasarathy S. Identifying functional modules in interaction networks through overlapping Markov clustering. Bioinformatics 2013; 28:i473-i479. [PMID: 22962469 PMCID: PMC3436797 DOI: 10.1093/bioinformatics/bts370] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Indexed: 11/13/2022] Open
Abstract
Motivation: In recent years, Markov clustering (MCL) has emerged as an effective algorithm for clustering biological networks, for instance clustering protein–protein interaction (PPI) networks to identify functional modules. However, a limitation of MCL and its variants (e.g. regularized MCL) is that they only support hard clustering, which often leads to an impedance mismatch given that there is often a significant overlap of proteins across functional modules. Results: In this article, we seek to redress this limitation. We propose a soft variation of Regularized MCL (R-MCL) based on the idea of iteratively (re-)executing R-MCL while ensuring that multiple executions do not always converge to the same clustering result, thus allowing for highly overlapped clusters. The resulting algorithm, denoted soft regularized Markov clustering, is shown to outperform a range of extant state-of-the-art approaches in terms of accuracy of identifying functional modules on three real PPI networks. Availability: All data and codes are freely available upon request. Contact: srini@cse.ohio-state.edu Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yu-Keng Shih: Department of Computer Science and Engineering, the Ohio State University, Columbus, OH 43210-1277, USA
|
30
|
Tatsuke D, Maruyama O. Sampling strategy for protein complex prediction using cluster size frequency. Gene 2013; 518:152-8. [DOI: 10.1016/j.gene.2012.11.050] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 10/05/2012] [Accepted: 11/27/2012] [Indexed: 11/28/2022]
|
31
|
Royer L, Reimann M, Stewart AF, Schroeder M. Network compression as a quality measure for protein interaction networks. PLoS One 2012; 7:e35729. [PMID: 22719828 PMCID: PMC3377704 DOI: 10.1371/journal.pone.0035729] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.2] [Received: 09/10/2011] [Accepted: 03/24/2012] [Indexed: 11/18/2022] Open
Abstract
With the advent of large-scale protein interaction studies, there is much debate about data quality. Can different noise levels in the measurements be assessed by analyzing network structure? Because proteomic regulation is inherently co-operative, modular and redundant, it is inherently compressible when represented as a network. Here we propose that network compression can be used to compare false positive and false negative noise levels in protein interaction networks. We validate this hypothesis by first confirming the detrimental effect of false positives and false negatives. Second, we show that gold standard networks are more compressible. Third, we show that compressibility correlates with co-expression, co-localization, and shared function. Fourth, we also observe correlation with better protein tagging methods, physiological expression in contrast to over-expression of tagged proteins, and smart pooling approaches for yeast two-hybrid screens. Overall, this new measure is a proxy for both sensitivity and specificity and gives complementary information to standard measures such as average degree and clustering coefficients.
Affiliation(s)
- Loic Royer: Bioinformatics, Biotec TU Dresden, Dresden, Germany
|
32
|
James K, Wipat A, Hallinan J. Is newer better?--evaluating the effects of data curation on integrated analyses in Saccharomyces cerevisiae. Integr Biol (Camb) 2012; 4:715-27. [PMID: 22526920 DOI: 10.1039/c2ib00123c] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022]
Abstract
Recent high-throughput experiments have produced a wealth of heterogeneous datasets, each of which provides information about different aspects of the cell. Consequently, integration of diverse data types is essential in order to address many biological questions. The quality of any integrated analysis system is dependent upon the quality of its component data, and upon the Gold Standard data used to evaluate it. It is commonly assumed that the quality of data improves as databases grow and change, particularly for manually curated databases. However, the validity of this assumption can be questioned, given the constant changes in the data coupled with the high level of noise associated with high-throughput experimental techniques. One of the most powerful approaches to data integration is the use of Probabilistic Functional Integrated Networks (PFINs). Here, we systematically analyse the changes in four highly-curated and widely-used online databases and evaluate the extent to which these changes affect the protein function prediction performance of PFINs in the yeast Saccharomyces cerevisiae. We find that the global trend in network performance improves over time. Where individual areas of biology are concerned, however, the most recent files do not always produce the best results. Individual datasets have unique biases towards different biological processes and by selecting and integrating relevant datasets performance can be improved. When using any type of integrated system to answer a specific biological question careful selection of raw data and Gold Standard is vital, since the most recent data may not be the most appropriate.
Affiliation(s)
- Katherine James: School of Computing Science, Newcastle University, Newcastle upon Tyne, NE1 7RU, United Kingdom
|
33
|
Maruyama O, Chihara A. NWE: Node-weighted expansion for protein complex prediction using random walk distances. Proteome Sci 2011; 9 Suppl 1:S14. [PMID: 22165822 PMCID: PMC3289075 DOI: 10.1186/1477-5956-9-s1-s14] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.5] [Indexed: 11/26/2022] Open
Abstract
Background Protein complexes are important entities to organize various biological processes in the cell, like signal transduction, gene expression, and molecular transmission. In most cases, proteins perform their intrinsic tasks in association with their specific interacting partners, forming protein complexes. Therefore, an enriched catalog of protein complexes in a cell could accelerate further research to elucidate the mechanisms underlying many biological processes. However, known complexes are still limited. Thus, it is a challenging problem to computationally predict protein complexes from protein-protein interaction networks and other genome-wide data sets. Methods Macropol et al. proposed a protein complex prediction algorithm, called RRW, which repeatedly expands a current cluster of proteins according to the stationary vector of a random walk with restarts, where the walk restarts from the cluster and the proteins in the cluster are equally weighted. In the cluster expansion, all the proteins within the cluster have equal influence on the determination of the protein newly added to the cluster. In this paper, we extend the RRW algorithm by introducing a random walk with restarts with a cluster of proteins, each of which is weighted by the sum of the strengths of supporting evidence for the direct physical interactions involving the protein. The resulting algorithm is called NWE (Node-Weighted Expansion of clusters of proteins). Those interaction data are obtained from the WI-PHI database. Results We have validated the biological significance of the results using curated complexes in the CYC2008 database, and compared our method to RRW and MCL (Markov Clustering), a popular clustering-based method, and found that our algorithm outperforms the other algorithms. Conclusions Expanding a cluster of proteins, each of which is weighted by the sum of the strengths of supporting evidence for the direct physical interactions involving the protein, turned out to be an effective approach to protein complex prediction.
Affiliation(s)
- Osamu Maruyama: Institute of Mathematics for Industry, Kyushu University, 744 Motooka Nishi-ku Fukuoka 819-0395, Japan
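One expansion step of the kind described in the NWE abstract can be sketched in Python as a random walk with restarts whose restart vector is concentrated on the current cluster, with each cluster protein weighted by the sum of its incident interaction weights. This is a hedged illustration, not the published algorithm: the restart probability `alpha=0.5`, the fixed iteration count, the toy edge weights, and all function names are assumptions, and the stopping and cluster-selection rules of RRW and NWE are not reproduced.

```python
def node_strengths(edges):
    """Sum of incident edge weights for every node of an undirected graph."""
    strength = {}
    for (u, v), w in edges.items():
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
    return strength


def node_weighted_restart(edges, cluster):
    """Restart distribution over the cluster, proportional to node strength."""
    strength = node_strengths(edges)
    total = sum(strength[n] for n in cluster)
    return {n: strength[n] / total for n in cluster}


def random_walk_with_restart(edges, restart, alpha=0.5, iters=200):
    """Approximate the stationary vector by fixed-point iteration.

    At each step the walker restarts with probability alpha (drawn from
    `restart`) and otherwise moves to a neighbour with probability
    proportional to the edge weight.
    """
    strength = node_strengths(edges)
    neighbours = {n: {} for n in strength}
    for (u, v), w in edges.items():
        neighbours[u][v] = w
        neighbours[v][u] = w
    p = {n: restart.get(n, 0.0) for n in strength}
    for _ in range(iters):
        nxt = {n: alpha * restart.get(n, 0.0) for n in strength}
        for u, nbrs in neighbours.items():
            for v, w in nbrs.items():
                nxt[v] += (1 - alpha) * p[u] * w / strength[u]
        p = nxt
    return p


# Toy weighted PPI network; weights stand in for interaction evidence.
toy_edges = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.7,
             ("C", "D"): 0.6, ("D", "E"): 0.1}
cluster = {"A", "B"}

restart = node_weighted_restart(toy_edges, cluster)
stationary = random_walk_with_restart(toy_edges, restart)
outside = [n for n in stationary if n not in cluster]
next_protein = max(outside, key=stationary.get)
```

In the toy network, `C` is strongly connected to both cluster members and receives the largest stationary probability among the proteins outside the cluster, so it would be the candidate for expansion. Replacing `node_weighted_restart` with a uniform distribution over the cluster gives the equally weighted restart that the abstract attributes to RRW.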
|
34
|
Ferraro N, Palopoli L, Panni S, Rombo SE. Asymmetric comparison and querying of biological networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:876-889. [PMID: 21321368 DOI: 10.1109/tcbb.2011.29] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.6] [Indexed: 05/30/2023]
Abstract
Comparing and querying the protein-protein interaction (PPI) networks of different organisms is important to infer knowledge about conservation across species. Known methods that perform these tasks operate symmetrically, i.e., they do not assign a distinct role to the input PPI networks. However, in most cases, the input networks are indeed distinguishable on the basis of how the corresponding organism is biologically well characterized. In this paper a new idea is developed, that is, to exploit differences in the characterization of organisms at hand in order to devise methods for comparing their PPI networks. We use the PPI network (called Master) of the best characterized organism as a fingerprint to guide the alignment process to the second input network (called Slave), so that generated results preferably retain the structural characteristics of the Master network. Technically, this is obtained by generating from the Master a finite automaton, called alignment model, which is then fed with (a linearization of) the Slave for the purpose of extracting, via the Viterbi algorithm, matching subgraphs. We propose an approach able to perform global alignment and network querying, and we apply it on PPI networks. We tested our method showing that the results it returns are biologically relevant.
Affiliation(s)
- Nicola Ferraro
- Department of Electronics, Computer Science and Systems, University of Calabria, Via Pietro Bucci, Arcavacata di Rende (CS) 87036, Italy.
|
35
|
Heuck S, Gerstmann UC, Michalke B, Kanter U. Genome-wide analysis of caesium and strontium accumulation in Saccharomyces cerevisiae. Yeast 2011; 27:817-35. [PMID: 20641020 DOI: 10.1002/yea.1780] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Indexed: 11/09/2022] Open
Abstract
¹³⁷Cs and ⁹⁰Sr contribute to significant and long-lasting contamination of the environment with radionuclides. Due to their relatively high biological availability, they are transferred rapidly into biotic systems and may enter the food chain. In this study, we analysed 4862 haploid yeast knockout strains of Saccharomyces cerevisiae to identify genes involved in caesium (Cs⁺) and/or strontium (Sr²⁺) accumulation. According to this analysis, 212 mutant strains were associated with reproducibly altered Cs⁺ and/or Sr²⁺ accumulation. These mutants were deficient for a wide range of cellular processes. Among those, vacuolar function and biogenesis turned out to be crucial for both Cs⁺ and Sr²⁺ accumulation. Disruption of the vacuole diminished Cs⁺ accumulation, whereas Sr²⁺ enrichment was enhanced. Further analyses with a subset of the identified candidates were undertaken, comparing the accumulation of Cs⁺ and Sr²⁺ with that of their essential counterparts potassium (K⁺) and calcium (Ca²⁺). Sr²⁺ and Ca²⁺ accumulation was highly correlated in yeast, excluding the possibility of differential regulation or uptake mechanisms. In direct contrast, the respective results suggest that Cs⁺ uptake is at least partially dependent on mechanisms distinct from K⁺ uptake. Single candidates (e.g. KHA1) are presented which might be specifically responsible for Cs⁺ homeostasis.
Affiliation(s)
- Sabine Heuck
- Helmholtz Zentrum München, Institut für Strahlenschutz, Neuherberg, Germany
|
36
|
Abstract
We provide an overview of the state of the art for the Omics technologies, the types of Omics data and the bioinformatics resources relevant and related to Omics. We also illustrate the bioinformatics challenges of dealing with high-throughput data. This overview touches on several fundamental aspects of Omics and bioinformatics: data standardisation, data sharing, storing Omics data appropriately and exploring Omics data in bioinformatics. Though the principles and concepts presented hold across the different technological fields, we concentrate on three main Omics fields, namely genomics, transcriptomics and proteomics. Finally, we address the integration of Omics data, and provide several useful links for bioinformatics and Omics.
Affiliation(s)
- Maria V Schneider
- EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK.
|
37
|
Patil A, Nakai K, Nakamura H. HitPredict: a database of quality assessed protein-protein interactions in nine species. Nucleic Acids Res 2010; 39:D744-9. [PMID: 20947562 PMCID: PMC3013773 DOI: 10.1093/nar/gkq897] [Citation(s) in RCA: 84] [Impact Index Per Article: 6.0] [Indexed: 11/29/2022] Open
Abstract
Despite the availability of a large number of protein–protein interactions (PPIs) in several species, researchers are often limited to using very small subsets in a few organisms due to the high prevalence of spurious interactions. In spite of the importance of quality assessment of experimentally determined PPIs, a surprisingly small number of databases provide interactions with scores and confidence levels. We introduce HitPredict (http://hintdb.hgc.jp/htp/), a database with quality assessed PPIs in nine species. HitPredict assigns a confidence level to interactions based on a reliability score that is computed using evidence from sequence, structure and functional annotations of the interacting proteins. HitPredict was first released in 2005 and is updated annually. The current release contains 36 930 proteins with 176 983 non-redundant, physical interactions, of which 116 198 (66%) are predicted to be of high confidence.
Affiliation(s)
- Ashwini Patil
- Human Genome Center, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokane-dai, Minato-ku, Tokyo 108-8639, Japan.
|
38
|
D'Alessandro A, Zolla L, Scaloni A. The bovine milk proteome: cherishing, nourishing and fostering molecular complexity. An interactomics and functional overview. MOLECULAR BIOSYSTEMS 2010; 7:579-97. [PMID: 20877905 DOI: 10.1039/c0mb00027b] [Citation(s) in RCA: 67] [Impact Index Per Article: 4.8] [Indexed: 01/26/2023]
Abstract
Bovine milk represents an essential source of nutrients for lactating calves and a key raw material for human food preparations. A wealth of data are present in the literature dealing with massive proteomic analyses of milk fractions and independent targeted studies on specific groups of proteins, such as caseins, globulins, hormones and cytokines. In this study, we merged data from previous investigations to compile an exhaustive list of 573 non-redundant annotated protein entries. This inventory was exploited for integrated in silico studies, including functional GO term enrichment (FatiGO/Babelomics), multiple pathway and network analyses. As expected, most of the milk proteins were grouped under pathways/networks/ontologies referring to nutrient transport, lipid metabolism and objectification of the immune system response. Notably, another functional family was observed as the most statistically significant one, which included proteins involved in the induction of cellular proliferation processes as well as in anatomical and haematological system development. Although the latter function for bovine milk proteins has long been postulated, studies reported so far mainly focused on a handful of molecules and missed the whole overview resulting from an integrated holistic analysis. A preliminary map of the bovine milk protein interactome was also built, which will be refined in the future as a result of the widespread use of quantitative methods in protein interaction studies and the consequent reduction of false positives within associated databases.
Affiliation(s)
- Angelo D'Alessandro
- Department of Environmental Sciences, University of Tuscia, Largo dell'Università, SNC, 01100 Viterbo, Italy
|
39
|
Terentiev AA, Moldogazieva NT, Shaitan KV. Dynamic proteomics in modeling of the living cell. Protein-protein interactions. BIOCHEMISTRY (MOSCOW) 2010; 74:1586-607. [DOI: 10.1134/s0006297909130112] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.1] [Indexed: 11/23/2022]
|
40
|
|
41
|
Macropol K, Can T, Singh AK. RRW: repeated random walks on genome-scale protein networks for local cluster discovery. BMC Bioinformatics 2009; 10:283. [PMID: 19740439 PMCID: PMC2748087 DOI: 10.1186/1471-2105-10-283] [Citation(s) in RCA: 96] [Impact Index Per Article: 6.4] [Received: 01/23/2009] [Accepted: 09/09/2009] [Indexed: 03/24/2023] Open
Abstract
BACKGROUND We propose an efficient and biologically sensitive algorithm based on repeated random walks (RRW) for discovering functional modules, e.g., complexes and pathways, within large-scale protein networks. Compared to existing cluster identification techniques, RRW implicitly makes use of network topology, edge weights, and long range interactions between proteins. RESULTS We apply the proposed technique on a functional network of yeast genes and accurately identify statistically significant clusters of proteins. We validate the biological significance of the results using known complexes in the MIPS complex catalogue database and well-characterized biological processes. We find that 90% of the created clusters have the majority of their catalogued proteins belonging to the same MIPS complex, and about 80% have the majority of their proteins involved in the same biological process. We compare our method to various other clustering techniques, such as the Markov Clustering Algorithm (MCL), and find a significant improvement in the RRW clusters' precision and accuracy values. CONCLUSION RRW, which is a technique that exploits the topology of the network, is more precise and robust in finding local clusters. In addition, it has the added flexibility of being able to find multi-functional proteins by allowing overlapping clusters.
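The random-walk idea behind this entry can be illustrated with a generic random walk with restart on a small weighted graph. This is a minimal sketch, not the RRW algorithm itself: the restart probability, the toy graph, and the function names `build_graph` and `random_walk_with_restart` are assumptions made for illustration.

```python
def build_graph(edges):
    """Build a symmetric weighted adjacency map from (u, v, weight) triples."""
    graph = {}
    for u, v, w in edges:
        graph.setdefault(u, {})[v] = w
        graph.setdefault(v, {})[u] = w
    return graph

def random_walk_with_restart(graph, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Return the stationary visiting probabilities of a walk restarting at seed."""
    nodes = list(graph)
    strength = {u: sum(graph[u].values()) for u in nodes}
    p = {u: 1.0 if u == seed else 0.0 for u in nodes}
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to the seed.
        nxt = {u: (restart if u == seed else 0.0) for u in nodes}
        for u in nodes:
            for v, w in graph[u].items():
                # Otherwise it follows an edge chosen in proportion to its weight.
                nxt[v] += (1.0 - restart) * p[u] * w / strength[u]
        change = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if change < tol:
            break
    return p

# Hypothetical toy network: two triangles joined by one weak edge.
toy_edges = [
    ("A", "B", 1.0), ("A", "C", 1.0), ("B", "C", 1.0),
    ("D", "E", 1.0), ("D", "F", 1.0), ("E", "F", 1.0),
    ("C", "D", 0.1),
]
scores = random_walk_with_restart(build_graph(toy_edges), "A")
ranked = sorted(scores, key=scores.get, reverse=True)
```

Nodes that the walk visits most often from the seed are natural candidates for the seed's local cluster; here the weak bridge keeps most of the probability mass inside the seed's own triangle.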
Affiliation(s)
- Kathy Macropol
- Department of Computer Science, University of California, Santa Barbara, CA 93106, USA.
|
42
|
Pushing structural information into the yeast interactome by high-throughput protein docking experiments. PLoS Comput Biol 2009; 5:e1000490. [PMID: 19714207 PMCID: PMC2722787 DOI: 10.1371/journal.pcbi.1000490] [Citation(s) in RCA: 60] [Impact Index Per Article: 4.0] [Received: 04/23/2009] [Accepted: 07/28/2009] [Indexed: 11/19/2022] Open
Abstract
The last several years have seen the consolidation of high-throughput proteomics initiatives to identify and characterize protein interactions and macromolecular complexes in model organisms. In particular, more than 10,000 high-confidence protein-protein interactions have been described between the roughly 6,000 proteins encoded in the budding yeast genome (Saccharomyces cerevisiae). Unfortunately, high-resolution three-dimensional structures are available for fewer than one hundred of these interacting pairs. Here, we expand this structural information on yeast protein interactions by running the first-ever high-throughput docking experiment with some of the best state-of-the-art methodologies, according to our benchmarks. To increase the coverage of the interaction space, we also explore the possibility of using homology models of varying quality in the docking experiments, instead of experimental structures, and assess how it would affect the global performance of the methods. In total, we have applied the docking procedure to 217 experimental structures and 1,023 homology models, providing putative structural models for over 3,000 protein-protein interactions in the yeast interactome. Finally, we analyze in detail the structural models obtained for the interaction between SAM1-anthranilate synthase complex and the MET30-RNA polymerase III to illustrate how our predictions can be straightforwardly used by the scientific community. The results of our experiment will be integrated into the general 3D-Repertoire pipeline, a European initiative to solve the structures of as many protein complexes in yeast as possible at the best possible resolution. All docking results are available at http://gatealoy.pcb.ub.es/HT_docking/.
Proteins are the main perpetrators of most biological processes. However, they seldom act alone, and most cellular functions are, in fact, carried out by large macromolecular complexes and regulated through intricate protein-protein interaction networks. Consequently, large efforts have been devoted to unveiling protein interrelationships in a high-throughput manner, and the last several years have seen the completion of the first interactome drafts for several model organisms. Unfortunately, these studies only reveal whether two proteins interact, but not the molecular bases of these interactions. A full comprehension of how proteins bind and form complexes can only come from high-resolution, three-dimensional (3D) structures, since they provide the key quasi-atomic details necessary to understand how the individual components in a complex or pathway are assembled and coordinated to function as a molecular unit. Here, we use protein docking experiments, in a high-throughput manner, to predict the 3D structure of over 3,000 interactions in yeast, which will be used to complement the complex structures obtained within the 3D-Repertoire pan-European initiative (http://www.3drepertoire.org).
|
43
|
Yeast prion [PSI+] lowers the levels of mitochondrial prohibitins. BIOCHIMICA ET BIOPHYSICA ACTA-MOLECULAR CELL RESEARCH 2009; 1793:1703-9. [PMID: 19695293 DOI: 10.1016/j.bbamcr.2009.08.003] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.5] [Received: 04/08/2009] [Revised: 08/07/2009] [Accepted: 08/07/2009] [Indexed: 12/11/2022]
Abstract
We report proteomic analyses that establish the effect of cytoplasmic prion [PSI+] on the protein complement of yeast mitochondria. A set of 44 yeast mitochondrial proteins whose levels were affected by [PSI+] was identified by two methods of gel-free and label-free differential proteomics. From this set we focused on prohibitins, Phb1 and Phb2, and the mitochondrially synthesized Cox2 subunit of cytochrome oxidase. By immunoblotting we confirmed the decreased level of Cox2 and reduced mitochondrial localization of the prohibitins in [PSI+] cells, which both became partially restored by [PSI+] curing. The presence of the [PSI+] prion also caused premature fragmentation of mitochondria, a phenomenon linked to prohibitin depletion in mammalian cells. By fractionation of cellular extracts we demonstrated a [PSI+]-dependent increase of the proportion of prohibitins in the high molecular weight fraction of aggregated proteins. We propose that the presence of the yeast prion causes newly synthesized prohibitins to aggregate in the cytosol, and therefore reduces their levels in mitochondria, which in turn reduces the stability of Cox2 and possibly of other proteins, not investigated here in detail.
|
44
|
Jo WJ, Loguinov A, Wintz H, Chang M, Smith AH, Kalman D, Zhang L, Smith MT, Vulpe CD. Comparative functional genomic analysis identifies distinct and overlapping sets of genes required for resistance to monomethylarsonous acid (MMAIII) and arsenite (AsIII) in yeast. Toxicol Sci 2009; 111:424-36. [PMID: 19635755 DOI: 10.1093/toxsci/kfp162] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.1] [Indexed: 11/14/2022] Open
Abstract
Arsenic is a human toxin and carcinogen commonly found as a contaminant in drinking water. Arsenite (As(III)) is the most toxic inorganic form, but recent evidence indicates that the metabolite monomethylarsonous acid (MMA(III)) is even more toxic. We have used a chemical genomics approach to identify the genes that modulate the cellular toxicity of MMA(III) and As(III) in the yeast Saccharomyces cerevisiae. Functional profiling using homozygous deletion mutants provided evidence of the requirement of highly conserved biological processes in the response against both arsenicals, including tubulin folding, DNA double-strand break repair, and chromatin modification. At the equitoxic doses of 150 µM MMA(III) and 300 µM As(III), genes related to glutathione metabolism were essential only for resistance to the former, suggesting a higher potency of MMA(III) to disrupt glutathione metabolism than As(III). Treatments with MMA(III) induced a significant increase in glutathione levels in the wild-type strain, which correlated with the requirement of genes from the sulfur and methionine metabolic pathways and was consistent with the induction of oxidative stress. Based on the relative sensitivity of deletion strains deficient in GSH metabolism and tubulin folding processes, oxidative stress appeared to be the primary mechanism of MMA(III) toxicity, whereas it appeared secondary to tubulin disruption in the case of As(III). Many of the identified yeast genes have orthologs in humans that could potentially modulate arsenic toxicity in a similar manner to their yeast counterparts.
Affiliation(s)
- William J Jo
- Department of Nutritional Sciences and Toxicology, University of California Berkeley, Berkeley, California 94720, USA
|
45
|
Burston HE, Maldonado-Báez L, Davey M, Montpetit B, Schluter C, Wendland B, Conibear E. Regulators of yeast endocytosis identified by systematic quantitative analysis. J Cell Biol 2009; 185:1097-110. [PMID: 19506040 PMCID: PMC2711619 DOI: 10.1083/jcb.200811116] [Citation(s) in RCA: 91] [Impact Index Per Article: 6.1] [Received: 11/21/2008] [Accepted: 05/12/2009] [Indexed: 11/22/2022] Open
Abstract
Endocytosis of receptors at the plasma membrane is controlled by a complex mechanism that includes clathrin, adaptors, and actin regulators. Many of these proteins are conserved in yeast yet lack observable mutant phenotypes, which suggests that yeast endocytosis may be subject to different regulatory mechanisms. Here, we have systematically defined genes required for internalization using a quantitative genome-wide screen that monitors localization of the yeast vesicle-associated membrane protein (VAMP)/synaptobrevin homologue Snc1. Genetic interaction mapping was used to place these genes into functional modules containing known and novel endocytic regulators, and cargo selectivity was evaluated by an array-based comparative analysis. We demonstrate that clathrin and the yeast AP180 clathrin adaptor proteins have a cargo-specific role in Snc1 internalization. We additionally identify low dye binding 17 (LDB17) as a novel conserved component of the endocytic machinery. Ldb17 is recruited to cortical actin patches before actin polymerization and regulates normal coat dynamics and actin assembly. Our findings highlight the conserved machinery and reveal novel mechanisms that underlie endocytic internalization.
Affiliation(s)
- Helen E. Burston
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, and Department of Medical Genetics, University of British Columbia, Vancouver V5Z 4H4, British Columbia, Canada
- Michael Davey
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, and Department of Medical Genetics, University of British Columbia, Vancouver V5Z 4H4, British Columbia, Canada
- Benjamen Montpetit
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, and Department of Medical Genetics, University of British Columbia, Vancouver V5Z 4H4, British Columbia, Canada
- Cayetana Schluter
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, and Department of Medical Genetics, University of British Columbia, Vancouver V5Z 4H4, British Columbia, Canada
- Beverly Wendland
- Department of Biology, The Johns Hopkins University, Baltimore, MD 21218
- Elizabeth Conibear
- Centre for Molecular Medicine and Therapeutics, Child and Family Research Institute, and Department of Medical Genetics, University of British Columbia, Vancouver V5Z 4H4, British Columbia, Canada
|
46
|
Jo WJ, Kim JH, Oh E, Jaramillo D, Holman P, Loguinov AV, Arkin AP, Nislow C, Giaever G, Vulpe CD. Novel insights into iron metabolism by integrating deletome and transcriptome analysis in an iron deficiency model of the yeast Saccharomyces cerevisiae. BMC Genomics 2009; 10:130. [PMID: 19321002 PMCID: PMC2669097 DOI: 10.1186/1471-2164-10-130] [Citation(s) in RCA: 33] [Impact Index Per Article: 2.2] [Received: 12/16/2008] [Accepted: 03/25/2009] [Indexed: 12/01/2022] Open
Abstract
BACKGROUND Iron-deficiency anemia is the most prevalent form of anemia world-wide. The yeast Saccharomyces cerevisiae has been used as a model of cellular iron deficiency, in part because many of its cellular pathways are conserved. To better understand how cells respond to changes in iron availability, we profiled the yeast genome with a parallel analysis of homozygous deletion mutants to identify essential components and cellular processes required for optimal growth under iron-limited conditions. To complement this analysis, we compared those genes identified as important for fitness to those that were differentially expressed in the same conditions. The resulting analysis provides a global perspective on the cellular processes involved in iron metabolism. RESULTS Using functional profiling, we identified several genes known to be involved in high affinity iron uptake, in addition to novel genes that may play a role in iron metabolism. Our results provide support for the primary involvement in iron homeostasis of vacuolar and endosomal compartments, as well as vesicular transport to and from these compartments. We also observed an unexpected importance of the peroxisome for growth in iron-limited media. Although these components were essential for growth in low-iron conditions, most of them were not differentially expressed. Genes with altered expression in iron deficiency were mainly associated with iron uptake and transport mechanisms, with little overlap with those that were functionally required. To better understand this relationship, we used expression profiling of selected mutants that exhibited slow growth in iron-deficient conditions, and as a result, obtained additional insight into the roles of CTI6, DAP1, MRS4 and YHR045W in iron metabolism. CONCLUSION Comparison between functional and gene expression data in iron deficiency highlighted the complementary utility of these two approaches to identify important functional components. This should be taken into consideration when designing and analyzing data from these types of studies. We used this and other published data to develop a molecular interaction network of iron metabolism in yeast.
Affiliation(s)
- William J Jo
- Department of Nutritional Sciences and Toxicology, University of California, Berkeley, California 94720, USA.
|
47
|
Oberdorf R, Kortemme T. Complex topology rather than complex membership is a determinant of protein dosage sensitivity. Mol Syst Biol 2009; 5:253. [PMID: 19293832 PMCID: PMC2671925 DOI: 10.1038/msb.2009.9] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 07/11/2008] [Accepted: 02/06/2009] [Indexed: 11/23/2022] Open
Abstract
The ‘balance hypothesis’ predicts that non-stoichiometric variations in concentrations of proteins participating in complexes should be deleterious. As a corollary, heterozygous deletions and overexpression of protein complex members should have measurable fitness effects. However, genome-wide studies of heterozygous deletions in Saccharomyces cerevisiae and overexpression have been unable to unambiguously relate complex membership to dosage sensitivity. We test the hypothesis that it is not complex membership alone but rather the topology of interactions within a complex that is a predictor of dosage sensitivity. We develop a model that uses the law of mass action to consider how complex formation might be affected by varying protein concentrations given a protein's topological positioning within the complex. Although we find little evidence for combinatorial inhibition of complex formation playing a major role in overexpression phenotypes, consistent with previous results, we show significant correlations between predicted sensitivity of complex formation to protein concentrations and both heterozygous deletion fitness and protein abundance noise levels. Our model suggests a mechanism for dosage sensitivity and provides testable predictions for the effect of alterations in protein abundance noise.
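The mass-action reasoning in this abstract can be illustrated with the simplest possible case, a single heterodimer A + B ⇌ AB. This is a sketch under stated assumptions, not the topology-aware model of the cited study: the concentrations, the dissociation constant, and the function name `dimer_concentration` are hypothetical.

```python
import math

def dimer_concentration(total_a, total_b, kd):
    """Equilibrium [AB] for A + B <-> AB with dissociation constant kd.

    Solves (total_a - AB) * (total_b - AB) = kd * AB for the physical root.
    """
    s = total_a + total_b + kd
    return (s - math.sqrt(s * s - 4.0 * total_a * total_b)) / 2.0

# Hypothetical values in arbitrary concentration units.
full = dimer_concentration(1.0, 1.0, 0.1)   # both subunits at full dosage
half = dimer_concentration(0.5, 1.0, 0.1)   # subunit A at half dosage
relative_loss = 1.0 - half / full           # fraction of complex lost
```

Halving one subunit, as in a heterozygous deletion, lowers the amount of complex formed; larger complexes would need one such equilibrium per interaction, which is where interaction topology starts to matter.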
Affiliation(s)
- Richard Oberdorf
- Graduate Group in Biophysics, University of California, San Francisco, CA 94158-2330, USA
|
48
|
Cusick ME, Yu H, Smolyar A, Venkatesan K, Carvunis AR, Simonis N, Rual JF, Borick H, Braun P, Dreze M, Vandenhaute J, Galli M, Yazaki J, Hill DE, Ecker JR, Roth FP, Vidal M. Literature-curated protein interaction datasets. Nat Methods 2009; 6:39-46. [PMID: 19116613 PMCID: PMC2683745 DOI: 10.1038/nmeth.1284] [Citation(s) in RCA: 234] [Impact Index Per Article: 15.6] [Indexed: 12/14/2022]
Abstract
High quality datasets are needed to understand how global and local properties of protein-protein interaction, or “interactome”, networks relate to biological mechanisms, and to guide research on individual proteins. Evaluations of existing curation of protein interaction experiments reported in the literature find that curation can be error prone and possibly of lower quality than commonly assumed.
Affiliation(s)
- Michael E Cusick
- Center for Cancer Systems Biology and Department of Cancer Biology, Dana-Farber Cancer Institute, 44 Binney Street, Boston, Massachusetts 02115, USA.
|
49
|
Björklund AK, Light S, Hedin L, Elofsson A. Quantitative assessment of the structural bias in protein-protein interaction assays. Proteomics 2009; 8:4657-67. [PMID: 18924110 DOI: 10.1002/pmic.200800150] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.5] [Indexed: 12/12/2022]
Abstract
With recent publications of several large-scale protein-protein interaction (PPI) studies, the realization of the full yeast interaction network is getting closer. Here, we have analysed several yeast protein interaction datasets to understand their strengths and weaknesses. In particular, we investigate the effect of experimental biases on some of the protein properties suggested to be enriched in highly connected proteins. Finally, we use support vector machines (SVM) to assess the contribution of these properties to protein interactivity. We find that protein abundance is the most important factor for detecting interactions in tandem affinity purifications (TAP), while it is of less importance for Yeast Two Hybrid (Y2H) screens. Consequently, sequence conservation and/or essentiality of hubs may be related to their high abundance. Further, proteins with disordered structure are over-represented in Y2H screens and in one, but not the other, large-scale TAP assay. Hence, disordered regions may be important both in transient interactions and interactions in complexes. Finally, a few domain families seem to be responsible for a large part of all interactions. Most importantly, we show that there are method-specific biases in PPI experiments. Thus, care should be taken before drawing strong conclusions based on a single dataset.
Affiliation(s)
- Asa K Björklund
- Department of Biochemistry and Biophysics, Center for Biological Membrane Research/Stockholm Bioinformatics Center, Stockholm University, Stockholm, Sweden
|
50
|
Gibson TA, Goldberg DS. Questioning the ubiquity of neofunctionalization. PLoS Comput Biol 2009; 5:e1000252. [PMID: 19119408 PMCID: PMC2597716 DOI: 10.1371/journal.pcbi.1000252] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.0] [Received: 04/28/2008] [Accepted: 11/12/2008] [Indexed: 11/19/2022] Open
Abstract
Gene duplication provides much of the raw material from which functional diversity evolves. Two evolutionary mechanisms have been proposed that generate functional diversity: neofunctionalization, the de novo acquisition of function by one duplicate, and subfunctionalization, the partitioning of ancestral functions between gene duplicates. With protein interactions as a surrogate for protein functions, evidence of prodigious neofunctionalization and subfunctionalization has been identified in analyses of empirical protein interactions and evolutionary models of protein interactions. However, we have identified three phenomena that have contributed to neofunctionalization being erroneously identified as a significant factor in protein interaction network evolution. First, self-interacting proteins are underreported in interaction data due to biological artifacts and design limitations in the two most common high-throughput protein interaction assays. Second, evolutionary inferences have been drawn from paralog analysis without consideration for concurrent and subsequent duplication events. Third, the theoretical model of prodigious neofunctionalization is unable to reproduce empirical network clustering and relies on untenable parameter requirements. In light of these findings, we believe that protein interaction evolution is more persuasively characterized by subfunctionalization and self-interactions.
Affiliation(s)
- Todd A Gibson
- Computational Bioscience Program, University of Colorado Denver, Aurora, Colorado, United States of America.
|