1
Dhifli W, Karabadji NEI, Elati M. Evolutionary mining of skyline clusters of attributed graph data. Inf Sci (N Y) 2020. [DOI: 10.1016/j.ins.2018.09.053] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 10/28/2022]
2
Wu Z, Liao Q, Liu B. A comprehensive review and evaluation of computational methods for identifying protein complexes from protein–protein interaction networks. Brief Bioinform 2019; 21:1531-1548. [DOI: 10.1093/bib/bbz085] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 04/08/2019] [Revised: 06/17/2019] [Accepted: 06/17/2019] [Indexed: 02/04/2023] Open
Abstract
Protein complexes are the fundamental units for many cellular processes. Identifying protein complexes accurately is critical for understanding the functions and organizations of cells. With the increment of genome-scale protein–protein interaction (PPI) data for different species, various computational methods focus on identifying protein complexes from PPI networks. In this article, we give a comprehensive and updated review on the state-of-the-art computational methods in the field of protein complex identification, especially focusing on the newly developed approaches. The computational methods are organized into three categories, including cluster-quality-based methods, node-affinity-based methods and ensemble clustering methods. Furthermore, the advantages and disadvantages of different methods are discussed, and then, the performance of 17 state-of-the-art methods is evaluated on two widely used benchmark data sets. Finally, the bottleneck problems and their potential solutions in this important field are discussed.
Affiliation(s)
- Zhourun Wu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Qing Liao
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
3
Context-dependent prediction of protein complexes by SiComPre. NPJ Syst Biol Appl 2018; 4:37. [PMID: 30245847 PMCID: PMC6141528 DOI: 10.1038/s41540-018-0073-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 03/06/2018] [Revised: 08/21/2018] [Accepted: 08/29/2018] [Indexed: 11/09/2022] Open
Abstract
Most cellular processes are regulated by groups of proteins interacting together to form protein complexes. Protein compositions vary between different tissues or disease conditions enabling or preventing certain protein-protein interactions and resulting in variations in the complexome. Quantitative and qualitative characterization of context-specific protein complexes will help to better understand context-dependent variations in the physiological behavior of cells. Here, we present SiComPre 1.0, a computational tool that predicts context-specific protein complexes by integrating multi-omics sources. SiComPre outperforms other protein complex prediction tools in qualitative predictions and is unique in giving quantitative predictions on the complexome depending on the specific interactions and protein abundances defined by the user. We provide tutorials and examples on the complexome prediction of common model organisms, various human tissues and how the complexome is affected by drug treatment.
4
Zhang Z, Song J, Tang J, Xu X, Guo F. Detecting complexes from edge-weighted PPI networks via genes expression analysis. BMC Syst Biol 2018; 12:40. [PMID: 29745859 PMCID: PMC5998908 DOI: 10.1186/s12918-018-0565-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 01/29/2023]
Abstract
BACKGROUND: Identifying complexes from PPI networks has become a key problem to elucidate protein functions and identify signal and biological processes in a cell. Proteins binding as complexes play important roles in life activity. Accurate determination of complexes in PPI networks is crucial for understanding principles of cellular organization. RESULTS: We propose a novel method to identify complexes on PPI networks, based on different co-expression information. First, we use Markov Cluster Algorithm with an edge-weighting scheme to calculate complexes on PPI networks. Then, we propose some significant features, such as graph information and gene expression analysis, to filter and modify complexes predicted by Markov Cluster Algorithm. To evaluate our method, we test on two experimental yeast PPI networks. CONCLUSIONS: On the DIP network, our method has Precision and F-Measure values of 0.6004 and 0.5528. On the MIPS network, our method has F-Measure and Sn values of 0.3774 and 0.3453. Compared with existing methods, our method improves the Precision value by at least 0.1752, the F-Measure value by at least 0.0448, and the Sn value by at least 0.0771. Experiments show that our method achieves better results than some state-of-the-art methods for identifying complexes on PPI networks, with the prediction quality improved in terms of evaluation criteria.
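The first step described in this abstract, Markov clustering on an edge-weighted network, can be illustrated with a minimal sketch of the generic MCL expansion/inflation loop. This is not the authors' implementation: the function name `mcl`, the parameters `inflation`, `iterations` and `self_loop`, and the toy edge weights are assumptions chosen for illustration, and the paper's edge-weighting scheme and post-filtering step are not reproduced.

```python
def mcl(weights, n, inflation=2.0, iterations=50, self_loop=1.0):
    """Generic Markov clustering on an undirected weighted graph.

    weights: dict mapping (i, j) node pairs (0 <= i, j < n) to edge weights.
    The weights are hypothetical inputs, e.g. derived from co-expression.
    Returns a set of frozensets of node indices.
    """
    # Symmetric weighted adjacency matrix with self-loops on the diagonal.
    m = [[0.0] * n for _ in range(n)]
    for (i, j), w in weights.items():
        m[i][j] = m[j][i] = w
    for i in range(n):
        m[i][i] = self_loop

    def normalize(mat):
        # Rescale every column so that it sums to 1 (column-stochastic).
        for c in range(n):
            s = sum(mat[r][c] for r in range(n))
            for r in range(n):
                mat[r][c] /= s
        return mat

    m = normalize(m)
    for _ in range(iterations):
        # Expansion: square the matrix (two-step random walk).
        m = [[sum(m[r][k] * m[k][c] for k in range(n)) for c in range(n)]
             for r in range(n)]
        # Inflation: element-wise power, then renormalize the columns.
        m = normalize([[x ** inflation for x in row] for row in m])

    # Rows that retain mass are attractors; the columns they reach form a cluster.
    clusters = set()
    for r in range(n):
        members = frozenset(c for c in range(n) if m[r][c] > 1e-6)
        if members:
            clusters.add(members)
    return clusters
```

For example, calling `mcl` on two triangles joined by one weak edge separates the two triangles; a larger `inflation` value generally yields finer clusters.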
Affiliation(s)
- Zehua Zhang
  - School of Computer Science and Technology, Tianjin University, Tianjin, People’s Republic of China
  - Tianjin University Institute of Computational Biology, Tianjin, People’s Republic of China
- Jian Song
  - School of Computer Science and Technology, Tianjin University, Tianjin, People’s Republic of China
  - Tianjin University Institute of Computational Biology, Tianjin, People’s Republic of China
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, People’s Republic of China
- Jijun Tang
  - School of Computer Science and Technology, Tianjin University, Tianjin, People’s Republic of China
  - Tianjin University Institute of Computational Biology, Tianjin, People’s Republic of China
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, USA
- Xinying Xu
  - School of Information Engineering, Taiyuan University of Technology, Taiyuan, People’s Republic of China
- Fei Guo
  - School of Computer Science and Technology, Tianjin University, Tianjin, People’s Republic of China
  - Tianjin University Institute of Computational Biology, Tianjin, People’s Republic of China
5
Identifying protein complexes in PPI network using non-cooperative sequential game. Sci Rep 2017; 7:8410. [PMID: 28827597 PMCID: PMC5566343 DOI: 10.1038/s41598-017-08760-x] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 02/27/2017] [Accepted: 07/13/2017] [Indexed: 11/14/2022] Open
Abstract
Identifying protein complexes from a protein-protein interaction (PPI) network is an important and challenging task in computational biology, as it helps in better understanding cellular mechanisms in various organisms. In this paper we propose a noncooperative sequential game based model for protein complex detection from a PPI network. The key hypothesis is that protein complex formation is driven by a mechanism that eventually optimizes the number of interactions within the complex, leading to a dense subgraph. The hypothesis is drawn from the observed network property named small world. The proposed multi-player game model translates the hypothesis into the game strategies. The Nash equilibrium of the game corresponds to a network partition where each protein either belongs to a complex or forms a singleton cluster. We further propose an algorithm to find the Nash equilibrium of the sequential game. The exhaustive experiment on synthetic benchmark and real life yeast networks evaluates the structural as well as biological significance of the network partitions.
6
Peng X, Wang J, Huan J, Wu FX. Double-layer clustering method to predict protein complexes based on power-law distribution and protein sublocalization. J Theor Biol 2016; 395:186-193. [DOI: 10.1016/j.jtbi.2016.01.043] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/03/2015] [Revised: 01/08/2016] [Accepted: 01/24/2016] [Indexed: 10/22/2022]
7
Srihari S, Yong CH, Patil A, Wong L. Methods for protein complex prediction and their contributions towards understanding the organisation, function and dynamics of complexes. FEBS Lett 2015; 589:2590-602. [PMID: 25913176 DOI: 10.1016/j.febslet.2015.04.026] [Citation(s) in RCA: 53] [Impact Index Per Article: 5.9] [Received: 03/19/2015] [Revised: 04/14/2015] [Accepted: 04/14/2015] [Indexed: 12/30/2022]
Abstract
Complexes of physically interacting proteins constitute fundamental functional units responsible for driving biological processes within cells. A faithful reconstruction of the entire set of complexes is therefore essential to understand the functional organisation of cells. In this review, we discuss the key contributions of computational methods developed till date (approximately between 2003 and 2015) for identifying complexes from the network of interacting proteins (PPI network). We evaluate in depth the performance of these methods on PPI datasets from yeast, and highlight their limitations and challenges, in particular at detecting sparse and small or sub-complexes and discerning overlapping complexes. We describe methods for integrating diverse information including expression profiles and 3D structures of proteins with PPI networks to understand the dynamics of complex formation, for instance, of time-based assembly of complex subunits and formation of fuzzy complexes from intrinsically disordered proteins. Finally, we discuss methods for identifying dysfunctional complexes in human diseases, an application that is proving invaluable to understand disease mechanisms and to discover novel therapeutic targets. We hope this review aptly commemorates a decade of research on computational prediction of complexes and constitutes a valuable reference for further advancements in this exciting area.
Affiliation(s)
- Sriganesh Srihari
  - Institute for Molecular Bioscience, The University of Queensland, St. Lucia, Queensland 4067, Australia
- Chern Han Yong
  - Department of Computer Science, National University of Singapore, Singapore 117417, Singapore
- Ashwini Patil
  - Human Genome Centre, The Institute of Medical Science, The University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo 108-8639, Japan
- Limsoon Wong
  - Department of Computer Science, National University of Singapore, Singapore 117417, Singapore
8
Yong CH, Wong L. From the static interactome to dynamic protein complexes: Three challenges. J Bioinform Comput Biol 2015; 13:1571001. [DOI: 10.1142/s0219720015710018] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 01/16/2023]
Abstract
Protein interactions and complexes behave in a dynamic fashion, but this dynamism is not captured by interaction screening technologies, and not preserved in protein–protein interaction (PPI) networks. The analysis of static interaction data to derive dynamic protein complexes leads to several challenges, of which we identify three. First, many proteins participate in multiple complexes, leading to overlapping complexes embedded within highly-connected regions of the PPI network. This makes it difficult to accurately delimit the boundaries of such complexes. Second, many condition- and location-specific PPIs are not detected, leading to sparsely-connected complexes that cannot be picked out by clustering algorithms. Third, the majority of complexes are small complexes (made up of two or three proteins), which are extra sensitive to the effects of extraneous edges and missing co-complex edges. We show that many existing complex-discovery algorithms have trouble predicting such complexes, and show that our insight into the disparity between the static interactome and dynamic protein complexes can be used to improve the performance of complex discovery.
Affiliation(s)
- Chern Han Yong
  - Graduate School for Integrative Sciences and Engineering, National University of Singapore, 28 Medical Drive, Singapore 117456, Singapore
- Limsoon Wong
  - School of Computing, National University of Singapore, 13 Computing Drive, Singapore 117417, Singapore
9
Kobiki S, Maruyama O. ReSAPP: Predicting overlapping protein complexes by merging multiple-sampled partitions of proteins. J Bioinform Comput Biol 2015; 12:1442004. [DOI: 10.1142/s0219720014420049] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/29/2022]
Abstract
Many proteins are known to perform their own functions when they form particular groups of proteins, called protein complexes. With the advent of large-scale protein–protein interaction (PPI) studies, it has been a challenging problem in systems biology to predict protein complexes from PPIs. In this paper, we propose a novel method, called Repeated Simulated Annealing of Partitions of Proteins (ReSAPP), which predicts protein complexes from weighted PPIs. ReSAPP, in the first stage, generates multiple (possibly different) partitions of all proteins of given PPIs by repeatedly applying a simulated annealing based optimization algorithm to the PPIs. In the second stage, all different clusters of size two or more in those multiple partitions are merged into a collection of those clusters, which are outputted as predicted protein complexes. In performance comparison of ReSAPP with our previous algorithm, PPSampler2, as well as other various tools, MCL, MCODE, DPClus, CMC, COACH, RRW, NWE, and PPSampler1, ReSAPP is shown to outperform the other methods. Furthermore, the value of F-measure of ReSAPP is higher than that of the variant of ReSAPP without merging partitions. Thus, we empirically conclude that the combination of sampling multiple partitions and merging them is effective to predict protein complexes.
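The second stage described in this abstract, merging all distinct clusters of size two or more from several sampled partitions, can be sketched as follows. The function name `merge_partitions`, the `min_size` parameter, and the toy protein labels are assumptions for illustration. The simulated annealing stage that produces the partitions is not shown.

```python
def merge_partitions(partitions, min_size=2):
    """Collect the distinct clusters of at least min_size proteins.

    partitions: iterable of partitions, each an iterable of clusters,
    each cluster an iterable of protein identifiers.
    Returns a set of frozensets, one per distinct cluster.
    """
    merged = set()
    for partition in partitions:
        for cluster in partition:
            if len(cluster) >= min_size:
                # frozenset makes identical clusters from different
                # partitions collapse into one entry.
                merged.add(frozenset(cluster))
    return merged
```

Because clusters from different partitions are kept side by side, the merged collection can contain overlapping predictions (for instance `{A, B}` from one run and `{A, B, C}` from another), which a single partition cannot express.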
Affiliation(s)
- So Kobiki
  - Graduate School of Mathematics, Kyushu University, Motooka, Nishi-ku 819-0395, Fukuoka, Japan
- Osamu Maruyama
  - Institute of Mathematics for Industry, Kyushu University, Motooka, Nishi-ku 819-0395, Fukuoka, Japan
10
Yong CH, Maruyama O, Wong L. Discovery of small protein complexes from PPI networks with size-specific supervised weighting. BMC Syst Biol 2014; 8 Suppl 5:S3. [PMID: 25559663 PMCID: PMC4305982 DOI: 10.1186/1752-0509-8-s5-s3] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Indexed: 12/20/2022]
Abstract
The prediction of small complexes (consisting of two or three distinct proteins) is an important and challenging subtask in protein complex prediction from protein-protein interaction (PPI) networks. The prediction of small complexes is especially susceptible to noise (missing or spurious interactions) in the PPI network, while smaller groups of proteins are likelier to take on topological characteristics of real complexes by chance. We propose a two-stage approach, SSS and Extract, for discovering small complexes. First, the PPI network is weighted by size-specific supervised weighting (SSS), which integrates heterogeneous data and their topological features with an overall topological isolatedness feature. SSS uses a naive-Bayes maximum-likelihood model to weight the edges with two posterior probabilities: that of being in a small complex, and of being in a large complex. The second stage, Extract, analyzes the SSS-weighted network to extract putative small complexes and scores them by cohesiveness-weighted density, which incorporates both small-co-complex and large-co-complex weights of edges within and surrounding the complexes. We test our approach on the prediction of yeast and human small complexes, and demonstrate that our approach attains higher precision and recall than some popular complex prediction algorithms. Furthermore, our approach generates a greater number of novel predictions with higher quality in terms of functional coherence.
11
Ruan P, Hayashida M, Maruyama O, Akutsu T. Prediction of heterotrimeric protein complexes by two-phase learning using neighboring kernels. BMC Bioinformatics 2014; 15 Suppl 2:S6. [PMID: 24564744 PMCID: PMC4016531 DOI: 10.1186/1471-2105-15-s2-s6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/02/2022] Open
Abstract
Background: Protein complexes play important roles in biological systems such as gene regulatory networks and metabolic pathways. Most methods for predicting protein complexes try to find protein complexes with size more than three. It, however, is known that protein complexes with smaller sizes occupy a large part of whole complexes for several species. In our previous work, we developed a method with several feature space mappings and the domain composition kernel for prediction of heterodimeric protein complexes, which outperforms existing methods. Results: We propose methods for prediction of heterotrimeric protein complexes by extending techniques in the previous work on the basis of the idea that most heterotrimeric protein complexes are not likely to share the same protein with each other. We make use of the discriminant function in support vector machines (SVMs), and design novel feature space mappings for the second phase. As the second classifier, we examine SVMs and relevance vector machines (RVMs). We perform 10-fold cross-validation computational experiments. The results suggest that our proposed two-phase methods and SVM with the extended features outperform the existing method NWE, which was reported to outperform other existing methods such as MCL, MCODE, DPClus, CMC, COACH, RRW, and PPSampler for prediction of heterotrimeric protein complexes. Conclusions: We propose two-phase prediction methods with the extended features, the domain composition kernel, SVMs and RVMs. The two-phase method with the extended features and the domain composition kernel using SVM as the second classifier is particularly useful for prediction of heterotrimeric protein complexes.
12
Widita CK, Maruyama O. PPSampler2: predicting protein complexes more accurately and efficiently by sampling. BMC Syst Biol 2013; 7 Suppl 6:S14. [PMID: 24565288 PMCID: PMC4029527 DOI: 10.1186/1752-0509-7-s6-s14] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
The problem of predicting sets of components of heteromeric protein complexes is a challenging problem in Systems Biology. There have been many tools proposed to predict those complexes. Among them, PPSampler, a protein complex prediction algorithm based on the Metropolis-Hastings algorithm, is reported to outperform other tools. In this work, we improve PPSampler by refining scoring functions and a proposal distribution used inside the algorithm so that predicted clusters are more accurate as well as the resulting algorithm runs faster. The new version is called PPSampler2. In computational experiments, PPSampler2 is shown to outperform other tools including PPSampler. The F-measure score of PPSampler2 is 0.67, which is at least 26% higher than those of the other tools. In addition, about 82% of the predicted clusters that are unmatched with any known complexes are statistically significant on the biological process aspect of Gene Ontology. Furthermore, the running time is reduced to twenty minutes, which is 1/24 of that of PPSampler.
13
Maruyama O. Heterodimeric protein complex identification by naïve Bayes classifiers. BMC Bioinformatics 2013; 14:347. [PMID: 24299017 PMCID: PMC4219333 DOI: 10.1186/1471-2105-14-347] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/05/2013] [Accepted: 11/19/2013] [Indexed: 11/25/2022] Open
Abstract
Background: Protein complexes are basic cellular entities that carry out the functions of their components. It can be found that in databases of protein complexes of yeast like CYC2008, the major type of known protein complexes is heterodimeric complexes. Although a number of methods for trying to predict sets of proteins that form arbitrary types of protein complexes simultaneously have been proposed, it can be found that they often fail to predict heterodimeric complexes. Results: In this paper, we have designed several features characterizing heterodimeric protein complexes based on genomic data sets, and proposed a supervised-learning method for the prediction of heterodimeric protein complexes. This method learns the parameters of the features, which are embedded in the naïve Bayes classifier. The log-likelihood ratio derived from the naïve Bayes classifier with the parameter values obtained by maximum likelihood estimation gives the score of a given pair of proteins to predict whether the pair is a heterodimeric complex or not. A five-fold cross-validation shows good performance on yeast. The trained classifiers also show higher predictability than various existing algorithms on yeast data sets with approximate and exact matching criteria. Conclusions: Heterodimeric protein complex prediction is a rather harder problem than heteromeric protein complex prediction because heterodimeric protein complex is topologically simpler. However, it turns out that by designing features specialized for heterodimeric protein complexes, predictability of them can be improved. Thus, the design of more sophisticated features for heterodimeric protein complexes as well as the accumulation of more accurate and useful genome-wide data sets will lead to higher predictability of heterodimeric protein complexes. Our tool can be downloaded from http://imi.kyushu-u.ac.jp/~om/.
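The scoring idea in this abstract, a naïve Bayes log-likelihood ratio with parameters fitted by maximum likelihood, can be sketched for discrete features as below. The function names `mle_distribution`, `train` and `log_likelihood_ratio`, and the two binary toy features, are assumptions for illustration. The features actually designed in the paper are not reproduced here.

```python
import math
from collections import Counter


def mle_distribution(observations):
    """Maximum likelihood estimate of a discrete distribution (relative frequencies)."""
    counts = Counter(observations)
    total = len(observations)
    return {level: c / total for level, c in counts.items()}


def train(positives, negatives):
    """Fit one distribution per feature for each class.

    positives / negatives: lists of equal-length feature tuples describing
    protein pairs that are / are not heterodimeric complexes.
    """
    n_features = len(positives[0])
    pos = [mle_distribution([x[k] for x in positives]) for k in range(n_features)]
    neg = [mle_distribution([x[k] for x in negatives]) for k in range(n_features)]
    return pos, neg


def log_likelihood_ratio(x, model):
    """Score a pair: sum over features of log P(x_k | complex) / P(x_k | non-complex).

    Plain MLE assigns probability zero to unseen feature values, so this
    sketch raises KeyError for a value absent from either training set.
    """
    pos, neg = model
    return sum(math.log(pos[k][v] / neg[k][v]) for k, v in enumerate(x))
```

A positive score favours the heterodimer hypothesis and a negative score favours the alternative. A practical version would need some handling of feature values never seen in training, which pure maximum likelihood leaves undefined.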
Affiliation(s)
- Osamu Maruyama
  - Institute of Mathematics for Industry, Kyushu University, Fukuoka, Japan