1
|
Liang L, Chen V, Zhu K, Fan X, Lu X, Lu S. Integrating data and knowledge to identify functional modules of genes: a multilayer approach. BMC Bioinformatics 2019; 20:225. [PMID: 31046665 PMCID: PMC6498600 DOI: 10.1186/s12859-019-2800-y] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 12/11/2018] [Accepted: 04/09/2019] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND: Characterizing the modular structure of cellular networks is an important way to identify novel genes for targeted therapeutics. This has been made possible by the rise of high-throughput technology. Unfortunately, computational methods for identifying functional modules have been limited by the data quality issues of high-throughput techniques. This study aims to integrate knowledge extracted from the literature to further improve the accuracy of functional module identification. RESULTS: Our new model and algorithm were applied to both yeast and human interactomes. The predicted functional modules covered over 90% of the proteins in both organisms while maintaining comparable overall accuracy. We found that combining mRNA expression information with biomedical knowledge greatly improved the performance of functional module identification compared with using a protein interaction network weighted with transcriptomic data alone, with literature knowledge alone, or an unweighted protein interaction network. Our new algorithm also performed better than some other well-known methods, especially in terms of the positive predictive value (PPV), which indicates the confidence of novel discoveries. CONCLUSION: The higher PPV of the multiplex approach suggests that information from both sources has been effectively integrated to reduce false positives. With protein coverage higher than 90%, our algorithm is able to generate more novel biological hypotheses with higher confidence.
Affiliation(s)
- Lifan Liang
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Vicky Chen
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Frederick National Laboratory for Cancer Research, Leidos Biomedical Research, Inc, Frederick, USA
- Kunju Zhu
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Clinical Medicine Research Institute, Jinan University, Guangzhou, 51063, Guangdong, China
- Xiaonan Fan
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
  - Key Laboratory of Information Fusion Technology of Ministry of Education, School of Automation, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, China
- Xinghua Lu
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
- Songjian Lu
  - Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA
|
2
|
Nakajima N, Hayashida M, Jansson J, Maruyama O, Akutsu T. Determining the minimum number of protein-protein interactions required to support known protein complexes. PLoS One 2018; 13:e0195545. [PMID: 29698482 PMCID: PMC5919440 DOI: 10.1371/journal.pone.0195545] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 09/11/2017] [Accepted: 03/23/2018] [Indexed: 11/18/2022] Open
Abstract
The prediction of protein complexes from protein-protein interactions (PPIs) is a well-studied problem in bioinformatics. However, the currently available PPI data is not enough to describe all known protein complexes. In this paper, we express the problem of determining the minimum number of (additional) required protein-protein interactions as a graph theoretic problem under the constraint that each complex constitutes a connected component in a PPI network. For this problem, we develop two computational methods: one is based on integer linear programming (ILPMinPPI) and the other one is based on an existing greedy-type approximation algorithm (GreedyMinPPI) originally developed in the context of communication and social networks. Since the former method is only applicable to datasets of small size, we apply the latter method to a combination of the CYC2008 protein complex dataset and each of eight PPI datasets (STRING, MINT, BioGRID, IntAct, DIP, BIND, WI-PHI, iRefIndex). The results show that the minimum number of additional required PPIs ranges from 51 (STRING) to 964 (BIND), and that even the four best PPI databases, STRING (51), BioGRID (67), WI-PHI (93) and iRefIndex (85), do not include enough PPIs to form all CYC2008 protein complexes. We also demonstrate that the proposed problem framework and our solutions can enhance the prediction accuracy of existing PPI prediction methods. ILPMinPPI can be freely downloaded from http://sunflower.kuicr.kyoto-u.ac.jp/~nakajima/.
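The connectivity constraint described in this abstract can be illustrated for a single complex. The sketch below is not ILPMinPPI or GreedyMinPPI: it only counts, for one complex taken in isolation, how many interactions would have to be added so that its members form one connected piece using interactions inside the complex. The function name `extra_edges_needed` and the toy data are hypothetical. The dataset-wide problem is harder, because overlapping complexes can share an added interaction.

```python
def extra_edges_needed(complex_members, ppi_edges):
    """Count the interactions missing for one complex to be internally connected.

    Only interactions whose two endpoints both belong to the complex are used.
    A graph with c connected components needs c - 1 extra edges to be connected.
    """
    parent = {protein: protein for protein in complex_members}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in ppi_edges:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    components = {find(protein) for protein in parent}
    return max(len(components) - 1, 0)


# Hypothetical toy data: C and D are linked only through E, which is outside the complex.
toy_complex = {"A", "B", "C", "D"}
toy_ppis = [("A", "B"), ("C", "E"), ("E", "D")]
missing = extra_edges_needed(toy_complex, toy_ppis)
print(missing)
```

Summing this count over all complexes would overestimate the true minimum whenever one added interaction serves several overlapping complexes, which is why the abstract treats the full problem as an optimisation.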
Affiliation(s)
- Natsu Nakajima
  - Institute of Molecular and Cellular Biosciences, The University of Tokyo, 1-1-1, Yayoi, Bunkyo-ku, Tokyo 113-0032, Japan
  - * E-mail: (NN); (TA)
- Morihiro Hayashida
  - Department of Electrical Engineering and Computer Science, National Institute of Technology, Matsue College, 14-4, Nishiikumacho, Matsue, Shimane 690-8518, Japan
- Jesper Jansson
  - Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
- Osamu Maruyama
  - Institute of Mathematics for Industry, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka 819-0395, Japan
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto 611-0011, Japan
  - * E-mail: (NN); (TA)
|
3
|
Zhang Z, Song J, Tang J, Xu X, Guo F. Detecting complexes from edge-weighted PPI networks via genes expression analysis. BMC Systems Biology 2018; 12:40. [PMID: 29745859 PMCID: PMC5998908 DOI: 10.1186/s12918-018-0565-y] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Indexed: 01/29/2023]
Abstract
BACKGROUND: Identifying complexes from PPI networks has become a key problem in elucidating protein functions and identifying signal and biological processes in a cell. Proteins that bind together as complexes play important roles in life activity. Accurate determination of complexes in PPI networks is crucial for understanding the principles of cellular organization. RESULTS: We propose a novel method to identify complexes in PPI networks, based on different co-expression information. First, we use the Markov Cluster Algorithm with an edge-weighting scheme to calculate complexes on PPI networks. Then, we propose some significant features, such as graph information and gene expression analysis, to filter and modify the complexes predicted by the Markov Cluster Algorithm. To evaluate our method, we test it on two experimental yeast PPI networks. CONCLUSIONS: On the DIP network, our method has Precision and F-Measure values of 0.6004 and 0.5528. On the MIPS network, our method has F-Measure and Sn values of 0.3774 and 0.3453. Compared with existing methods, our method improves the Precision value by at least 0.1752, the F-Measure value by at least 0.0448, and the Sn value by at least 0.0771. Experiments show that our method achieves better results than some state-of-the-art methods for identifying complexes on PPI networks, with the prediction quality improved in terms of the evaluation criteria.
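For readers unfamiliar with the Markov Cluster Algorithm mentioned above, the following is a minimal sketch of plain MCL on a small edge-weighted network. It is not the authors' edge-weighting scheme or their filtering step. The self-loop weight of 1.0, the inflation value of 2.0, the readout threshold, and the toy network are all assumptions chosen for illustration.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def mcl(weights, inflation=2.0, max_iter=100, tol=1e-9):
    """Plain MCL: alternate expansion (matrix squaring) and inflation."""
    n = len(weights)
    # Assumption: add a self-loop of weight 1.0 to every node before normalizing.
    m = normalize_columns([[weights[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        # Expansion: one step of flow spreading, m @ m.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: raise entries to a power and renormalize columns,
        # which strengthens strong flows and weakens weak ones.
        inflated = normalize_columns([[v ** inflation for v in row]
                                      for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break
    # Each row that still carries flow is an attractor; its nonzero columns form a cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Hypothetical toy network: two triangles joined by one weak edge (weight 0.1).
n_nodes = 6
weights = [[0.0] * n_nodes for _ in range(n_nodes)]
for a, b, w in [(0, 1, 1.0), (0, 2, 1.0), (1, 2, 1.0),
                (3, 4, 1.0), (3, 5, 1.0), (4, 5, 1.0), (2, 3, 0.1)]:
    weights[a][b] = weights[b][a] = w

clusters = mcl(weights)
print(clusters)
```

In a setting like the one the abstract describes, the entries of `weights` would come from an edge-weighting scheme such as co-expression scores rather than from hand-picked values.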
Affiliation(s)
- Zehua Zhang
  - School of Computer Science and Technology, Tianjin University, Tianjin, People’s Republic of China
  - Tianjin University Institute of Computational Biology, Tianjin, People’s Republic of China
- Jian Song
  - School of Computer Science and Technology, Tianjin University, Tianjin, People’s Republic of China
  - Tianjin University Institute of Computational Biology, Tianjin, People’s Republic of China
  - School of Chemical Engineering and Technology, Tianjin University, Tianjin, People’s Republic of China
- Jijun Tang
  - School of Computer Science and Technology, Tianjin University, Tianjin, People’s Republic of China
  - Tianjin University Institute of Computational Biology, Tianjin, People’s Republic of China
  - Department of Computer Science and Engineering, University of South Carolina, Columbia, USA
- Xinying Xu
  - School of Information Engineering, Taiyuan University of Technology, Taiyuan, People’s Republic of China
- Fei Guo
  - School of Computer Science and Technology, Tianjin University, Tianjin, People’s Republic of China
  - Tianjin University Institute of Computational Biology, Tianjin, People’s Republic of China
|
4
|
Ruan P, Hayashida M, Akutsu T, Vert JP. Improving prediction of heterodimeric protein complexes using combination with pairwise kernel. BMC Bioinformatics 2018; 19:39. [PMID: 29504897 PMCID: PMC5836830 DOI: 10.1186/s12859-018-2017-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/10/2022] Open
Abstract
Background: Since many proteins become functional only after they interact with their partner proteins and form protein complexes, it is essential to identify the sets of proteins that form complexes. Therefore, several computational methods have been proposed to predict complexes from the topology and structure of experimental protein-protein interaction (PPI) networks. These methods work well for predicting complexes involving at least three proteins, but generally fail to identify complexes involving only two different proteins, called heterodimeric complexes or heterodimers. There is, however, an urgent need for efficient methods to predict heterodimers, since the majority of known protein complexes are precisely heterodimers. Results: In this paper, we use three promising kernel functions: the Min kernel and two pairwise kernels, the Metric Learning Pairwise Kernel (MLPK) and the Tensor Product Pairwise Kernel (TPPK). We also consider the normalized forms of the Min kernel. Then, we combine the Min kernel or its normalized form and one of the pairwise kernels by plugging. We applied kernels based on PPI, domain, phylogenetic profile, and subcellular localization properties to predicting heterodimers. Then, we evaluate our method by employing C-Support Vector Classification (C-SVC), carrying out 10-fold cross-validation, and calculating the average F-measures. The results suggest that the combination of the normalized Min kernel and MLPK leads to the best F-measure and improves on the performance of our previous work, which had been the best existing method so far. Conclusions: We propose new methods to predict heterodimers using a machine learning-based approach. We train a support vector machine (SVM) to discriminate interacting vs non-interacting protein pairs, based on information extracted from PPI, domain, phylogenetic profiles and subcellular localization. We evaluate in detail new kernel functions to encode these data, and report prediction performance that outperforms the state of the art.
Affiliation(s)
- Peiying Ruan
  - Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Tokyo, Japan
- Morihiro Hayashida
  - Department of Electrical Engineering and Computer Science, National Institute of Technology, Matsue College, 14-4, Nishiikumacho, Matsue, 690-8518, Japan
- Tatsuya Akutsu
  - Bioinformatics Center, Institute for Chemical Research, Kyoto University, Gokasho, Uji, Kyoto, 6110011, Japan
- Jean-Philippe Vert
  - MINES ParisTech, PSL Research University, CBIO-Centre for Computational Biology, Paris, 75006, France
  - Institut Curie, Paris, 75005, France
  - INSERM U900, Paris, 75005, France
  - Ecole Normale Supérieure, Department of Mathematics and Applications, Paris, 75005, France
|
5
|
Maruyama O, Kuwahara Y. RocSampler: regularizing overlapping protein complexes in protein-protein interaction networks. BMC Bioinformatics 2017; 18:491. [PMID: 29244010 PMCID: PMC5731504 DOI: 10.1186/s12859-017-1920-5] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 11/11/2022] Open
Abstract
Background: In recent years, protein-protein interaction (PPI) networks have been well recognized as important resources for elucidating various biological processes and cellular mechanisms. In this paper, we address the problem of predicting protein complexes from a PPI network. This problem has two difficulties. One is related to small complexes, which contain two or three components. They are relatively difficult to identify because of their simpler internal structure, but unfortunately complexes of such sizes are dominant in major protein complex databases, such as CYC2008. The other difficulty is how to model overlaps between predicted complexes, that is, how to evaluate different predicted complexes sharing common proteins, because CYC2008 and other databases include such protein complexes. Thus, how overlaps between predicted complexes are modeled is critical for identifying them simultaneously. Results: In this paper, we propose a sampling-based protein complex prediction method, RocSampler (Regularizing Overlapping Complexes), which exploits, as part of the whole scoring function, a regularization term for the overlaps of predicted complexes and another for the distribution of sizes of predicted complexes. We have implemented RocSampler in MATLAB, and its executable file for Windows is available at http://imi.kyushu-u.ac.jp/~om/software/RocSampler/. Conclusions: We have applied RocSampler to five yeast PPI networks and shown that it is superior to other existing methods. This implies that designing scoring functions that include regularization terms is an effective approach for protein complex prediction.
Affiliation(s)
- Osamu Maruyama
  - Institute of Mathematics for Industry, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka, 819-0395, Japan
- Yuki Kuwahara
  - Graduate School of Mathematics, Kyushu University, 744 Motooka, Nishi-ku, Fukuoka, 819-0395, Japan
|
6
|
Peng X, Wang J, Huan J, Wu FX. Double-layer clustering method to predict protein complexes based on power-law distribution and protein sublocalization. J Theor Biol 2016; 395:186-193. [DOI: 10.1016/j.jtbi.2016.01.043] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/03/2015] [Revised: 01/08/2016] [Accepted: 01/24/2016] [Indexed: 10/22/2022]
|
7
|
Kobiki S, Maruyama O. ReSAPP: Predicting overlapping protein complexes by merging multiple-sampled partitions of proteins. J Bioinform Comput Biol 2015; 12:1442004. [DOI: 10.1142/s0219720014420049] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 12/29/2022]
Abstract
Many proteins are known to perform their functions when they form particular groups of proteins, called protein complexes. With the advent of large-scale protein–protein interaction (PPI) studies, predicting protein complexes from PPIs has become a challenging problem in systems biology. In this paper, we propose a novel method, called Repeated Simulated Annealing of Partitions of Proteins (ReSAPP), which predicts protein complexes from weighted PPIs. In the first stage, ReSAPP generates multiple (possibly different) partitions of all proteins of the given PPIs by repeatedly applying a simulated annealing based optimization algorithm to the PPIs. In the second stage, all different clusters of size two or more in those multiple partitions are merged into a collection of clusters, which are output as predicted protein complexes. In a performance comparison of ReSAPP with our previous algorithm, PPSampler2, as well as various other tools (MCL, MCODE, DPClus, CMC, COACH, RRW, NWE, and PPSampler1), ReSAPP is shown to outperform the other methods. Furthermore, the F-measure of ReSAPP is higher than that of the variant of ReSAPP without merging partitions. Thus, we empirically conclude that the combination of sampling multiple partitions and merging them is effective for predicting protein complexes.
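The second stage described in this abstract, merging clusters from several partitions, can be sketched in a few lines. This is only an illustration of the merging idea under stated assumptions: the simulated annealing stage is not implemented, the partitions are hand-written toy data, and the function name `merge_partitions` is hypothetical.

```python
def merge_partitions(partitions, min_size=2):
    """Collect every distinct cluster of at least min_size proteins.

    Each partition is a list of disjoint clusters; clusters taken from
    different partitions may overlap, so the merged collection can contain
    overlapping predicted complexes.
    """
    merged = set()
    for partition in partitions:
        for cluster in partition:
            if len(cluster) >= min_size:
                merged.add(frozenset(cluster))
    return merged


# Hypothetical output of two sampling runs over proteins a..e.
run_1 = [{"a", "b"}, {"c"}, {"d", "e"}]
run_2 = [{"a", "b", "c"}, {"d", "e"}]
predicted = merge_partitions([run_1, run_2])
print(sorted(sorted(cluster) for cluster in predicted))
```

Because `{"a", "b"}` and `{"a", "b", "c"}` come from different partitions, both survive the merge, which is how a partition-based sampler can still report overlapping complexes.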
Affiliation(s)
- So Kobiki
  - Graduate School of Mathematics, Kyushu University, Motooka, Nishi-ku 819-0395, Fukuoka, Japan
- Osamu Maruyama
  - Institute of Mathematics for Industry, Kyushu University, Motooka, Nishi-ku 819-0395, Fukuoka, Japan
|
8
|
Ruan P, Hayashida M, Maruyama O, Akutsu T. Prediction of heterotrimeric protein complexes by two-phase learning using neighboring kernels. BMC Bioinformatics 2014; 15 Suppl 2:S6. [PMID: 24564744 PMCID: PMC4016531 DOI: 10.1186/1471-2105-15-s2-s6] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/02/2022] Open
Abstract
Background: Protein complexes play important roles in biological systems such as gene regulatory networks and metabolic pathways. Most methods for predicting protein complexes try to find protein complexes with sizes of more than three. It is known, however, that smaller protein complexes account for a large proportion of all complexes in several species. In our previous work, we developed a method with several feature space mappings and the domain composition kernel for the prediction of heterodimeric protein complexes, which outperforms existing methods. Results: We propose methods for the prediction of heterotrimeric protein complexes by extending the techniques of the previous work, on the basis of the idea that most heterotrimeric protein complexes are not likely to share the same protein with each other. We make use of the discriminant function in support vector machines (SVMs), and design novel feature space mappings for the second phase. As the second classifier, we examine SVMs and relevance vector machines (RVMs). We perform 10-fold cross-validation computational experiments. The results suggest that our proposed two-phase methods and SVM with the extended features outperform the existing method NWE, which was reported to outperform other existing methods such as MCL, MCODE, DPClus, CMC, COACH, RRW, and PPSampler for the prediction of heterotrimeric protein complexes. Conclusions: We propose two-phase prediction methods with the extended features, the domain composition kernel, SVMs and RVMs. The two-phase method with the extended features and the domain composition kernel, using an SVM as the second classifier, is particularly useful for the prediction of heterotrimeric protein complexes.
|
9
|
Widita CK, Maruyama O. PPSampler2: predicting protein complexes more accurately and efficiently by sampling. BMC Systems Biology 2013; 7 Suppl 6:S14. [PMID: 24565288 PMCID: PMC4029527 DOI: 10.1186/1752-0509-7-s6-s14] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
Predicting the sets of components of heteromeric protein complexes is a challenging problem in systems biology. Many tools have been proposed to predict those complexes. Among them, PPSampler, a protein complex prediction algorithm based on the Metropolis-Hastings algorithm, is reported to outperform other tools. In this work, we improve PPSampler by refining the scoring functions and the proposal distribution used inside the algorithm, so that the predicted clusters are more accurate and the resulting algorithm runs faster. The new version is called PPSampler2. In computational experiments, PPSampler2 is shown to outperform other tools including PPSampler. The F-measure score of PPSampler2 is 0.67, which is at least 26% higher than those of the other tools. In addition, about 82% of the predicted clusters that are unmatched with any known complexes are statistically significant on the biological process aspect of Gene Ontology. Furthermore, the running time is reduced to twenty minutes, which is 1/24 of that of PPSampler.
|
10
|
Maruyama O. Heterodimeric protein complex identification by naïve Bayes classifiers. BMC Bioinformatics 2013; 14:347. [PMID: 24299017 PMCID: PMC4219333 DOI: 10.1186/1471-2105-14-347] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 02/05/2013] [Accepted: 11/19/2013] [Indexed: 11/25/2022] Open
Abstract
Background: Protein complexes are basic cellular entities that carry out the functions of their components. In databases of yeast protein complexes such as CYC2008, the major type of known protein complex is the heterodimeric complex. Although a number of methods have been proposed for simultaneously predicting sets of proteins that form arbitrary types of protein complexes, they often fail to predict heterodimeric complexes. Results: In this paper, we have designed several features characterizing heterodimeric protein complexes based on genomic data sets, and proposed a supervised-learning method for the prediction of heterodimeric protein complexes. This method learns the parameters of the features, which are embedded in the naïve Bayes classifier. The log-likelihood ratio derived from the naïve Bayes classifier, with the parameter values obtained by maximum likelihood estimation, gives the score used to predict whether a given pair of proteins is a heterodimeric complex or not. A five-fold cross-validation shows good performance on yeast. The trained classifiers also show higher predictability than various existing algorithms on yeast data sets with approximate and exact matching criteria. Conclusions: Heterodimeric protein complex prediction is a harder problem than heteromeric protein complex prediction because a heterodimeric protein complex is topologically simpler. However, it turns out that designing features specialized for heterodimeric protein complexes can improve their predictability. Thus, the design of more sophisticated features for heterodimeric protein complexes, as well as the accumulation of more accurate and useful genome-wide data sets, will lead to higher predictability of heterodimeric protein complexes. Our tool can be downloaded from http://imi.kyushu-u.ac.jp/~om/.
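The scoring idea in this abstract, a log-likelihood ratio from a naïve Bayes classifier, can be sketched as follows. This is a generic illustration, not the paper's feature set or code. It assumes binary features and omits class priors. It also uses additive smoothing (`alpha`) instead of pure maximum likelihood estimation so that no estimated probability is exactly 0 or 1. The function names and training data are hypothetical.

```python
import math


def fit_feature_probs(examples, labels, alpha=1.0):
    """Estimate P(feature = 1 | class) for each binary feature and class.

    Assumption: additive smoothing with alpha replaces pure maximum
    likelihood so that the log ratio below is always defined.
    """
    n_features = len(examples[0])
    probs = {}
    for cls in (True, False):
        rows = [x for x, y in zip(examples, labels) if y == cls]
        probs[cls] = [(sum(row[f] for row in rows) + alpha) / (len(rows) + 2 * alpha)
                      for f in range(n_features)]
    return probs


def log_likelihood_ratio(features, probs):
    """Sum over features of log P(value | complex) / P(value | not complex)."""
    score = 0.0
    for f, value in enumerate(features):
        p_pos = probs[True][f] if value else 1.0 - probs[True][f]
        p_neg = probs[False][f] if value else 1.0 - probs[False][f]
        score += math.log(p_pos / p_neg)
    return score


# Hypothetical training pairs: feature 0 is informative, feature 1 is not.
train_x = [[1, 1], [1, 0], [0, 0], [0, 1]]
train_y = [True, True, False, False]
probs = fit_feature_probs(train_x, train_y)
score_pos = log_likelihood_ratio([1, 1], probs)
score_neg = log_likelihood_ratio([0, 0], probs)
print(score_pos, score_neg)
```

A positive score favours the "heterodimeric complex" class and a negative score favours the alternative; a threshold on the score would turn it into a yes/no prediction.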
Affiliation(s)
- Osamu Maruyama
  - Institute of Mathematics for Industry, Kyushu University, Fukuoka, Japan
|
11
|
Inferring the effective TOR-dependent network: a computational study in yeast. BMC Systems Biology 2013; 7:84. [PMID: 24005029 PMCID: PMC4016608 DOI: 10.1186/1752-0509-7-84] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 04/23/2013] [Accepted: 08/28/2013] [Indexed: 11/25/2022]
Abstract
Background: Calorie restriction (CR) is one of the most conserved non-genetic interventions that extends healthspan in evolutionarily distant species, ranging from yeast to mammals. The target of rapamycin (TOR) has been shown to play a key role in mediating healthspan extension in response to CR by integrating different signals that monitor nutrient availability and orchestrating various components of cellular machinery in response. Both genetic and pharmacological interventions that inhibit the TOR pathway exhibit a similar phenotype, which is not further amplified by CR. Results: In this paper, we present the first comprehensive, computationally derived map of TOR downstream effectors, with the objective of discovering key lifespan mediators, their crosstalk, and high-level organization. We adopt a systematic approach for tracing information flow from the TOR complex and use it to identify relevant signaling elements. By constructing a high-level functional map of TOR downstream effectors, we show that our approach is not only capable of recapturing previously known pathways, but also suggests potential targets for future studies. Information flow scores provide an aggregate ranking of relevance of proteins with respect to the TOR signaling pathway. These rankings must be normalized for degree bias, appropriately interpreted, and mapped to associated roles in pathways. We propose a novel statistical framework for integrating information flow scores, the set of differentially expressed genes in response to rapamycin treatment, and the transcriptional regulatory network. We use this framework to identify the most relevant transcription factors in mediating the observed transcriptional response, and to construct the effective response network of the TOR pathway. This network is hypothesized to mediate lifespan extension in response to TOR inhibition. Conclusions: Our approach, unlike experimental methods, is not limited to specific aspects of cellular response. Rather, it predicts transcriptional changes and post-translational modifications in response to TOR inhibition. The constructed effective response network greatly enhances understanding of the mechanisms underlying the aging process and helps in identifying new targets for further investigation of anti-aging regimes. It also allows us to identify potential network biomarkers for diagnosis and prognosis of age-related pathologies.
|
12
|
Shih YK, Parthasarathy S. Identifying functional modules in interaction networks through overlapping Markov clustering. Bioinformatics 2013; 28:i473-i479. [PMID: 22962469 PMCID: PMC3436797 DOI: 10.1093/bioinformatics/bts370] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.1] [Indexed: 11/13/2022] Open
Abstract
Motivation: In recent years, Markov clustering (MCL) has emerged as an effective algorithm for clustering biological networks, for instance clustering protein–protein interaction (PPI) networks to identify functional modules. However, a limitation of MCL and its variants (e.g. regularized MCL) is that they only support hard clustering, which often leads to an impedance mismatch given that there is often a significant overlap of proteins across functional modules. Results: In this article, we seek to redress this limitation. We propose a soft variation of Regularized MCL (R-MCL) based on the idea of iteratively (re-)executing R-MCL while ensuring that multiple executions do not always converge to the same clustering result, thus allowing for highly overlapped clusters. The resulting algorithm, denoted soft regularized Markov clustering, is shown to outperform a range of extant state-of-the-art approaches in terms of accuracy of identifying functional modules on three real PPI networks. Availability: All data and codes are freely available upon request. Contact: srini@cse.ohio-state.edu Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yu-Keng Shih
  - Department of Computer Science and Engineering, the Ohio State University, Columbus, OH 43210-1277, USA
|
13
|
Tatsuke D, Maruyama O. Sampling strategy for protein complex prediction using cluster size frequency. Gene 2013; 518:152-8. [DOI: 10.1016/j.gene.2012.11.050] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 10/05/2012] [Accepted: 11/27/2012] [Indexed: 11/28/2022]
|