Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Wu M, Li X, Chua HN, Kwoh CK, Ng SK. Integrating diverse biological and computational sources for reliable protein-protein interactions. BMC Bioinformatics 2010;11 Suppl 7:S8. [PMID: 21106130 PMCID: PMC2957691 DOI: 10.1186/1471-2105-11-s7-s8] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Indexed: 11/23/2022]

Cited by Other Article(s)

Jalilvand A, Akbari B, Zare Mirakabad F. S-FLN: A sequence-based hierarchical approach for functional linkage network construction. J Theor Biol 2018;437:149-162. [PMID: 29080781 DOI: 10.1016/j.jtbi.2017.10.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2016] [Revised: 07/27/2017] [Accepted: 10/18/2017] [Indexed: 11/24/2022]

Integration of multiple biological features yields high confidence human protein interactome. J Theor Biol 2016;403:85-96. [PMID: 27196966 DOI: 10.1016/j.jtbi.2016.05.020] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 02/07/2016] [Revised: 05/01/2016] [Accepted: 05/11/2016] [Indexed: 01/05/2023]

Abstract

The biological function of a protein is usually determined by its physical interaction with other proteins. Protein-protein interactions (PPIs) are identified through various experimental methods and are stored in curated databases. The noisiness of the existing PPI data is evident, and it is essential that a more reliable data is generated. Furthermore, the selection of a set of PPIs at different confidence levels might be necessary for many studies. Although different methodologies were introduced to evaluate the confidence scores for binary interactions, a highly reliable, almost complete PPI network of Homo sapiens is not proposed yet. The quality and coverage of human protein interactome need to be improved to be used in various disciplines, especially in biomedicine. In the present work, we propose an unsupervised statistical approach to assign confidence scores to PPIs of H. sapiens. To achieve this goal PPI data from six different databases were collected and a total of 295,288 non-redundant interactions between 15,950 proteins were acquired. The present scoring system included the context information that was assigned to PPIs derived from eight biological attributes. A high confidence network, which included 147,923 binary interactions between 13,213 proteins, had scores greater than the cutoff value of 0.80, for which sensitivity, specificity, and coverage were 94.5%, 80.9%, and 82.8%, respectively. We compared the present scoring method with others for evaluation. Reducing the noise inherent in experimental PPIs via our scoring scheme increased the accuracy significantly. As it was demonstrated through the assessment of process and cancer subnetworks, this study allows researchers to construct and analyze context-specific networks via valid PPI sets and one can easily achieve subnetworks around proteins of interest at a specified confidence level.

Ou-Yang L, Wu M, Zhang XF, Dai DQ, Li XL, Yan H. A two-layer integration framework for protein complex detection. BMC Bioinformatics 2016;17:100. [PMID: 26911324 PMCID: PMC4765032 DOI: 10.1186/s12859-016-0939-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 08/12/2015] [Accepted: 01/27/2016] [Indexed: 01/05/2023]

Abstract

Background

Protein complexes carry out nearly all signaling and functional processes within cells. The study of protein complexes is an effective strategy to analyze cellular functions and biological processes. With the increasing availability of proteomics data, various computational methods have recently been developed to predict protein complexes. However, different computational methods are based on their own assumptions and designed to work on different data sources, and various biological screening methods have their unique experiment conditions, and are often different in scale and noise level. Therefore, a single computational method on a specific data source is generally not able to generate comprehensive and reliable prediction results.

Results

In this paper, we develop a novel Two-layer INtegrative Complex Detection (TINCD) model to detect protein complexes, leveraging the information from both clustering results and raw data sources. In particular, we first integrate various clustering results to construct consensus matrices for proteins to measure their overall co-complex propensity. Second, we combine these consensus matrices with the co-complex score matrix derived from Tandem Affinity Purification/Mass Spectrometry (TAP) data and obtain an integrated co-complex similarity network via an unsupervised metric fusion method. Finally, a novel graph regularized doubly stochastic matrix decomposition model is proposed to detect overlapping protein complexes from the integrated similarity network.

Conclusions

Extensive experimental results demonstrate that TINCD performs much better than 21 state-of-the-art complex detection techniques, including ensemble clustering and data integration techniques.

Electronic supplementary material

The online version of this article (doi:10.1186/s12859-016-0939-3) contains supplementary material, which is available to authorized users.
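The first TINCD step described above, building a consensus matrix from several clustering results, can be illustrated with a minimal sketch. It assumes one common construction, the fraction of clusterings in which two proteins share a cluster; the abstract does not state that TINCD uses exactly this formula. The function name `consensus_matrix` and the toy protein labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def consensus_matrix(clusterings):
    """Map each protein pair to the fraction of clusterings that co-cluster it.

    `clusterings` is a list of clustering results; each result is a list of
    sets of protein identifiers (clusters may overlap).
    """
    counts = defaultdict(int)
    for clusters in clusterings:
        # Collect each co-clustered pair once per clustering, even if the
        # pair appears together in several overlapping clusters.
        seen = set()
        for cluster in clusters:
            for a, b in combinations(sorted(cluster), 2):
                seen.add((a, b))
        for pair in seen:
            counts[pair] += 1
    total = len(clusterings)
    return {pair: count / total for pair, count in counts.items()}


# Toy input (made up for illustration): three clustering results over five proteins.
clusterings = [
    [{"A", "B", "C"}, {"D", "E"}],
    [{"A", "B"}, {"C", "D", "E"}],
    [{"A", "B", "C", "D"}, {"E"}],
]
consensus = consensus_matrix(clusterings)
for pair in sorted(consensus):
    print(pair, round(consensus[pair], 2))
```

Pairs that never co-occur, such as A and E in the toy data, are simply absent from the result, which keeps the representation sparse.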

Wu M, Li X, Zhang F, Li X, Kwoh CK, Zheng J. In silico prediction of synthetic lethality by meta-analysis of genetic interactions, functions, and pathways in yeast and human cancer. Cancer Inform 2014;13:71-80. [PMID: 25452682 PMCID: PMC4224103 DOI: 10.4137/cin.s14026] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Received: 05/20/2014] [Revised: 08/15/2014] [Accepted: 08/18/2014] [Indexed: 02/07/2023]

Integration strategy is a key step in network-based analysis and dramatically affects network topological properties and inferring outcomes. BioMed Research International 2014;2014:296349. [PMID: 25243127 PMCID: PMC4163410 DOI: 10.1155/2014/296349] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 05/22/2014] [Revised: 07/14/2014] [Accepted: 07/17/2014] [Indexed: 01/17/2023]

Wu M, Xie Z, Li X, Kwoh CK, Zheng J. Identifying protein complexes from heterogeneous biological data. Proteins 2013;81:2023-33. [PMID: 23852772 DOI: 10.1002/prot.24365] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Received: 01/03/2013] [Revised: 06/03/2013] [Accepted: 06/17/2013] [Indexed: 12/27/2022]

Abstract

With the increasing availability of diverse biological information for proteins, integration of heterogeneous data becomes more useful for many problems in proteomics, such as annotating protein functions, predicting novel protein-protein interactions and so on. In this paper, we present an integrative approach called InteHC (Integrative Hierarchical Clustering) to identify protein complexes from multiple data sources. Although integrating multiple sources could effectively improve the coverage of current insufficient protein interactome (the false negative issue), it could also introduce potential false-positive interactions that could hurt the performance of protein complex prediction. Our proposed InteHC method can effectively address these issues to facilitate accurate protein complex prediction and it is summarized into the following three steps. First, for each individual source/feature, InteHC computes the matrices to store the affinity scores between a protein pair that indicate their propensity to interact or co-complex relationship. Second, InteHC computes a final score matrix, which is the weighted sum of affinity scores from individual sources. In particular, the weights indicating the reliability of individual sources are learned from a supervised model (i.e., a linear ranking SVM). Finally, a hierarchical clustering algorithm is performed on the final score matrix to generate clusters as predicted protein complexes. In our experiments, we compared the results collected by our hierarchical clustering on each individual feature with those predicted by InteHC on the combined matrix. We observed that integration of heterogeneous data significantly benefits the identification of protein complexes. Moreover, a comprehensive comparison demonstrates that InteHC performs much better than 14 state-of-the-art approaches. All the experimental data and results can be downloaded from http://www.ntu.edu.sg/home/zhengjie/data/InteHC.
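The three InteHC steps summarized above (per-source affinity scores, a weighted sum, then hierarchical clustering) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the weights are fixed by hand here, whereas the abstract says they are learned with a linear ranking SVM. The average-linkage rule, the stopping threshold, and all names and toy scores are hypothetical.

```python
from itertools import combinations


def pair(a, b):
    """Return an order-independent key for a protein pair."""
    return (a, b) if a < b else (b, a)


def combine_scores(sources, weights):
    """Step 2: weighted sum of per-source affinity scores."""
    combined = {}
    for name, scores in sources.items():
        for (a, b), score in scores.items():
            key = pair(a, b)
            combined[key] = combined.get(key, 0.0) + weights[name] * score
    return combined


def average_linkage(c1, c2, scores):
    """Mean combined score over all cross-cluster pairs (missing pairs count as 0)."""
    total = sum(scores.get(pair(a, b), 0.0) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def hierarchical_clusters(proteins, scores, threshold):
    """Step 3: merge the most similar clusters until similarity drops below threshold."""
    clusters = [frozenset([p]) for p in proteins]
    while len(clusters) > 1:
        best_sim, (c1, c2) = max(
            ((average_linkage(x, y, scores), (x, y))
             for x, y in combinations(clusters, 2)),
            key=lambda item: item[0],
        )
        if best_sim < threshold:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return [set(c) for c in clusters]


# Step 1 (assumed already done): toy affinity scores from two made-up sources.
sources = {
    "ppi": {("A", "B"): 1.0, ("B", "C"): 1.0, ("D", "E"): 1.0},
    "coexpression": {("A", "B"): 0.8, ("A", "C"): 0.6, ("B", "C"): 0.7,
                     ("D", "E"): 0.9, ("C", "D"): 0.2},
}
weights = {"ppi": 0.6, "coexpression": 0.4}  # hand-picked, not learned

combined = combine_scores(sources, weights)
complexes = hierarchical_clusters(["A", "B", "C", "D", "E"], combined, threshold=0.5)
print(sorted(sorted(c) for c in complexes))
```

On the toy data the weak C-D link stays below the threshold, so the two groups are not merged. Changing the weights shows how strongly a single source can shape the predicted complexes.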

Wu M, Yu Q, Li X, Zheng J, Huang JF, Kwoh CK. Benchmarking human protein complexes to investigate drug-related systems and evaluate predicted protein complexes. PLoS One 2013;8:e53197. [PMID: 23405067 PMCID: PMC3566178 DOI: 10.1371/journal.pone.0053197] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.3] [Received: 07/18/2012] [Accepted: 11/29/2012] [Indexed: 11/18/2022]

Abstract

Protein complexes are key entities to perform cellular functions. Human diseases are also revealed to associate with some specific human protein complexes. In fact, human protein complexes are widely used for protein function annotation, inference of human protein interactome, disease gene prediction, and so on. Therefore, it is highly desired to build an up-to-date catalogue of human complexes to support the research in these applications. Protein complexes from different databases are as expected to be highly redundant. In this paper, we designed a set of concise operations to compile these redundant human complexes and built a comprehensive catalogue called CHPC2012 (Catalogue of Human Protein Complexes). CHPC2012 achieves a higher coverage for proteins and protein complexes than those individual databases. It is also verified to be a set of complexes with high quality as its co-complex protein associations have a high overlap with protein-protein interactions (PPI) in various existing PPI databases. We demonstrated two distinct applications of CHPC2012, that is, investigating the relationship between protein complexes and drug-related systems and evaluating the quality of predicted protein complexes. In particular, CHPC2012 provides more insights into drug development. For instance, proteins involved in multiple complexes (the overlapping proteins) are potential drug targets; the drug-complex network is utilized to investigate multi-target drugs and drug-drug interactions; and the disease-specific complex-drug networks will provide new clues for drug repositioning. With this up-to-date reference set of human protein complexes, we believe that the CHPC2012 catalogue is able to enhance the studies for protein interactions, protein functions, human diseases, drugs, and related fields of research. CHPC2012 complexes can be downloaded from http://www1.i2r.a-star.edu.sg/xlli/CHPC2012/CHPC2012.htm.

Zhou H, Wong L. Comparative analysis and assessment of M. tuberculosis H37Rv protein-protein interaction datasets. BMC Genomics 2011;12 Suppl 3:S20. [PMID: 22369691 PMCID: PMC3333180 DOI: 10.1186/1471-2164-12-s3-s20] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.1] [Indexed: 12/29/2022]

Abstract

BACKGROUND

M. tuberculosis is a formidable bacterial pathogen. There is thus an increasing demand on understanding the function and relationship of proteins in various strains of M. tuberculosis. Protein-protein interactions (PPIs) data are crucial for this kind of knowledge. However, the quality of the main available M. tuberculosis PPI datasets is unclear. This hampers the effectiveness of research works that rely on these PPI datasets. Here, we analyze the two main available M. tuberculosis H37Rv PPI datasets. The first dataset is the high-throughput B2H PPI dataset from Wang et al's recent paper in Journal of Proteome Research. The second dataset is from STRING database, version 8.3, comprising entirely of H37Rv PPIs predicted using various methods. We find that these two datasets have a surprisingly low level of agreement. We postulate the following causes for this low level of agreement: (i) the H37Rv B2H PPI dataset is of low quality; (ii) the H37Rv STRING PPI dataset is of low quality; and/or (iii) the H37Rv STRING PPIs are predictions of other forms of functional associations rather than direct physical interactions.

RESULTS

To test the quality of these two datasets, we evaluate them based on correlated gene expression profiles, coherent informative GO term annotations, and conservation in other organisms. We observe a significantly greater portion of PPIs in the H37Rv STRING PPI dataset (with score ≥ 770) having correlated gene expression profiles and coherent informative GO term annotations in both interaction partners than that in the H37Rv B2H PPI dataset. Predicted H37Rv interologs derived from non-M. tuberculosis experimental PPIs are much more similar to the H37Rv STRING functional associations dataset (with score ≥ 770) than the H37Rv B2H PPI dataset. H37Rv predicted physical interologs from IntAct also show extremely low similarity with the H37Rv B2H PPI dataset; and this similarity level is much lower than that between the S. aureus MRSA252 predicted physical interologs from IntAct and S. aureus MRSA252 pull-down PPIs. Comparative analysis with several representative two-hybrid PPI datasets in other species further confirms that the H37Rv B2H PPI dataset is of low quality. Next, to test the possibility that the H37Rv STRING PPIs are not purely direct physical interactions, we compare M. tuberculosis H37Rv protein pairs that catalyze adjacent steps in enzymatic reactions to B2H PPIs and predicted PPIs in STRING, which shows it has much lower similarities with the B2H PPIs than with STRING PPIs. This result strongly suggests that the H37Rv STRING PPIs more likely correspond to indirect relationships between protein pairs than to B2H PPIs. For more precise support, we turn to S. cerevisiae for its comprehensively studied interactome. We compare S. cerevisiae predicted PPIs in STRING to three independent protein relationship datasets which respectively comprise PPIs reported in Y2H assays, protein pairs reported to be in the same protein complexes, and protein pairs that catalyze successive reaction steps in enzymatic reactions. Our analysis reveals that S. cerevisiae predicted STRING PPIs have much higher similarity to the latter two types of protein pairs than to two-hybrid PPIs. As H37Rv STRING PPIs are predicted using similar methods as S. cerevisiae predicted STRING PPIs, this suggests that these H37Rv STRING PPIs are more likely to correspond to the latter two types of protein pairs rather than to two-hybrid PPIs as well.

CONCLUSIONS

The H37Rv B2H PPI dataset has low quality. It should not be used as the gold standard to assess the quality of other (possibly predicted) H37Rv PPI datasets. The H37Rv STRING PPI dataset also has low quality; nevertheless, a subset consisting of STRING PPIs with score ≥770 has satisfactory quality. However, these STRING "PPIs" should be interpreted as functional associations, which include a substantial portion of indirect protein interactions, rather than direct physical interactions. These two factors cause the strikingly low similarity between these two main H37Rv PPI datasets. The results and conclusions from this comparative analysis provide valuable guidance in using these M. tuberculosis H37Rv PPI datasets in subsequent studies for a wide range of purposes.

Schönbach C, Nakai K, Tan TW, Ranganathan S. InCoB2010 - 9th International Conference on Bioinformatics at Tokyo, Japan, September 26-28, 2010. BMC Bioinformatics 2010;11 Suppl 7:S1. [PMID: 21106116 PMCID: PMC2957677 DOI: 10.1186/1471-2105-11-s7-s1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Indexed: 02/01/2023]