1
Shtetinska MM, González-Sánchez JC, Beyer T, Boldt K, Ueffing M, Russell R. WeSA: a web server for improving analysis of affinity proteomics data. Nucleic Acids Res 2024; 52:W333-W340. [PMID: 38795065 PMCID: PMC11223876 DOI: 10.1093/nar/gkae423] [Received: 03/24/2024] [Revised: 04/23/2024] [Accepted: 05/14/2024]
Abstract
Protein-protein interaction experiments still yield many false positive interactions. The socioaffinity metric can distinguish true protein-protein interactions from noise based on available data. Here, we present WeSA (Weighted SocioAffinity), which considers large datasets of interaction proteomics data (IntAct, BioGRID, the BioPlex) to score human protein interactions and, in a statistically robust way, flag those (even from a single experiment) that are likely to be false positives. ROC analysis (using CORUM-PDB positives and Negatome negatives) shows that WeSA improves over other measures of interaction confidence. WeSA shows consistently good results over all datasets (up to AUC = 0.93; at the best threshold, TPR = 0.84, FPR = 0.11 and precision = 0.98). WeSA is freely available without login (wesa.russelllab.org). Users can submit their own data or look for organized information on human protein interactions using the web server: they can either retrieve available information for a list of proteins of interest or calculate scores for new experiments. The server outputs either pre-computed or updated WeSA scores for the input, enriched with information from databases. The summary is presented as a table and a network-based visualization that allows the user to remove the nodes/edges the method considers spurious.
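The ROC summary above (AUC, plus TPR, FPR and precision at a best threshold, with CORUM-PDB pairs as positives and Negatome pairs as negatives) can be computed for any scored, labelled set of pairs. The sketch below is a generic illustration, not WeSA's code: it assumes higher scores mean more confidence, computes the AUC as the probability that a positive outscores a negative, and picks the "best" threshold by maximizing Youden's J (TPR - FPR). The abstract does not state how WeSA's best threshold was chosen, so that rule, the function names and the toy data are all assumptions.

```python
from itertools import product


def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation of the ROC AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


def metrics_at(threshold, scores, labels):
    """TPR, FPR and precision when every pair scoring >= threshold is called true."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        called = s >= threshold
        if called and y == 1:
            tp += 1
        elif called:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return tpr, fpr, precision


def best_threshold(scores, labels):
    """Threshold maximizing Youden's J = TPR - FPR (an assumed selection rule).

    Returns (threshold, tpr, fpr, precision); the lowest threshold wins ties.
    """
    best = None
    for t in sorted(set(scores)):
        tpr, fpr, precision = metrics_at(t, scores, labels)
        if best is None or tpr - fpr > best[0]:
            best = (tpr - fpr, t, tpr, fpr, precision)
    return best[1:]


# Toy scores for labelled pairs (1 = positive, 0 = negative); not WeSA data.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(roc_auc(scores, labels))         # 0.8125
print(best_threshold(scores, labels))  # (0.3, 1.0, 0.5, 0.6666666666666666)
```

Youden's J is only one convention; a precision-oriented rule (for example, the lowest threshold reaching a target precision) could pick a different cut-off on the same scores.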
Affiliation(s)
- Magdalena M Shtetinska
  - BioQuant, Heidelberg University, 69120 Heidelberg, Germany
  - Biochemistry Center (BZH), Heidelberg University, 69120 Heidelberg, Germany
- Juan-Carlos González-Sánchez
  - BioQuant, Heidelberg University, 69120 Heidelberg, Germany
  - Biochemistry Center (BZH), Heidelberg University, 69120 Heidelberg, Germany
- Tina Beyer
  - Institute for Ophthalmic Research, Center for Ophthalmology, University of Tübingen, 72076 Tübingen, Germany
- Karsten Boldt
  - Institute for Ophthalmic Research, Center for Ophthalmology, University of Tübingen, 72076 Tübingen, Germany
- Marius Ueffing
  - Institute for Ophthalmic Research, Center for Ophthalmology, University of Tübingen, 72076 Tübingen, Germany
- Robert B Russell
  - BioQuant, Heidelberg University, 69120 Heidelberg, Germany
  - Biochemistry Center (BZH), Heidelberg University, 69120 Heidelberg, Germany
2
Tian B, Duan Q, Zhao C, Teng B, He Z. Reinforce: An Ensemble Approach for Inferring PPI Network from AP-MS Data. IEEE/ACM Trans Comput Biol Bioinform 2019; 16:365-376. [PMID: 28534782 DOI: 10.1109/tcbb.2017.2705060]
Abstract
Affinity purification-mass spectrometry (AP-MS) is one of the most important technologies for constructing protein-protein interaction (PPI) networks. In this paper, we propose Reinforce, an ensemble method for inferring PPI networks from AP-MS data sets. Reinforce is based on rank aggregation and false discovery rate control. Under the null hypothesis that the interaction scores from different scoring methods are randomly generated, Reinforce follows three steps to integrate multiple ranking results from different algorithms or different data sets. The experimental results show that Reinforce produces more stable and accurate inference results than existing algorithms. The source code of Reinforce and the data sets used in the experiments are available at: https://sourceforge.net/projects/reinforce/.
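Reinforce's three steps are not spelled out in the abstract, so the following is only a hedged sketch of the general idea it names: aggregate the rankings produced by several scoring methods, test each candidate pair against a null in which every method ranks pairs at random, and control the false discovery rate. The aggregation statistic swapped in here (each pair's worst normalized rank, whose null distribution is exact for independent random rankings) and all function names are assumptions, not the published algorithm; the FDR step is the standard Benjamini-Hochberg procedure.

```python
def normalized_ranks(scores):
    """Map each item to rank / N, where rank 1 is the highest score."""
    order = sorted(scores, key=scores.get, reverse=True)
    n = len(order)
    return {item: (i + 1) / n for i, item in enumerate(order)}


def consensus_pvalues(score_lists):
    """p-value for each item from its worst normalized rank across m methods.

    Null assumption: each method ranks the N items in an independent,
    uniformly random order, so P(max rank <= k / N) = (k / N) ** m exactly.
    An item only gets a small p-value if every method ranks it highly.
    """
    m = len(score_lists)
    ranked = [normalized_ranks(s) for s in score_lists]
    return {item: max(r[item] for r in ranked) ** m for item in score_lists[0]}


def benjamini_hochberg(pvalues, alpha=0.05):
    """Items accepted at false discovery rate alpha (Benjamini-Hochberg step-up)."""
    ordered = sorted(pvalues, key=pvalues.get)
    n = len(ordered)
    cutoff = 0
    for k, item in enumerate(ordered, start=1):
        if pvalues[item] <= alpha * k / n:
            cutoff = k  # step-up: keep the largest k that passes
    return set(ordered[:cutoff])


# Three hypothetical scoring methods over the same ten candidate pairs a..j.
s1 = dict(zip("abcdefghij", [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]))
s2 = dict(zip("abjihgfedc", [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]))
s3 = dict(zip("abefcdjihg", [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]))
pvals = consensus_pvalues([s1, s2, s3])
print(sorted(benjamini_hochberg(pvals)))  # ['a', 'b']
```

Only pairs ranked near the top by all three methods survive; a pair that one method loves and another ranks last gets a p-value near 1.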
3
Tian B, Zhao C, Gu F, He Z. A two-step framework for inferring direct protein-protein interaction network from AP-MS data. BMC Syst Biol 2017; 11:82. [PMID: 28950876 PMCID: PMC5615237 DOI: 10.1186/s12918-017-0452-y]
Abstract
Background: Affinity purification-mass spectrometry (AP-MS) has been widely used to generate bait-prey data sets for identifying underlying protein-protein interactions and protein complexes. However, AP-MS bait-prey data sets are highly noisy, and the candidate pairs contain many false positives. Recently, numerous computational methods have been developed to identify genuine interactions from AP-MS data sets. However, most of these methods aim to remove false positives caused by contaminants, ignoring the distinction between direct and indirect interactions. Results: In this paper, we present an initialization-and-refinement framework for inferring direct PPI networks from AP-MS data, in which an initial network is first generated with existing scoring methods and a refined network is then constructed by applying indirect association removal methods. Experimental results on several real AP-MS data sets show that our method identifies more direct interactions than traditional scoring methods. Conclusions: The proposed framework is general enough to incorporate any feasible method in each step, so it has the potential to handle different types of AP-MS data in future applications. Supplementary material is available in the online version of this article (doi:10.1186/s12918-017-0452-y).
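The refinement step can be illustrated with one well-known indirect association removal rule, the data processing inequality (DPI) used by ARACNE: in a triangle of associations, the weakest edge is the one most plausibly explained by the other two. The abstract does not say which removal methods the framework was evaluated with, so this is a minimal sketch under that assumption; the score dictionary, the tolerance parameter's use here and the function name are hypothetical.

```python
from itertools import combinations


def remove_indirect_edges(weights, tolerance=0.0):
    """Drop likely indirect associations with an ARACNE-style DPI rule.

    weights maps frozenset({u, v}) -> association score (higher = stronger).
    In every fully connected triplet, the weakest edge is flagged when it is
    below (1 - tolerance) times the next-weakest edge. All flags are collected
    before anything is removed, so the result does not depend on loop order.
    """
    nodes = sorted({n for edge in weights for n in edge})
    flagged = set()
    for a, b, c in combinations(nodes, 3):
        triangle = [frozenset(p) for p in ((a, b), (b, c), (a, c))]
        if not all(e in weights for e in triangle):
            continue  # not a triangle in the initial network
        weakest, middle, _ = sorted(triangle, key=weights.get)
        if weights[weakest] < (1 - tolerance) * weights[middle]:
            flagged.add(weakest)
    return {e: w for e, w in weights.items() if e not in flagged}


# Hypothetical initial network from some co-complex scoring method.
initial = {
    frozenset(("A", "B")): 0.9,
    frozenset(("B", "C")): 0.8,
    frozenset(("A", "C")): 0.3,  # plausibly explained by the path A-B-C
    frozenset(("C", "D")): 0.7,
}
refined = remove_indirect_edges(initial)
print(sorted(tuple(sorted(e)) for e in refined))  # [('A', 'B'), ('B', 'C'), ('C', 'D')]
```

Collecting flags before deleting edges is a deliberate choice: removing edges one triangle at a time would make the outcome depend on which triangle is visited first.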
Affiliation(s)
- Bo Tian
  - School of Software, Dalian University of Technology, Tuqiang Road, Dalian, China
- Can Zhao
  - School of Software, Dalian University of Technology, Tuqiang Road, Dalian, China
- Feiyang Gu
  - School of Software, Dalian University of Technology, Tuqiang Road, Dalian, China
- Zengyou He
  - School of Software, Dalian University of Technology, Tuqiang Road, Dalian, China
  - Key Laboratory for Ubiquitous Network and Service Software of Liaoning, Tuqiang Road 321, Dalian, 116600, China
4
Zhang XF, Ou-Yang L, Hu X, Dai DQ. Identifying binary protein-protein interactions from affinity purification mass spectrometry data. BMC Genomics 2015; 16:745. [PMID: 26438428 PMCID: PMC4595009 DOI: 10.1186/s12864-015-1944-z] [Received: 11/06/2014] [Accepted: 09/22/2015]
Abstract
Background: The identification of protein-protein interactions contributes greatly to our understanding of functional organization within cells. With the development of affinity purification-mass spectrometry (AP-MS) techniques, several computational scoring methods have been proposed to detect protein interactions from AP-MS data. However, most current methods focus on detecting co-complex interactions and do not discriminate between direct physical interactions and indirect interactions. Consequently, less is known about the precise physical wiring diagram within cells. Results: In this paper, we develop a Binary Interaction Network Model (BINM) to computationally identify direct physical interactions from co-complex interactions, which can be inferred from purification data using previous scoring methods. This model provides a mathematical framework for capturing topological relationships between direct physical interactions and observed co-complex interactions. It reassigns a confidence score to each observed interaction to indicate its propensity to be a direct physical interaction; observed interactions with high confidence scores are then predicted as direct physical interactions. We run our model on two yeast co-complex interaction networks constructed by two different scoring methods from the same combined AP-MS data. The direct physical interactions identified by various methods are comprehensively benchmarked against different reference sets that provide both direct and indirect evidence for physical contacts. Experimental results show that our model performs competitively with state-of-the-art methods. Conclusions: According to the results obtained in this study, BINM is a powerful scoring method that can use network topology alone to predict direct physical interactions from AP-MS data. This study provides an alternative approach to exploring the information inherent in AP-MS data. The software can be downloaded from https://github.com/Zhangxf-ccnu/BINM. Supplementary material is available in the online version of this article (doi:10.1186/s12864-015-1944-z).
Affiliation(s)
- Xiao-Fei Zhang
  - School of Mathematics and Statistics, Central China Normal University, Luoyu Road, Wuhan, 430079, China
- Le Ou-Yang
  - Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Xingang West Road, Guangzhou, 510275, China
- Xiaohua Hu
  - School of Computer, Central China Normal University, 774 Luoyu Road, Wuhan, 430079, China
  - College of Information Science and Technology, Drexel University, Chestnut Street, Philadelphia, 19104, USA
- Dao-Qing Dai
  - Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Xingang West Road, Guangzhou, 510275, China
5
Huang Q, You Z, Zhang X, Zhou Y. Prediction of protein-protein interactions with clustered amino acids and weighted sparse representation. Int J Mol Sci 2015; 16:10855-69. [PMID: 25984606 PMCID: PMC4463679 DOI: 10.3390/ijms160510855] [Received: 04/09/2015] [Revised: 05/06/2015] [Accepted: 05/07/2015]
Abstract
With the completion of the Human Genome Project, bioscience has entered the era of the genome and proteome, and research on protein–protein interactions (PPIs) is becoming increasingly important. PPIs are inseparable from life activities such as DNA synthesis, gene transcription activation and protein translation. Although many methods based on biological experiments and machine learning have been proposed, they require long training times and their accuracy remains limited. How to predict PPIs efficiently and accurately is still a major challenge. To address this challenge, we developed a new predictor by incorporating reduced amino acid alphabet (RAAA) information into the general form of pseudo-amino acid composition (PseAAC) and combining it with weighted sparse representation-based classification (WSRC). The key advantage of introducing the reduced amino acid alphabet is that it avoids the notorious curse of dimensionality and overfitting problems in statistical prediction. Additionally, experiments show that our method performs well in both low- and high-dimensional feature spaces. Among all of the experiments performed on the PPI data of Saccharomyces cerevisiae, the best one achieved 90.91% accuracy, 94.17% sensitivity, 87.22% precision and a Matthews correlation coefficient (MCC) of 83.43%. To evaluate the prediction ability of our method, extensive experiments were performed to compare it with a state-of-the-art technique, the support vector machine (SVM). The results show that the proposed approach is very promising for predicting PPIs and can be a helpful supplement to existing PPI prediction methods.
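The four figures reported above (accuracy, sensitivity, precision and MCC) all derive from the same confusion matrix of true/false positives and negatives. A minimal sketch using the standard definitions; the function name and the counts are hypothetical and do not reproduce the paper's results.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity (recall), precision and MCC from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal count is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "precision": precision, "mcc": mcc}


# Hypothetical counts for 100 test pairs (not the paper's results).
m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print({k: round(v, 4) for k, v in m.items()})
```

Unlike accuracy, MCC uses all four cells of the matrix, which is why it is often reported alongside accuracy when positive and negative sets differ in size.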
Affiliation(s)
- Qiaoying Huang
  - Shenzhen Graduate School, Harbin Institute of Technology, HIT Campus of University Town of Shenzhen, Shenzhen 518055, China
- Zhuhong You
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
- Xiaofeng Zhang
  - Shenzhen Graduate School, Harbin Institute of Technology, HIT Campus of University Town of Shenzhen, Shenzhen 518055, China
- Yong Zhou
  - School of Computer Science and Technology, China University of Mining and Technology, Xuzhou 221116, China
6
Extracting high confidence protein interactions from affinity purification data: at the crossroads. J Proteomics 2015; 118:63-80. [PMID: 25782749 DOI: 10.1016/j.jprot.2015.03.009] [Received: 07/30/2014] [Revised: 02/27/2015] [Accepted: 03/09/2015]
Abstract
Deriving protein-protein interactions from data generated by affinity-purification and mass spectrometry (AP-MS) techniques requires scoring methods that measure the reliability of detected putative interactions. Choosing the appropriate scoring method has become a major challenge. Here we apply six popular scoring methods to the same AP-MS datasets and compare their performance. The comparison was carried out for six distinct datasets from human, fly and yeast, which focus on different biological processes and differ in their coverage of the proteome. Results show that the performance of a given scoring method may vary substantially depending on the dataset. Disturbingly, we find that the high-confidence (HC) PPI networks built by applying the six scoring methods to the same raw AP-MS dataset overlap very poorly: only 1.7-4.1% of the HC interactions are present in all the networks built from the proteome-wide human, fly or yeast datasets. Various properties of the shared versus unique interactions in each network, including biases in protein abundance, suggest that current scoring methods can eliminate only the most obvious contaminants and still fail to reliably single out specific interactions from the large body of spurious associations detected in AP-MS experiments. Biological significance: The fast progress in AP-MS techniques has prompted the development of a multitude of scoring methods, which are relied upon to remove contaminants and non-specific binders. Choosing the appropriate scoring scheme for a given AP-MS dataset has become a major challenge. The comparative analysis of six of the most popular scoring methods presented here reveals that, overall, these methods do not perform as expected. Evidence is provided that this is due to three closely related issues: the high 'noise' levels of the raw AP-MS data, the limited capacity of current scoring methods to deal with such high noise levels, and the biases introduced by using Gold Standard datasets to benchmark the scoring functions and threshold the networks. For the field to move forward, all three issues will have to be addressed. This article is part of a Special Issue entitled: Protein dynamics in health and disease. Guest Editors: Pierre Thibault and Anne-Claude Gingras.
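The overlap statistic above (1.7-4.1% of HC interactions present in all networks) can be computed as an intersection over a union of edge sets. The abstract does not specify the denominator, so this sketch assumes it is the union of HC interactions across all networks and treats interactions as undirected; the function name and network contents are hypothetical.

```python
def shared_fraction(networks):
    """Fraction of all distinct interactions (union) that appear in every network.

    Each network is an iterable of (protein, protein) pairs; pairs are treated
    as undirected, so ("A", "B") and ("B", "A") count as the same interaction.
    """
    edge_sets = [{frozenset(pair) for pair in net} for net in networks]
    union = set().union(*edge_sets)
    shared = set.intersection(*edge_sets)
    return len(shared) / len(union) if union else 0.0


# Hypothetical HC networks from three scoring methods on one dataset.
net1 = [("A", "B"), ("B", "C"), ("C", "D")]
net2 = [("B", "A"), ("C", "D"), ("D", "E")]
net3 = [("A", "B"), ("D", "C"), ("E", "F")]
print(shared_fraction([net1, net2, net3]))  # 0.4
```

Dividing by the size of one network instead of the union would give a larger, network-specific fraction, so the choice of denominator matters when comparing such percentages.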
7
Teng B, Zhao C, Liu X, He Z. Network inference from AP-MS data: computational challenges and solutions. Brief Bioinform 2014; 16:658-74. [DOI: 10.1093/bib/bbu038] [Received: 07/10/2014] [Accepted: 09/30/2014]
8
Ritz A, Tegge AN, Kim H, Poirel CL, Murali TM. Signaling hypergraphs. Trends Biotechnol 2014; 32:356-62. [PMID: 24857424 PMCID: PMC4299695 DOI: 10.1016/j.tibtech.2014.04.007] [Received: 10/07/2013] [Revised: 04/01/2014] [Accepted: 04/04/2014]
Abstract
Signaling pathways function as the information-passing mechanisms of cells. A number of databases with extensive manual curation represent the current knowledge base for signaling pathways. These databases motivate the development of computational approaches for prediction and analysis. Such methods require an accurate and computable representation of signaling pathways. Pathways are often described as sets of proteins or as pairwise interactions between proteins. However, many signaling mechanisms cannot be described using these representations. In this opinion, we highlight a representation of signaling pathways that is underutilized: the hypergraph. We demonstrate the usefulness of hypergraphs in this context and discuss challenges and opportunities for the scientific community.
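To make the contrast with pairwise representations concrete, here is a minimal sketch assuming directed hyperedges with a tail set (all of which must be present) and a head set (what is produced or activated). With plain edges A→C and B→C, C would look reachable from A alone; with a hyperedge whose tail is {A, B}, it is not. The class and function names and the toy pathway are hypothetical, not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperedge:
    tail: frozenset  # every tail node is required (e.g. all subunits of a complex)
    head: frozenset  # nodes produced or activated together


def reachable(sources, hyperedges):
    """Nodes reachable from `sources` when a hyperedge fires only if its whole tail is reached."""
    reached = set(sources)
    changed = True
    while changed:
        changed = False
        for e in hyperedges:
            if e.tail <= reached and not e.head <= reached:
                reached |= e.head
                changed = True
    return reached


# Hypothetical pathway: A and B must form complex AB before C is activated.
edges = [
    Hyperedge(frozenset({"A", "B"}), frozenset({"AB"})),
    Hyperedge(frozenset({"AB"}), frozenset({"C"})),
]
print(sorted(reachable({"A"}, edges)))       # ['A']
print(sorted(reachable({"A", "B"}, edges)))  # ['A', 'AB', 'B', 'C']
```

The loop repeats until no hyperedge adds new nodes, so the result does not depend on the order in which hyperedges are listed.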
Affiliation(s)
- Anna Ritz
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Allison N Tegge
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- Hyunju Kim
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
- T M Murali
  - Department of Computer Science, Virginia Tech, Blacksburg, VA, USA
  - ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, VA, USA
9
Rahman A, Poirel CL, Badger DJ, Estep C, Murali TM. Reverse engineering molecular hypergraphs. IEEE/ACM Trans Comput Biol Bioinform 2013; 10:1113-1124. [PMID: 24384702 PMCID: PMC4051496 DOI: 10.1109/tcbb.2013.71]
Abstract
Analysis of molecular interaction networks is pervasive in systems biology. This research relies almost entirely on graphs for modeling interactions. However, edges in graphs cannot represent multiway interactions among molecules, which occur very often within cells. Hypergraphs may be better representations for networks having such interactions, since hyperedges can naturally represent relationships among multiple molecules. Here, we propose using hypergraphs to capture the uncertainty inherent in reverse engineering gene-gene networks. Some subsets of nodes may induce highly varying subgraphs across an ensemble of networks inferred by a reverse engineering algorithm. We provide a novel formulation of hyperedges to capture this uncertainty in network topology. We propose a clustering-based approach to discover hyperedges. We show that our approach can recover hyperedges planted in synthetic data sets with high precision and recall, even with moderate amounts of noise. We apply our techniques to a data set of pathways inferred from genetic interaction data in S. cerevisiae related to the unfolded protein response. Our approach discovers several hyperedges that capture the uncertain connectivity of genes in relevant protein complexes, suggesting that further experiments may be required to precisely discern their interaction patterns. We also show that these complexes are not discovered by an algorithm that computes frequent and dense subgraphs.
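The uncertainty described above (a node subset whose induced subgraph varies across an ensemble of inferred networks) can be quantified in many ways. As a hedged illustration only, and not the paper's hyperedge formulation or clustering method, the sketch below scores a node subset by the Shannon entropy of its induced subgraph across the ensemble: 0 when every network agrees, higher when the wiring differs. The function names and the ensemble are hypothetical.

```python
import math
from collections import Counter


def induced_edges(network, nodes):
    """Edges of `network` (a set of frozenset pairs) with both endpoints in `nodes`."""
    return frozenset(e for e in network if e <= nodes)


def topology_entropy(ensemble, nodes):
    """Shannon entropy (bits) of the induced subgraph on `nodes` across an ensemble.

    0 means every inferred network agrees on how these nodes are wired;
    larger values mean the connectivity among them is more uncertain.
    """
    nodes = frozenset(nodes)
    counts = Counter(induced_edges(net, nodes) for net in ensemble)
    total = len(ensemble)
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def edges(*pairs):
    """Build an undirected edge set from two-letter strings such as "AB"."""
    return {frozenset(p) for p in pairs}


# Hypothetical ensemble of four inferred networks.
ensemble = [
    edges("AB", "BC", "DE"),
    edges("AB", "AC", "DE"),
    edges("AB", "BC", "DE"),
    edges("BC", "AC", "DE"),
]
print(topology_entropy(ensemble, "ABC"))  # 1.5
print(topology_entropy(ensemble, "DE"))   # 0.0
```

Here {A, B, C} is a candidate for a hyperedge-style summary because the ensemble disagrees on its wiring, while {D, E} is stable and is well described by a single ordinary edge.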
Affiliation(s)
- Ahsanur Rahman
  - Department of Computer Science, Virginia Tech, Blacksburg, VA
- David J. Badger
  - Department of Computer Science, Virginia Tech, Blacksburg, VA
- Craig Estep
  - Department of Computer Science, Virginia Tech, Blacksburg, VA
- T.M. Murali
  - Department of Computer Science and the ICTAS Center for Systems Biology of Engineered Tissues, Virginia Tech, Blacksburg, VA
10
Rajagopala SV, Sikorski P, Caufield JH, Tovchigrechko A, Uetz P. Studying protein complexes by the yeast two-hybrid system. Methods 2012; 58:392-9. [PMID: 22841565 DOI: 10.1016/j.ymeth.2012.07.015] [Received: 04/14/2012] [Revised: 07/10/2012] [Accepted: 07/12/2012]
Abstract
Protein complexes are typically analyzed by affinity purification and subsequent mass spectrometric analysis. However, in most cases the structure and topology of the complexes remain elusive from such studies. Here we investigate how the yeast two-hybrid (Y2H) system can be used to analyze direct interactions among proteins in a complex. First, we tested all pairwise interactions among the seven proteins of Escherichia coli DNA polymerase III as well as an uncharacterized complex that includes MntR and PerR. Four and seven interactions were identified in these two complexes, respectively. In addition, we review Y2H data for three other complexes of known structure that serve as "gold standards", namely Varicella Zoster Virus (VZV) ribonucleotide reductase (RNR), the yeast proteasome, and bacteriophage lambda. Finally, we review a Y2H analysis of the human spliceosome, which may serve as an example of a dynamic mega-complex.
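Testing "all pairwise interactions" in a complex means enumerating bait-prey combinations. The abstract does not say whether both orientations (each protein as bait and as prey) or self-interactions were tested, so the sketch below counts the combinations under either assumption for seven placeholder proteins: 7 × 7 = 49 ordered tests including homodimers, or C(7, 2) = 21 unordered heterodimer pairs. The function name and protein labels are hypothetical.

```python
from itertools import combinations, product


def y2h_tests(proteins, both_orientations=True, include_self=True):
    """Bait-prey combinations needed to test all pairwise interactions.

    both_orientations: test X as bait with Y as prey and also the reverse.
    include_self: also test each protein against itself (homodimers).
    """
    if both_orientations:
        return [(bait, prey) for bait, prey in product(proteins, repeat=2)
                if include_self or bait != prey]
    pairs = list(combinations(proteins, 2))
    if include_self:
        pairs += [(p, p) for p in proteins]
    return pairs


subunits = [f"P{i}" for i in range(1, 8)]  # placeholders for seven subunits
print(len(y2h_tests(subunits)))  # 49
print(len(y2h_tests(subunits, both_orientations=False, include_self=False)))  # 21
```

Testing both orientations roughly doubles the work but is common in Y2H screens because a fusion may be active as bait and inactive as prey, or vice versa.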
11
Stukalov A, Superti-Furga G, Colinge J. Deconvolution of Targeted Protein–Protein Interaction Maps. J Proteome Res 2012; 11:4102-9. [DOI: 10.1021/pr300137n]
Affiliation(s)
- Alexey Stukalov
  - CeMM - Center for Molecular Medicine of the Austrian Academy of Sciences, AKH-BT 25.3, Lazarettgasse 14, A-1090 Vienna, Austria
- Giulio Superti-Furga
  - CeMM - Center for Molecular Medicine of the Austrian Academy of Sciences, AKH-BT 25.3, Lazarettgasse 14, A-1090 Vienna, Austria
- Jacques Colinge
  - CeMM - Center for Molecular Medicine of the Austrian Academy of Sciences, AKH-BT 25.3, Lazarettgasse 14, A-1090 Vienna, Austria
12
Yu X, Wallqvist A, Reifman J. Inferring high-confidence human protein-protein interactions. BMC Bioinformatics 2012; 13:79. [PMID: 22558947 PMCID: PMC3416704 DOI: 10.1186/1471-2105-13-79] [Received: 10/05/2011] [Accepted: 05/04/2012]
Abstract
Background: As numerous experimental factors drive the acquisition, identification, and interpretation of protein-protein interactions (PPIs), aggregated assemblies of human PPI data invariably contain experiment-dependent noise. Ascertaining the reliability of PPIs collected from these diverse studies and scoring them to infer high-confidence networks is a non-trivial task. Moreover, a large number of PPIs share the same number of reported occurrences, making it impossible to distinguish the reliability of these PPIs and rank-order them. For example, for the data analyzed here, we found that the majority (>83%) of currently available human PPIs have been reported only once. Results: In this work, we proposed an unsupervised statistical approach to score a set of diverse, experimentally identified PPIs from nine primary databases to create subsets of high-confidence human PPI networks. We evaluated this ranking method by comparing it with other methods and assessing their ability to retrieve protein associations from a number of diverse and independent reference sets. These reference sets contain known biological data that are either directly or indirectly linked to interactions between proteins. We quantified the average effect of using ranked protein interaction data to retrieve this information and showed that, when compared to randomly ranked interaction data sets, the proposed method created a larger enrichment (~134%) than either ranking based on the hypergeometric test (~109%) or occurrence ranking (~46%). Conclusions: From our evaluations, it was clear that ranked interactions were always of value because higher-ranked PPIs had a higher likelihood of retrieving high-confidence experimental data. Reducing the noise inherent in aggregated experimental PPIs via our ranking scheme further increased the accuracy and enrichment of PPIs derived from a number of biologically relevant data sets. These results suggest that using our high-confidence protein interactions at different levels of confidence will help clarify the topological and biological properties associated with human protein networks.
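The hypergeometric baseline mentioned above addresses the problem of pairs with equal occurrence counts by asking how surprising a co-occurrence is, given how often each protein appears overall. One common formulation, assumed here since the abstract does not define it, treats the number of experiments reporting both proteins as a hypergeometric draw; the counts and function names below are hypothetical.

```python
from math import comb


def hypergeom_sf(k, n_total, n_a, n_b):
    """P(X >= k) for X ~ Hypergeometric: n_b draws from n_total items, n_a of them marked.

    Read here as: out of n_total experiments, protein A appears in n_a and
    protein B in n_b; X is how many experiments would contain both by chance.
    """
    upper = min(n_a, n_b)
    hits = sum(comb(n_a, x) * comb(n_total - n_a, n_b - x) for x in range(k, upper + 1))
    return hits / comb(n_total, n_b)


def rank_pairs(pair_counts, protein_counts, n_total):
    """Order pairs from most to least surprising co-occurrence (smallest p first)."""
    pvals = {
        (a, b): hypergeom_sf(k, n_total, protein_counts[a], protein_counts[b])
        for (a, b), k in pair_counts.items()
    }
    return sorted(pvals, key=pvals.get)


# Hypothetical counts: both pairs were reported together 3 times out of 10 experiments.
protein_counts = {"A": 3, "B": 3, "C": 5, "D": 5}
pair_counts = {("C", "D"): 3, ("A", "B"): 3}
print(rank_pairs(pair_counts, protein_counts, 10))  # [('A', 'B'), ('C', 'D')]
```

Both toy pairs were reported together three times, yet the pair of rarely seen proteins ranks first (p = 1/120 versus 0.5) because that co-occurrence is far less likely by chance.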
Affiliation(s)
- Xueping Yu
  - Biotechnology High-Performance Computing Software Applications Institute, Telemedicine and Advanced Technology Research Center, U.S. Army Medical Research and Materiel Command, Ft. Detrick, MD 21702, USA