1
Bashiri H, Rahmani H, Bashiri V, Módos D, Bender A. EMDIP: An Entropy Measure to Discover Important Proteins in PPI networks. Comput Biol Med 2020; 120:103740. [PMID: 32421645 DOI: 10.1016/j.compbiomed.2020.103740] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/31/2019] [Revised: 03/30/2020] [Accepted: 03/30/2020] [Indexed: 12/24/2022]
Abstract
Discovering important proteins in Protein-Protein Interaction (PPI) networks has attracted considerable attention in recent years. Most previous work applies network centrality measures such as Closeness, Betweenness, PageRank and many others to discover the most influential proteins in PPI networks. Although entropy is a well-known graph-based method in computer science, to our knowledge it has not been used in the biology domain for this purpose. In this paper, first, we annotate the human PPI network with available annotation data. Second, we introduce a new concept called annotation-context that describes each protein according to the annotation data of its neighbors. Third, we apply an entropy measure to discover proteins with varied annotation-context. Empirical results indicate that our proposed method succeeded in (1) differentiating essential and non-essential proteins in PPI networks with annotation data; (2) outperforming centrality measures in the task of discovering essential nodes; (3) predicting new annotated proteins based on existing annotation data.
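The abstract does not give the exact entropy formula, so the following is only a minimal sketch of the general idea, assuming the annotation-context of a protein is the multiset of annotation labels carried by its direct neighbors and that "varied" context is scored with plain Shannon entropy. The function names, the adjacency-dict representation, and the toy data are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def annotation_context(graph, annotations, protein):
    """Collect the annotation labels of a protein's direct neighbors.

    Assumption: this is a stand-in for the paper's annotation-context,
    whose precise definition is not given in the abstract.
    """
    labels = []
    for neighbor in graph.get(protein, ()):
        labels.extend(annotations.get(neighbor, ()))
    return labels


def context_entropy(labels):
    """Shannon entropy (in bits) of a list of labels; 0.0 if the list is empty."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy network (hypothetical): P1 is a hub with mixed-function neighbors.
graph = {
    "P1": ["P2", "P3", "P4"],
    "P2": ["P1"],
    "P3": ["P1"],
    "P4": ["P1"],
}
annotations = {
    "P1": ["kinase"],
    "P2": ["kinase"],
    "P3": ["transport"],
    "P4": ["kinase", "transport"],
}

scores = {p: context_entropy(annotation_context(graph, annotations, p)) for p in graph}
```

Under these assumptions, a protein whose neighbors share a single annotation scores zero, while a protein surrounded by an even mix of annotations scores highest, which is the sense of "varied annotation-context" used in the abstract.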
Affiliation(s)
- Hamid Bashiri
  - School of Computer Engineering, Iran University of Science and Technology, Tehran, 16846-13114, Iran
- Hossein Rahmani
  - School of Computer Engineering, Iran University of Science and Technology, Tehran, 16846-13114, Iran
- Vahid Bashiri
  - School of Computer Engineering, Iran University of Science and Technology, Tehran, 16846-13114, Iran
- Dezső Módos
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, United Kingdom
- Andreas Bender
  - Centre for Molecular Informatics, Department of Chemistry, University of Cambridge, Lensfield Road, Cambridge, CB2 1EW, United Kingdom
2
Yue Z, Nguyen T, Zhang E, Zhang J, Chen JY. WIPER: Weighted in-Path Edge Ranking for biomolecular association networks. QUANTITATIVE BIOLOGY 2019; 7:313-326. [PMID: 38525413 PMCID: PMC10959292 DOI: 10.1007/s40484-019-0180-y] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 06/01/2019] [Revised: 08/02/2019] [Accepted: 08/08/2019] [Indexed: 10/25/2022]
Abstract
Background:
In network biology, researchers generate biomolecular networks with candidate genes or proteins experimentally derived from high-throughput data and known biomolecular associations. Current bioinformatics research focuses on characterizing candidate genes/proteins, or nodes, with network characteristics, e.g., betweenness centrality. However, there have been few research reports that characterize and prioritize biomolecular associations ("edges"), which can represent gene regulatory events essential to biological processes.
Method:
We developed Weighted In-Path Edge Ranking (WIPER), a new computational algorithm that can help evaluate all biomolecular interactions/associations ("edges") in a network model and generate a rank order of every edge based on its in-path traversal score and statistical significance test result. To validate whether WIPER worked as designed, we tested the algorithm on synthetic network models.
Results:
Our results showed that WIPER can reliably discover both critical "well traversed in-path edges", which are statistically more traversed than normal edges, and "peripheral in-path edges", which are less traversed than normal edges. Compared with other simple measures such as betweenness centrality, WIPER provides better biological interpretations. In the case study analyzing postnatal pig heart gene expression, WIPER highlighted new signaling pathways suggestive of cardiomyocyte regeneration and proliferation. In the case study of Alzheimer's disease genetic disorder association, WIPER reported the SRC:APP, AR:APP, APP:FYN, and APP:NES edges (gene-gene associations) as both statistically and biologically important based on PubMed co-citation.
Conclusion:
We believe that WIPER will become an essential software tool to help biologists discover and validate essential signaling/regulatory events from high-throughput biology data in the context of biological networks.
Availability:
The free WIPER API is described at discovery.informatics.uab.edu/wiper/.
Affiliation(s)
- Zongliang Yue
  - Informatics Institute, School of Medicine, University of Alabama, Birmingham, AL 35233, USA
- Thanh Nguyen
  - Informatics Institute, School of Medicine, University of Alabama, Birmingham, AL 35233, USA
- Eric Zhang
  - Department of Biomedical Engineering, University of Alabama, Birmingham, AL 35233, USA
- Jianyi Zhang
  - Department of Biomedical Engineering, University of Alabama, Birmingham, AL 35233, USA
- Jake Y. Chen
  - Informatics Institute, School of Medicine, University of Alabama, Birmingham, AL 35233, USA
  - Department of Biomedical Engineering, University of Alabama, Birmingham, AL 35233, USA
  - Department of Computer Science, University of Alabama, Birmingham, AL 35233, USA
3
Abstract
Background:
Essential proteins play important roles in the survival or reproduction of an organism and support the stability of the system. Essential proteins are the minimum set of proteins absolutely required to maintain a living cell. The identification of essential proteins is a very important topic, not only for a better comprehension of the minimal requirements for cellular life, but also for a more efficient discovery of human disease genes and drug targets. Because the experimental identification of essential proteins is complex, it usually requires great time and expense. With the accumulation of high-throughput experimental data, many computational methods that usefully complement experimental methods have been proposed to identify essential proteins. In addition, the ability to rapidly and precisely identify essential proteins is of great significance for discovering disease genes and for drug design, and has great potential for applications in basic and synthetic biology research.
Objective:
The aim of this paper is to review the identification of essential proteins and genes, focusing on current developments in different types of computational methods, to point out the progress and limitations of existing methods, and to discuss challenges and directions for further research.
Affiliation(s)
- Ming Fang
  - School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
- Xiujuan Lei
  - School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
- Ling Guo
  - College of Life Sciences, Shaanxi Normal University, Xi'an 710119, China
4
Zhang W, Xu J, Li Y, Zou X. A new two-stage method for revealing missing parts of edges in protein-protein interaction networks. PLoS One 2017; 12:e0177029. [PMID: 28493910 PMCID: PMC5426645 DOI: 10.1371/journal.pone.0177029] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 09/04/2016] [Accepted: 04/20/2017] [Indexed: 12/24/2022]
Abstract
With the increasing availability of high-throughput data, various computational methods have recently been developed for understanding the cell through protein-protein interaction (PPI) networks at a systems level. However, due to the incompleteness of the original PPI networks, those efforts have been significantly hindered. In this paper, we propose a two-stage method to predict underlying links between originally unlinked protein pairs. First, we measure gene expression and gene functional similarity between unlinked protein pairs on a Saccharomyces cerevisiae benchmark network and obtain newly constructed networks. Then, we select the significant part of the newly predicted links by analyzing the difference between essential proteins identified from the newly constructed networks and those identified from the original network. Furthermore, we validate the performance of the new method by using the reliable and comprehensive PPI dataset obtained from the STRING database and compare the proposed method with four other random walk-based methods. The comparison indicates that the proposed strategy performs well in predicting underlying links. This study provides a general paradigm for predicting new interactions between protein pairs and offers new insights into identifying essential proteins.
Affiliation(s)
- Wei Zhang
  - School of Science, East China Jiaotong University, Nanchang 330013, China
  - * E-mail: (WZ); (XFZ)
- Jia Xu
  - School of Mechatronic Engineering, East China Jiaotong University, Nanchang 330013, China
- Yuanyuan Li
  - School of Mathematics and Statistics, Wuhan Institute of Technology in Wuhan, Wuhan, 430072, China
- Xiufen Zou
  - School of Mathematics and Statistics, Wuhan University, Wuhan 430072, China
  - * E-mail: (WZ); (XFZ)
5
Zhang W, Xu J, Li X, Zou X. A New Method for Identifying Essential Proteins by Measuring Co-Expression and Functional Similarity. IEEE Trans Nanobioscience 2016; 15:939-945. [DOI: 10.1109/tnb.2016.2625460] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Indexed: 11/06/2022]
6
Qin C, Sun Y, Dong Y. A New Method for Identifying Essential Proteins Based on Network Topology Properties and Protein Complexes. PLoS One 2016; 11:e0161042. [PMID: 27529423 PMCID: PMC4987049 DOI: 10.1371/journal.pone.0161042] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.0] [Received: 03/17/2016] [Accepted: 07/28/2016] [Indexed: 11/18/2022]
Abstract
Essential proteins are indispensable to the viability and reproduction of an organism. The identification of essential proteins is necessary not only for understanding the molecular mechanisms of cellular life but also for disease diagnosis, medical treatments and drug design. Many computational methods have been proposed for discovering essential proteins, but the precision of the prediction of essential proteins remains to be improved. In this paper, we propose a new method, LBCC, which is based on the combination of local density, betweenness centrality (BC) and in-degree centrality of complex (IDC). First, we introduce the common centrality measures; second, we propose the densities Den1(v) and Den2(v) of a node v to describe its local properties in the network; and finally, the combined strategy of Den1, Den2, BC and IDC is developed to improve the prediction precision. The experimental results demonstrate that LBCC outperforms traditional topological measures for predicting essential proteins, including degree centrality (DC), BC, subgraph centrality (SC), eigenvector centrality (EC), network centrality (NC), and the local average connectivity-based method (LAC). LBCC also improves the prediction precision by approximately 10 percent on the YMIPS and YMBD datasets compared to the most recently developed method, LIDC.
Affiliation(s)
- Chao Qin
  - Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
- Yongqi Sun
  - Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
  - * E-mail:
- Yadong Dong
  - Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing, China
7
Jiang Y, Wang Y, Pang W, Chen L, Sun H, Liang Y, Blanzieri E. Essential protein identification based on essential protein-protein interaction prediction by Integrated Edge Weights. Methods 2015; 83:51-62. [PMID: 25892709 DOI: 10.1016/j.ymeth.2015.04.013] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 01/28/2015] [Revised: 04/09/2015] [Accepted: 04/10/2015] [Indexed: 12/19/2022]
Abstract
Essential proteins play a crucial role in cellular survival and development processes. Experimentally, essential proteins are identified by gene knockouts or RNA interference, which are expensive and often fatal to the target organisms. Given this, an alternative yet important approach to essential protein identification is computational prediction. Existing computational methods predict essential proteins based on their relative densities in a protein-protein interaction (PPI) network. Degree, betweenness, and other appropriate criteria are often used to measure the relative density. However, no matter what criterion is used, a protein is actually ordered by the attributes of the protein per se. In this research, we present a novel computational method, Integrated Edge Weights (IEW), which first ranks protein-protein interactions by integrating their edge weights, then identifies sub PPI networks consisting of the highly ranked edges, and finally regards the nodes in these sub networks as essential proteins. We evaluated IEW on three model organisms: Saccharomyces cerevisiae (S. cerevisiae), Escherichia coli (E. coli), and Caenorhabditis elegans (C. elegans). The experimental results showed that IEW achieved better performance than the state-of-the-art methods in terms of precision-recall and Jackknife measures. We also demonstrated that IEW is a robust and effective method, which can retrieve biologically significant modules through its highly ranked protein-protein interactions for S. cerevisiae, E. coli, and C. elegans. We believe that, with sufficient data provided, IEW can be applied to essential protein identification in any other organism. A website about IEW can be accessed at http://digbio.missouri.edu/IEW/index.html.
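The edge-first ordering described in this abstract (rank interactions, keep the top-ranked ones, read candidate proteins off their endpoints) can be illustrated with a small sketch. It assumes each interaction already carries a single integrated weight and uses a simple top-k cut. How IEW actually integrates weights and delimits sub networks is not stated in the abstract, so the function name, the `top_k` parameter, and the toy weights are hypothetical.

```python
def nodes_from_top_edges(edge_weights, top_k):
    """Return the endpoints of the top_k highest-weighted edges.

    edge_weights maps (protein_a, protein_b) tuples to a numeric weight.
    Assumption: a plain top-k cut stands in for the sub-network
    identification step, which the abstract does not specify.
    """
    ranked = sorted(edge_weights.items(), key=lambda item: item[1], reverse=True)
    selected = set()
    for (u, v), _weight in ranked[:top_k]:
        selected.update((u, v))
    return selected


# Toy interaction weights (hypothetical values).
edge_weights = {
    ("A", "B"): 0.9,
    ("B", "C"): 0.7,
    ("C", "D"): 0.2,
    ("D", "E"): 0.1,
}

candidates = nodes_from_top_edges(edge_weights, top_k=2)
```

The point of the design is that a protein is selected because it participates in a highly ranked interaction rather than because of its own degree or betweenness, which is the contrast the abstract draws with node-centric criteria.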
Affiliation(s)
- Yuexu Jiang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, College of Computer Science and Technology, Jilin University, Changchun 130012, China; Department of Computer Science, University of Missouri, Columbia, MO, United States
- Yan Wang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, College of Computer Science and Technology, Jilin University, Changchun 130012, China; Department of Information Engineering and Computer Science, University of Trento, Povo, Italy
- Wei Pang
  - School of Natural and Computing Sciences, University of Aberdeen, Aberdeen, UK
- Liang Chen
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Huiyan Sun
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, College of Computer Science and Technology, Jilin University, Changchun 130012, China
- Yanchun Liang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering, College of Computer Science and Technology, Jilin University, Changchun 130012, China; Department of Computer Science and Technology, Zhuhai College of Jilin University, Zhuhai 519041, China
- Enrico Blanzieri
  - Department of Information Engineering and Computer Science, University of Trento, Povo, Italy