1
|
He X, Kuang L, Chen Z, Tan Y, Wang L. Method for Identifying Essential Proteins by Key Features of Proteins in a Novel Protein-Domain Network. Front Genet 2021; 12:708162. [PMID: 34267785 PMCID: PMC8276041 DOI: 10.3389/fgene.2021.708162] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/11/2021] [Accepted: 05/31/2021] [Indexed: 11/21/2022] Open
Abstract
In recent years, due to low accuracy and high costs of traditional biological experiments, more and more computational models have been proposed successively to infer potential essential proteins. In this paper, a novel prediction method called KFPM is proposed, in which, a novel protein-domain heterogeneous network is established first by combining known protein-protein interactions with known associations between proteins and domains. Next, based on key topological characteristics extracted from the newly constructed protein-domain network and functional characteristics extracted from multiple biological information of proteins, a new computational method is designed to effectively integrate multiple biological features to infer potential essential proteins based on an improved PageRank algorithm. Finally, in order to evaluate the performance of KFPM, we compared it with 13 state-of-the-art prediction methods, experimental results show that, among the top 1, 5, and 10% of candidate proteins predicted by KFPM, the prediction accuracy can achieve 96.08, 83.14, and 70.59%, respectively, which significantly outperform all these 13 competitive methods. It means that KFPM may be a meaningful tool for prediction of potential essential proteins in the future.
Collapse
Affiliation(s)
- Xin He
- College of Computer, Xiangtan University, Xiangtan, China
| | - Linai Kuang
- College of Computer, Xiangtan University, Xiangtan, China
| | - Zhiping Chen
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
| | - Yihong Tan
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
| | - Lei Wang
- College of Computer, Xiangtan University, Xiangtan, China
- College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
| |
Collapse
|
2
|
Li M, Meng X, Zheng R, Wu FX, Li Y, Pan Y, Wang J. Identification of Protein Complexes by Using a Spatial and Temporal Active Protein Interaction Network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:817-827. [PMID: 28885159 DOI: 10.1109/tcbb.2017.2749571] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/07/2023]
Abstract
The rapid development of proteomics and high-throughput technologies has produced a large amount of Protein-Protein Interaction (PPI) data, which makes it possible for considering dynamic properties of protein interaction networks (PINs) instead of static properties. Identification of protein complexes from dynamic PINs becomes a vital scientific problem for understanding cellular life in the post genome era. Up to now, plenty of models or methods have been proposed for the construction of dynamic PINs to identify protein complexes. However, most of the constructed dynamic PINs just focus on the temporal dynamic information and thus overlook the spatial dynamic information of the complex biological systems. To address the limitation of the existing dynamic PIN analysis approaches, in this paper, we propose a new model-based scheme for the construction of the Spatial and Temporal Active Protein Interaction Network (ST-APIN) by integrating time-course gene expression data and subcellular location information. To evaluate the efficiency of ST-APIN, the commonly used classical clustering algorithm MCL is adopted to identify protein complexes from ST-APIN and the other three dynamic PINs, NF-APIN, DPIN, and TC-PIN. The experimental results show that, the performance of MCL on ST-APIN outperforms those on the other three dynamic PINs in terms of matching with known complexes, sensitivity, specificity, and f-measure. Furthermore, we evaluate the identified protein complexes by Gene Ontology (GO) function enrichment analysis. The validation shows that the identified protein complexes from ST-APIN are more biologically significant. This study provides a general paradigm for constructing the ST-APINs, which is essential for further understanding of molecular systems and the biomedical mechanism of complex diseases.
Collapse
|
3
|
Identifying Protein Complexes from Dynamic Temporal Interval Protein-Protein Interaction Networks. BIOMED RESEARCH INTERNATIONAL 2019; 2019:3726721. [PMID: 31531351 PMCID: PMC6720829 DOI: 10.1155/2019/3726721] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 03/24/2019] [Revised: 05/22/2019] [Accepted: 07/04/2019] [Indexed: 11/26/2022]
Abstract
Identification of protein complex is very important for revealing the underlying mechanism of biological processes. Many computational methods have been developed to identify protein complexes from static protein-protein interaction (PPI) networks. Recently, researchers are considering the dynamics of protein-protein interactions. Dynamic PPI networks are closer to reality in the cell system. It is expected that more protein complexes can be accurately identified from dynamic PPI networks. In this paper, we use the undulating degree above the base level of gene expression instead of the gene expression level to construct dynamic temporal PPI networks. Further we convert dynamic temporal PPI networks into dynamic Temporal Interval Protein Interaction Networks (TI-PINs) and propose a novel method to accurately identify more protein complexes from the constructed TI-PINs. Owing to preserving continuous interactions within temporal interval, the constructed TI-PINs contain more dynamical information for accurately identifying more protein complexes. Our proposed identification method uses multisource biological data to judge whether the joint colocalization condition, the joint coexpression condition, and the expanding cluster condition are satisfied; this is to ensure that the identified protein complexes have the features of colocalization, coexpression, and functional homogeneity. The experimental results on yeast data sets demonstrated that using the constructed TI-PINs can obtain better identification of protein complexes than five existing dynamic PPI networks, and our proposed identification method can find more protein complexes accurately than four other methods.
Collapse
|
4
|
Li M, Ni P, Chen X, Wang J, Wu FX, Pan Y. Construction of Refined Protein Interaction Network for Predicting Essential Proteins. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2019; 16:1386-1397. [PMID: 28186903 DOI: 10.1109/tcbb.2017.2665482] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/06/2023]
Abstract
Identification of essential proteins based on protein interaction network (PIN) is a very important and hot topic in the post genome era. Up to now, a number of network-based essential protein discovery methods have been proposed. Generally, a static protein interaction network was constructed by using the protein-protein interactions obtained from different experiments or databases. Unfortunately, most of the network-based essential protein discovery methods are sensitive to the reliability of the constructed PIN. In this paper, we propose a new method for constructing refined PIN by using gene expression profiles and subcellular location information. The basic idea behind refining the PIN is that two proteins should have higher possibility to physically interact with each other if they appear together at the same subcellular location and are active together at least at a time point in the cell cycle. The original static PIN is denoted by S-PIN while the final PIN refined by our method is denoted by TS-PIN. To evaluate whether the constructed TS-PIN is more suitable to be used in the identification of essential proteins, 10 network-based essential protein discovery methods (DC, EC, SC, BC, CC, IC, LAC, NC, BN, and DMNC) are applied on it to identify essential proteins. A comparison of TS-PIN and two other networks: S-PIN and NF-APIN (a noise-filtered active PIN constructed by using gene expression data and S-PIN) is implemented on the prediction of essential proteins by using these ten network-based methods. The comparison results show that all of the 10 network-based methods achieve better results when being applied on TS-PIN than that being applied on S-PIN and NF-APIN.
Collapse
|
5
|
Zhang F, Peng W, Yang Y, Dai W, Song J. A Novel Method for Identifying Essential Genes by Fusing Dynamic Protein⁻Protein Interactive Networks. Genes (Basel) 2019; 10:genes10010031. [PMID: 30626157 PMCID: PMC6356314 DOI: 10.3390/genes10010031] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/13/2018] [Revised: 12/24/2018] [Accepted: 01/02/2019] [Indexed: 11/16/2022] Open
Abstract
Essential genes play an indispensable role in supporting the life of an organism. Identification of essential genes helps us to understand the underlying mechanism of cell life. The essential genes of bacteria are potential drug targets of some diseases genes. Recently, several computational methods have been proposed to detect essential genes based on the static protein⁻protein interactive (PPI) networks. However, these methods have ignored the fact that essential genes play essential roles under certain conditions. In this work, a novel method was proposed for the identification of essential proteins by fusing the dynamic PPI networks of different time points (called by FDP). Firstly, the active PPI networks of each time point were constructed and then they were fused into a final network according to the networks' similarities. Finally, a novel centrality method was designed to assign each gene in the final network a ranking score, whilst considering its orthologous property and its global and local topological properties in the network. This model was applied on two different yeast data sets. The results showed that the FDP achieved a better performance in essential gene prediction as compared to other existing methods that are based on the static PPI network or that are based on dynamic networks.
Collapse
Affiliation(s)
- Fengyu Zhang
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China.
| | - Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China.
- Computer Center of Kunming University of Science and Technology, Kunming 650093, China.
| | - Yunfei Yang
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China.
| | - Wei Dai
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650093, China.
| | - Junrong Song
- Faculty of Management and Economics, Kunming University of Science and Technology, Kunming 650093, China.
| |
Collapse
|
6
|
Zhu X, Qiu J, Xie M, Wang J. A multi-objective biclustering algorithm based on fuzzy mathematics. Neurocomputing 2017. [DOI: 10.1016/j.neucom.2017.01.095] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/16/2022]
|
7
|
Keretsu S, Sarmah R. Weighted edge based clustering to identify protein complexes in protein-protein interaction networks incorporating gene expression profile. Comput Biol Chem 2016; 65:69-79. [PMID: 27771556 DOI: 10.1016/j.compbiolchem.2016.10.001] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/27/2015] [Revised: 09/19/2016] [Accepted: 10/03/2016] [Indexed: 12/31/2022]
Abstract
Protein complex detection from protein-protein interaction (PPI) network has received a lot of focus in recent years. A number of methods identify protein complexes as dense sub-graphs using network information while several other methods detect protein complexes based on topological information. While the methods based on identifying dense sub-graphs are more effective in identifying protein complexes, not all protein complexes have high density. Moreover, existing methods focus more on static PPI networks and usually overlook the dynamic nature of protein complexes. Here, we propose a new method, Weighted Edge based Clustering (WEC), to identify protein complexes based on the weight of the edge between two interacting proteins, where the weight is defined by the edge clustering coefficient and the gene expression correlation between the interacting proteins. Our WEC method is capable of detecting highly inter-connected and co-expressed protein complexes. The experimental results of WEC on three real life data shows that our method can detect protein complexes effectively in comparison with other highly cited existing methods. AVAILABILITY The WEC tool is available at http://agnigarh.tezu.ernet.in/~rosy8/shared.html.
Collapse
Affiliation(s)
- Seketoulie Keretsu
- North Eastern Regional Institute of Science and Technology, Arunachal Pradesh, India.
| | - Rosy Sarmah
- Tezpur University, Assam, India. http://agnigarh.tezu.ernet.in/~rosy8/index.html
| |
Collapse
|
8
|
Lakizadeh A, Jalili S. BiCAMWI: A Genetic-Based Biclustering Algorithm for Detecting Dynamic Protein Complexes. PLoS One 2016; 11:e0159923. [PMID: 27462706 PMCID: PMC4963120 DOI: 10.1371/journal.pone.0159923] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/26/2016] [Accepted: 07/11/2016] [Indexed: 01/08/2023] Open
Abstract
Considering the roles of protein complexes in many biological processes in the cell, detection of protein complexes from available protein-protein interaction (PPI) networks is a key challenge in the post genome era. Despite high dynamicity of cellular systems and dynamic interaction between proteins in a cell, most computational methods have focused on static networks which cannot represent the inherent dynamicity of protein interactions. Recently, some researchers try to exploit the dynamicity of PPI networks by constructing a set of dynamic PPI subnetworks correspondent to each time-point (column) in a gene expression data. However, many genes can participate in multiple biological processes and cellular processes are not necessarily related to every sample, but they might be relevant only for a subset of samples. So, it is more interesting to explore each subnetwork based on a subset of genes and conditions (i.e., biclusters) in a gene expression data. Here, we present a new method, called BiCAMWI to employ dynamicity in detecting protein complexes. The preprocessing phase of the proposed method is based on a novel genetic algorithm that extracts some sets of genes that are co-regulated under some conditions from input gene expression data. Each extracted gene set is called bicluster. In the detection phase of the proposed method, then, based on the biclusters, some dynamic PPI subnetworks are extracted from input static PPI network. Protein complexes are identified by applying a detection method on each dynamic PPI subnetwork and aggregating the results. Experimental results confirm that BiCAMWI effectively models the dynamicity inherent in static PPI networks and achieves significantly better results than state-of-the-art methods. So, we suggest BiCAMWI as a more reliable method for protein complex detection.
Collapse
Affiliation(s)
- Amir Lakizadeh
- Computer Engineering Department, Tarbiat Modares University, Tehran, Iran
| | - Saeed Jalili
- Computer Engineering Department, Tarbiat Modares University, Tehran, Iran
| |
Collapse
|
9
|
Peng X, Wang J, Huan J, Wu FX. Double-layer clustering method to predict protein complexes based on power-law distribution and protein sublocalization. J Theor Biol 2016; 395:186-193. [DOI: 10.1016/j.jtbi.2016.01.043] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/03/2015] [Revised: 01/08/2016] [Accepted: 01/24/2016] [Indexed: 10/22/2022]
|
10
|
Lakizadeh A, Jalili S, Marashi SA. PCD-GED: Protein complex detection considering PPI dynamics based on time series gene expression data. J Theor Biol 2015; 378:31-8. [DOI: 10.1016/j.jtbi.2015.04.020] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2014] [Revised: 04/14/2015] [Accepted: 04/17/2015] [Indexed: 11/16/2022]
|
11
|
Xiao Q, Wang J, Peng X, Wu FX, Pan Y. Identifying essential proteins from active PPI networks constructed with dynamic gene expression. BMC Genomics 2015; 16 Suppl 3:S1. [PMID: 25707432 PMCID: PMC4331804 DOI: 10.1186/1471-2164-16-s3-s1] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Abstract
Essential proteins are vitally important for cellular survival and development, and identifying essential proteins is very meaningful research work in the post-genome era. Rapid increase of available protein-protein interaction (PPI) data has made it possible to detect protein essentiality at the network level. A series of centrality measures have been proposed to discover essential proteins based on the PPI networks. However, the PPI data obtained from large scale, high-throughput experiments generally contain false positives. It is insufficient to use original PPI data to identify essential proteins. How to improve the accuracy, has become the focus of identifying essential proteins. In this paper, we proposed a framework for identifying essential proteins from active PPI networks constructed with dynamic gene expression. Firstly, we process the dynamic gene expression profiles by using time-dependent model and time-independent model. Secondly, we construct an active PPI network based on co-expressed genes. Lastly, we apply six classical centrality measures in the active PPI network. For the purpose of comparison, other prediction methods are also performed to identify essential proteins based on the active PPI network. The experimental results on yeast network show that identifying essential proteins based on the active PPI network can improve the performance of centrality measures considerably in terms of the number of identified essential proteins and identification accuracy. At the same time, the results also indicate that most of essential proteins are active.
Collapse
|
12
|
Ou-Yang L, Dai DQ, Li XL, Wu M, Zhang XF, Yang P. Detecting temporal protein complexes from dynamic protein-protein interaction networks. BMC Bioinformatics 2014; 15:335. [PMID: 25282536 PMCID: PMC4288635 DOI: 10.1186/1471-2105-15-335] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/10/2014] [Accepted: 09/23/2014] [Indexed: 12/13/2022] Open
Abstract
Background Proteins dynamically interact with each other to perform their biological functions. The dynamic operations of protein interaction networks (PPI) are also reflected in the dynamic formations of protein complexes. Existing protein complex detection algorithms usually overlook the inherent temporal nature of protein interactions within PPI networks. Systematically analyzing the temporal protein complexes can not only improve the accuracy of protein complex detection, but also strengthen our biological knowledge on the dynamic protein assembly processes for cellular organization. Results In this study, we propose a novel computational method to predict temporal protein complexes. Particularly, we first construct a series of dynamic PPI networks by joint analysis of time-course gene expression data and protein interaction data. Then a Time Smooth Overlapping Complex Detection model (TS-OCD) has been proposed to detect temporal protein complexes from these dynamic PPI networks. TS-OCD can naturally capture the smoothness of networks between consecutive time points and detect overlapping protein complexes at each time point. Finally, a nonnegative matrix factorization based algorithm is introduced to merge those very similar temporal complexes across different time points. Conclusions Extensive experimental results demonstrate the proposed method is very effective in detecting temporal protein complexes than the state-of-the-art complex detection techniques. Electronic supplementary material The online version of this article (doi:10.1186/1471-2105-15-335) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
| | - Dao-Qing Dai
- Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Guangzhou 510275, China.
| | | | | | | | | |
Collapse
|
13
|
msiDBN: a method of identifying critical proteins in dynamic PPI networks. BIOMED RESEARCH INTERNATIONAL 2014; 2014:138410. [PMID: 24800204 PMCID: PMC3996968 DOI: 10.1155/2014/138410] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/29/2014] [Accepted: 03/09/2014] [Indexed: 01/18/2023]
Abstract
Dynamics of protein-protein interactions (PPIs) reveals the recondite principles of biological processes inside a cell. Shown in a wealth of study, just a small group of proteins, rather than the majority, play more essential roles at crucial points of biological processes. This present work focuses on identifying these critical proteins exhibiting dramatic structural changes in dynamic PPI networks. First, a comprehensive way of modeling the dynamic PPIs is presented which simultaneously analyzes the activity of proteins and assembles the dynamic coregulation correlation between proteins at each time point. Second, a novel method is proposed, named msiDBN, which models a common representation of multiple PPI networks using a deep belief network framework and analyzes the reconstruction errors and the variabilities across the time courses in the biological process. Experiments were implemented on data of yeast cell cycles. We evaluated our network construction method by comparing the functional representations of the derived networks with two other traditional construction methods. The ranking results of critical proteins in msiDBN were compared with the results from the baseline methods. The results of comparison showed that msiDBN had better reconstruction rate and identified more proteins of critical value to yeast cell cycle process.
Collapse
|