1
Zhao B, Hu S, Liu X, Xiong H, Han X, Zhang Z, Li X, Wang L. A Novel Computational Approach for Identifying Essential Proteins From Multiplex Biological Networks. Front Genet 2020; 11:343. [PMID: 32373163 PMCID: PMC7186452 DOI: 10.3389/fgene.2020.00343] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 02/12/2020] [Accepted: 03/23/2020] [Indexed: 11/13/2022] Open
Abstract
The identification of essential proteins can help in understanding the minimum requirements for cell survival and development. Ever-increasing amounts of high-throughput data provide opportunities to detect essential proteins from protein interaction networks (PINs). Existing network-based approaches are limited by the poor quality of the underlying PIN data, which exhibits high rates of false positive and false negative results. To overcome this problem, researchers have focused on predicting essential proteins by combining PINs with other biological data, which has led to the emergence of various types of interactions between proteins. It remains challenging, however, to use aggregated multiplex interactions within a single analysis framework to identify essential proteins. In this study, we created a multiplex biological network (MON) by integrating PINs, protein domains, and gene expression profiles. Next, we proposed a new approach to discover essential proteins by extending the random walk with restart algorithm to the tensor, which provides a data model representation of the MON. In contrast to existing approaches, the proposed MON approach considers both the importance of nodes and the different types of interactions between proteins during the iteration. MON was applied to identify essential proteins within two yeast PINs. Our comprehensive experimental results demonstrated that MON outperformed 11 other state-of-the-art approaches in terms of precision-recall curve, jackknife curve, and other criteria.
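The abstract builds on the random walk with restart algorithm. As a rough illustration only, the sketch below runs the classic single-layer version on a toy graph; it is not the tensor extension proposed in the paper. The function name `random_walk_with_restart`, the restart probability of 0.3, and the four-node network are all assumptions made for this example.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Approximate the stationary scores of a random walk with restart.

    adj   : dict node -> dict of neighbour -> non-negative edge weight
            (every neighbour must also be a key of adj)
    seeds : dict node -> restart probability (values should sum to 1)
    """
    nodes = list(adj)
    # Total outgoing weight of each node, used to turn weights into step probabilities.
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    p0 = {u: seeds.get(u, 0.0) for u in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker jumps back to the seed distribution.
        nxt = {u: restart * p0[u] for u in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # Isolated node: return its probability mass to the seeds.
                for v in nodes:
                    nxt[v] += (1 - restart) * p[u] * p0[v]
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - restart) * p[u] * w / out_weight[u]
        delta = sum(abs(nxt[u] - p[u]) for u in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy undirected network (hypothetical node labels, unit weights).
toy_network = {
    "A": {"B": 1.0, "C": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"A": 1.0, "B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
uniform_seeds = {node: 0.25 for node in toy_network}
scores = random_walk_with_restart(toy_network, uniform_seeds)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy case the best-connected node ends up first in `ranking`. A multiplex variant would additionally have to propagate scores across interaction types, which this sketch does not attempt.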
Affiliation(s)
- Bihai Zhao
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
  - Hunan Provincial Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, China
  - Hunan Provincial Key Laboratory of Nutrition and Quality Control of Aquatic Animals, Changsha University, Changsha, China
- Sai Hu
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Xiner Liu
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Huijun Xiong
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Xiao Han
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Zhihong Zhang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
  - Hunan Provincial Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, China
- Xueyong Li
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
- Lei Wang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, China
  - Hunan Provincial Key Laboratory of Industrial Internet Technology and Security, Changsha University, Changsha, China
2
Zhang Z, Luo Y, Hu S, Li X, Wang L, Zhao B. A novel method to predict essential proteins based on tensor and HITS algorithm. Hum Genomics 2020; 14:14. [PMID: 32252824 PMCID: PMC7137323 DOI: 10.1186/s40246-020-00263-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 11/14/2019] [Accepted: 03/05/2020] [Indexed: 11/13/2022] Open
Abstract
BACKGROUND: Essential proteins are an important part of the cell and are closely related to its life activities. To date, protein-protein interaction (PPI) networks have been adopted by many computational methods to predict essential proteins. Most current approaches focus mainly on the topological structure of PPI networks. However, methods relying solely on the PPI network have low detection accuracy for essential proteins. It is therefore necessary to integrate the PPI network with other biological information to identify essential proteins.
RESULTS: In this paper, we proposed a novel random walk method for identifying essential proteins, called HEPT. A three-dimensional tensor is first constructed by combining the PPI network of Saccharomyces cerevisiae with multiple biological data such as gene ontology annotations and protein domains. Then, based on the newly constructed tensor, we extended the Hyperlink-Induced Topic Search (HITS) algorithm from a two-dimensional to a three-dimensional tensor model that can be used to infer essential proteins. Unlike existing state-of-the-art methods, HEPT lets both the importance of proteins and the types of interactions contribute to essential protein prediction. To evaluate the performance of HEPT, proteins are ranked in descending order of the ranking scores computed by our method and by competing methods. A certain number of the top-ranked proteins are then selected as candidate essential proteins. Using the list of known essential proteins, the number of true essential proteins among the candidates is used to judge the performance of each method. Experimental results show that our method achieves better prediction performance than nine other state-of-the-art methods in identifying essential proteins.
CONCLUSIONS: The analysis and experimental results indicate that HEPT can effectively improve the prediction accuracy of essential proteins by using the HITS algorithm and combining network topology with gene ontology annotations and protein domains, which provides new insight into multi-data source fusion.
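The evaluation protocol described in the abstract (rank proteins by score, keep the top candidates, count how many are known essential proteins) can be sketched as follows. This is a minimal illustration under assumptions: the function name `true_positives_at_k`, the protein identifiers, the scores, and the essential set are all made up for the example and do not come from the paper.

```python
def true_positives_at_k(scores, essential, k):
    """Count known essential proteins among the k highest-scoring proteins.

    scores    : dict protein -> ranking score (higher means more likely essential)
    essential : set of proteins known to be essential
    k         : number of top-ranked candidates to keep
    """
    # Sort by descending score; break ties by name so the ranking is deterministic.
    ranked = sorted(scores, key=lambda protein: (-scores[protein], protein))
    return sum(1 for protein in ranked[:k] if protein in essential)


# Hypothetical scores and reference set, for illustration only.
toy_scores = {"P1": 0.9, "P2": 0.8, "P3": 0.4, "P4": 0.7, "P5": 0.1}
toy_essential = {"P1", "P4", "P5"}

hits = true_positives_at_k(toy_scores, toy_essential, k=3)
precision_at_3 = hits / 3
```

Repeating the count for several values of `k` and several scoring methods gives the kind of top-k comparison the abstract refers to.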
Affiliation(s)
- Zhihong Zhang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
- Yingchun Luo
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
  - Department of Ultrasound, Hunan Province Women and Children’s Hospital, Changsha, 410008 China
- Sai Hu
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
- Xueyong Li
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
- Lei Wang
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
- Bihai Zhao
  - College of Computer Engineering and Applied Mathematics, Changsha University, Changsha, 410022 China
  - Hunan Provincial Key Laboratory of Nutrition and Quality Control of Aquatic Animals, Department of Biological and Environmental Engineering, Changsha University, Changsha, 410022 China
3
Fan T, Hu Y, Xin J, Zhao M, Wang J. Analyzing the genes and pathways related to major depressive disorder via a systems biology approach. Brain Behav 2020; 10:e01502. [PMID: 31875662 PMCID: PMC7010578 DOI: 10.1002/brb3.1502] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 03/26/2019] [Revised: 11/20/2019] [Accepted: 11/26/2019] [Indexed: 12/12/2022] Open
Abstract
INTRODUCTION: Major depressive disorder (MDD) is a mental disorder caused by a combination of genetic, environmental, and psychological factors. Over the years, a number of genes potentially associated with MDD have been identified. However, in many cases, the roles of these genes and their relationships in the etiology and development of MDD remain unclear. In this situation, a systems biology approach focusing on the functional correlation and interaction of the candidate genes in the context of MDD can provide useful information for exploring the molecular mechanisms underlying the disease.
METHODS: We collected genes potentially related to MDD by screening the human genetic studies deposited in PubMed (https://www.ncbi.nlm.nih.gov/pubmed). The main biological themes within the genes were explored by function and pathway enrichment analysis. Then, the interaction of genes was analyzed in the context of the protein-protein interaction network, and an MDD-specific network was built using the Steiner minimal tree algorithm.
RESULTS: We collected 255 candidate genes reported to be associated with MDD from available publications. Functional analysis revealed that biological processes and biochemical pathways related to neuronal development, endocrine function, cell growth and/or survival, and immunology were enriched in these genes. The pathways could be largely grouped into three modules involved in biological processes related to the nervous system, the immune system, and the endocrine system, respectively. From the MDD-specific network, 35 novel genes potentially associated with the disease were identified.
CONCLUSION: By means of network- and pathway-based methods, we explored the molecular mechanism underlying the pathogenesis of MDD at a systems biology level. Results from our work could provide valuable clues for understanding the molecular features of MDD.
Affiliation(s)
- Ting Fan
  - School of Biomedical Engineering, Tianjin Medical University, Tianjin, China
- Ying Hu
  - Academy of Psychology and Behavior, Tianjin Normal University, Tianjin, China
- Juncai Xin
  - School of Biomedical Engineering, Tianjin Medical University, Tianjin, China
- Mengwen Zhao
  - School of Biomedical Engineering, Tianjin Medical University, Tianjin, China
- Ju Wang
  - School of Biomedical Engineering, Tianjin Medical University, Tianjin, China
4
Inostroza D, Hernández C, Seco D, Navarro G, Olivera-Nappa A. Cell cycle and protein complex dynamics in discovering signaling pathways. J Bioinform Comput Biol 2019; 17:1950011. [PMID: 31230498 DOI: 10.1142/s0219720019500112] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]
Abstract
Signaling pathways are responsible for the regulation of cell processes, such as monitoring the external environment, transmitting information across membranes, and making cell fate decisions. Given the increasing amount of biological data available and the recent discoveries showing that many diseases are related to the disruption of cellular signal transduction cascades, in silico discovery of signaling pathways in cell biology has become an active research topic in recent years. However, reconstruction of signaling pathways remains a challenge, mainly because of the need for systematic approaches for predicting causal relationships, such as edge direction and activation/inhibition among interacting proteins in the signal flow. We propose an approach for predicting signaling pathways that integrates protein interactions, gene expression, phenotypes, and protein complex information. Our method first finds candidate pathways using a directed-edge-based algorithm. It then defines a graph model that includes causal activation relationships among the proteins in candidate pathways, using cell cycle gene expression and phenotypes to infer consistent pathways in yeast. Finally, we incorporate protein complex coverage information to decide on the final predicted signaling pathways. We show that our approach improves on the predictive results of the state of the art under different ranking metrics.
Affiliation(s)
- Daniel Inostroza
  - Computer Science Department, University of Concepción, Edmundo Larenas, Concepción 4030000, Chile
- Cecilia Hernández
  - Computer Science Department, University of Concepción, Edmundo Larenas, Concepción 4030000, Chile
  - Center for Biotechnology and Bioengineering (CeBiB), Santiago, Chile
- Diego Seco
  - Computer Science Department, University of Concepción, Edmundo Larenas, Concepción 4030000, Chile
  - IMFD - Millennium Institute for Foundational Research on Data, Chile
- Gonzalo Navarro
  - Center for Biotechnology and Bioengineering (CeBiB), Department of Computer Science, University of Chile, Santiago, Chile
- Alvaro Olivera-Nappa
  - Center for Biotechnology and Bioengineering (CeBiB), Department of Chemical Engineering and Biotechnology, University of Chile, Santiago, Chile
5
Liu X, Yang Z, Sang S, Lin H, Wang J, Xu B. Detection of protein complexes from multiple protein interaction networks using graph embedding. Artif Intell Med 2019; 96:107-115. [DOI: 10.1016/j.artmed.2019.04.001] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Received: 03/22/2018] [Revised: 04/06/2019] [Accepted: 04/06/2019] [Indexed: 12/22/2022]
6
Liu X, Yang Z, Sang S, Zhou Z, Wang L, Zhang Y, Lin H, Wang J, Xu B. Identifying protein complexes based on node embeddings obtained from protein-protein interaction networks. BMC Bioinformatics 2018; 19:332. [PMID: 30241459 PMCID: PMC6150962 DOI: 10.1186/s12859-018-2364-2] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 01/08/2018] [Accepted: 09/09/2018] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND: Protein complexes are one of the keys to deciphering the behavior of a cell system. During the past decade, most computational approaches used to identify protein complexes have been based on discovering densely connected subgraphs in protein-protein interaction (PPI) networks. However, many true complexes are not dense subgraphs, and these approaches show limited performance in detecting protein complexes from PPI networks.
RESULTS: To solve these problems, we propose a supervised learning method based on network node embeddings, which uses the informative properties of known complexes to guide the search for new protein complexes. First, node embeddings are obtained from a human protein interaction network. Then the protein interactions are weighted by the similarities between node embeddings. After that, the supervised learning method is used to detect protein complexes. Finally, a random forest model is used to filter the candidate complexes and obtain the final predicted complexes. Experimental results on real human and yeast protein interaction networks show that our method effectively improves the performance of protein complex detection.
CONCLUSIONS: We provided a new method for identifying protein complexes from human and yeast protein interaction networks, which has great potential to benefit the field of protein complex detection.
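The step in which interactions are weighted by the similarity of node embeddings can be illustrated with a short sketch. The abstract does not say which similarity measure is used, so cosine similarity is an assumption here, as are the function names, the two-dimensional vectors, and the protein labels.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def weight_edges(edges, embeddings):
    """Assign each interaction the similarity of its two endpoint embeddings."""
    return {(a, b): cosine_similarity(embeddings[a], embeddings[b]) for a, b in edges}


# Hypothetical embeddings; real ones would come from a node embedding method.
toy_embeddings = {
    "P1": [1.0, 0.0],
    "P2": [2.0, 0.0],
    "P3": [0.0, 3.0],
    "P4": [1.0, 1.0],
}
toy_edges = [("P1", "P2"), ("P2", "P3"), ("P1", "P4")]
weights = weight_edges(toy_edges, toy_embeddings)
```

Interactions between proteins with similar embeddings receive weights near 1, while interactions between dissimilar proteins receive weights near 0, so a downstream complex detector can favor the former.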
Affiliation(s)
- Xiaoxia Liu
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Zhihao Yang
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Shengtian Sang
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Ziwei Zhou
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Lei Wang
  - Beijing Institute of Health Administration and Medical Information, Beijing, 100850, People's Republic of China
- Yin Zhang
  - Beijing Institute of Health Administration and Medical Information, Beijing, 100850, People's Republic of China
- Hongfei Lin
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Jian Wang
  - College of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
- Bo Xu
  - School of Software Technology, Dalian University of Technology, Dalian, 116024, Liaoning, People's Republic of China
7
Abstract
Cellular functions are often performed by multiprotein structures called protein complexes. These complexes are dynamic structures that evolve during the cell cycle or in response to external and internal stimuli, and are tightly regulated by protein expression in different tissues, resulting in quantitative and qualitative variation of protein complexes. Advances in high-throughput techniques, such as mass spectrometry and yeast two-hybrid, have provided a large amount of data on protein-protein interactions. This sparked the development of computational methods able to predict protein complex formation under a variety of biological and clinical conditions. However, the challenges that need to be addressed for successful computational protein complex prediction are highly complex. The post-genomic era saw a growing number of algorithms and software tools that are able to predict protein complexes from protein-protein interaction networks and a variety of other sources. Despite the high capacity of these methods to qualitatively predict protein complexes, they could provide only limited or no quantitative information about the predicted complexes. Recently, a new large-scale simulation of protein complexes was able to achieve this task by simulating protein complex formation on the proteome scale. In this chapter, we review representative methods that can predict multiple protein complexes at different scales and discuss how these can be combined with emerging sources of data in order to improve protein complex characterization.
8
Hernandez C, Mella C, Navarro G, Olivera-Nappa A, Araya J. Protein complex prediction via dense subgraphs and false positive analysis. PLoS One 2017; 12:e0183460. [PMID: 28937982 PMCID: PMC5609739 DOI: 10.1371/journal.pone.0183460] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/14/2016] [Accepted: 08/04/2017] [Indexed: 01/04/2023] Open
Abstract
Many proteins work together with others in groups called complexes in order to achieve a specific function. Discovering protein complexes is important for understanding biological processes and predicting protein functions in living organisms. Large-scale, high-throughput techniques have made it possible to compile protein-protein interaction networks (PPI networks), which have been used in several computational approaches for detecting protein complexes. Those predictions might guide future biological experimental research. Some approaches are topology-based, where highly connected proteins are predicted to be complexes; some propose clustering algorithms based on partitioning or on overlaps among clusters, for networks modeled as unweighted or weighted graphs; and others use the density of clusters and information based on protein functionality. However, some schemes still require much processing time, or the quality of their results can be improved. Furthermore, most of the results obtained with computational tools are not accompanied by an analysis of false positives. We propose an effective and efficient mining algorithm for discovering highly connected subgraphs, which is our base for defining protein complexes. Our representation is based on transforming the PPI network into a directed acyclic graph that reduces the number of represented edges and the search space for discovering subgraphs. Our approach considers weighted and unweighted PPI networks. We compare our best alternative using PPI networks from Saccharomyces cerevisiae (yeast) and Homo sapiens (human) with state-of-the-art approaches in terms of clustering, biological metrics and execution times, as well as three gold standards for yeast and two for human. Furthermore, we analyze false positive predicted complexes by searching the PDBe (Protein Data Bank in Europe) database in order to identify matching protein complexes that have been purified and structurally characterized. Our analysis shows that more than 50 yeast protein complexes and more than 300 human protein complexes found to be false positives according to our prediction method, i.e., not described in the gold standard complex databases, in fact contain protein complexes that have been characterized structurally and documented in PDBe. We also found that some of these protein complexes have recently been classified as part of a Periodic Table of Protein Complexes. The latest version of our software is publicly available at http://doi.org/10.6084/m9.figshare.5297314.v1.
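The abstract refers to cluster density and highly connected subgraphs. As background only, the sketch below computes the standard edge density of an induced subgraph, 2|E| / (|V|(|V| - 1)); it is not the directed-acyclic-graph mining algorithm the authors propose. The function name `subgraph_density` and the toy edge list are assumptions for illustration.

```python
def subgraph_density(edges, nodes):
    """Edge density of the subgraph induced by `nodes` in an undirected graph.

    edges : iterable of (a, b) pairs; direction and duplicates are ignored
    nodes : iterable of node labels defining the candidate cluster
    """
    members = set(nodes)
    n = len(members)
    if n < 2:
        return 0.0
    # Keep each undirected edge once, dropping self-loops and edges leaving the cluster.
    internal = {
        frozenset((a, b))
        for a, b in edges
        if a in members and b in members and a != b
    }
    return 2 * len(internal) / (n * (n - 1))


# Toy interaction list with hypothetical node labels.
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]

triangle_density = subgraph_density(toy_edges, ["A", "B", "C"])
whole_graph_density = subgraph_density(toy_edges, ["A", "B", "C", "D"])
```

A density of 1.0 means every pair of proteins in the candidate cluster interacts; density-based methods typically keep clusters above some threshold, which would be a tunable choice rather than a value taken from this paper.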
Affiliation(s)
- Cecilia Hernandez
  - Computer Science, University of Concepción, Concepción, Chile
  - Center for Biotechnology and Bioengineering (CeBiB), Department of Computer Science, University of Chile, Santiago, Chile
- Carlos Mella
  - Computer Science, University of Concepción, Concepción, Chile
- Gonzalo Navarro
  - Center for Biotechnology and Bioengineering (CeBiB), Department of Computer Science, University of Chile, Santiago, Chile
- Alvaro Olivera-Nappa
  - Center for Biotechnology and Bioengineering (CeBiB), Department of Chemical Engineering and Biotechnology, University of Chile, Santiago, Chile
- Jaime Araya
  - Computer Science, University of Concepción, Concepción, Chile
9
Li M, Li D, Tang Y, Wu F, Wang J. CytoCluster: A Cytoscape Plugin for Cluster Analysis and Visualization of Biological Networks. Int J Mol Sci 2017; 18:1880. [PMID: 28858211 PMCID: PMC5618529 DOI: 10.3390/ijms18091880] [Citation(s) in RCA: 58] [Impact Index Per Article: 8.3] [Received: 08/07/2017] [Revised: 08/22/2017] [Accepted: 08/23/2017] [Indexed: 12/15/2022] Open
Abstract
Cluster analysis of biological networks has become one of the most important approaches to identifying functional modules and to predicting protein complexes and network biomarkers. Furthermore, the visualization of clustering results is crucial for displaying the structure of biological networks. Here we present CytoCluster, a Cytoscape plugin integrating six clustering algorithms, namely HC-PIN (Hierarchical Clustering algorithm in Protein Interaction Networks), OH-PIN (identifying Overlapping and Hierarchical modules in Protein Interaction Networks), IPCA (Identifying Protein Complex Algorithm), ClusterONE (Clustering with Overlapping Neighborhood Expansion), DCU (Detecting Complexes based on Uncertain graph model), and IPC-MCE (Identifying Protein Complexes based on Maximal Complex Extension), together with the BinGO (Biological Networks Gene Ontology) function. Users can select different clustering algorithms according to their requirements. The main function of the six clustering algorithms is to detect protein complexes or functional modules. In addition, BinGO is used to determine which Gene Ontology (GO) categories are statistically overrepresented in a set of genes or a subgraph of a biological network. CytoCluster can be easily expanded, so that more clustering algorithms and functions can be added to the plugin. Since it was created in July 2013, CytoCluster has been downloaded more than 9700 times from the Cytoscape App Store and has already been applied to the analysis of different biological networks. CytoCluster is available from http://apps.cytoscape.org/apps/cytocluster.
Affiliation(s)
- Min Li
  - School of Information Science and Engineering, Central South University, Changsha 410083, China
- Dongyan Li
  - School of Software, Central South University, Changsha 410083, China
- Yu Tang
  - School of Information Science and Engineering, Central South University, Changsha 410083, China
- Fangxiang Wu
  - School of Information Science and Engineering, Central South University, Changsha 410083, China
  - Department of Mechanical Engineering and Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
- Jianxin Wang
  - School of Information Science and Engineering, Central South University, Changsha 410083, China