1. Morand J, Yip S, Velegrakis Y, Lattanzi G, Potestio R, Tubiana L. Quality assessment and community detection methods for anonymized mobility data in the Italian Covid context. Sci Rep 2024; 14:4636. [PMID: 38409411 PMCID: PMC10897296 DOI: 10.1038/s41598-024-54878-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/23/2023] [Accepted: 02/17/2024] [Indexed: 02/28/2024]
Abstract
We discuss how to assess the reliability of partial, anonymized mobility data and compare two different methods to identify spatial communities based on movements: Greedy Modularity Clustering (GMC) and the novel Critical Variable Selection (CVS). These capture different aspects of mobility: direct population fluxes (GMC) and the probability for individuals to move between two nodes (CVS). As a test case, we consider movements of Italians before and during the SARS-CoV-2 pandemic, using Facebook users' data and publicly available information from the Italian National Institute of Statistics (Istat) to construct daily mobility networks at the interprovincial level. Using the Perron-Frobenius (PF) theorem, we show that the mean stochastic network has a stationary population density state comparable with data from Istat, and that this ceases to be the case if even a moderate amount of pruning is applied to the network. We then identify the first two national lockdowns through temporal clustering of the mobility networks, define two representative graphs for the lockdown and non-lockdown conditions, and perform optimal spatial community identification on both graphs using the GMC and CVS approaches. Despite the fundamental differences between the methods, the variation of information (VI) between their results shows that they return similar partitions of the Italian provincial networks in both situations. This information can be used to inform policy, for example, to define an optimal scale for lockdown measures. Our approach is general and can be applied to other countries or geographical scales.
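The two quantitative tools named in this abstract can be illustrated generically. The Python sketch below is not the authors' code: it assumes a small, hypothetical flux matrix whose columns are normalized into a column-stochastic transition matrix, and finds the Perron-Frobenius stationary density by power iteration, which converges when the network is irreducible and aperiodic. It also computes the variation of information between two community partitions, VI(A, B) = H(A) + H(B) - 2I(A; B), using the equivalent form H(A|B) + H(B|A).

```python
import math
from collections import Counter

import numpy as np


def stationary_density(flux, tol=1e-12, max_iter=10_000):
    """Stationary population density of a mean mobility network (toy sketch).

    flux[i, j] is the (hypothetical) number of people moving from node j to
    node i in one step; the diagonal counts people who stay put.
    """
    flux = np.asarray(flux, dtype=float)
    P = flux / flux.sum(axis=0)                 # divide each column by its sum -> column-stochastic
    p = np.full(P.shape[0], 1.0 / P.shape[0])   # start from a uniform density
    for _ in range(max_iter):
        nxt = P @ p                             # one step of the population dynamics
        if np.abs(nxt - p).sum() < tol:         # stop once the density no longer changes
            return nxt
        p = nxt
    return p


def variation_of_information(a, b):
    """VI between two partitions in nats; a[k] and b[k] label node k's community."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    vi = 0.0
    for (x, y), c in Counter(zip(a, b)).items():
        p_xy = c / n
        # H(A|B) + H(B|A) = -sum p_xy * [log(p_xy / p_x) + log(p_xy / p_y)]
        vi -= p_xy * (math.log(c / count_a[x]) + math.log(c / count_b[y]))
    return vi


print(stationary_density([[9, 2], [1, 8]]))                   # close to [2/3, 1/3]
print(variation_of_information([0, 0, 1, 1], [0, 1, 0, 1]))   # 2 ln 2, about 1.386
```

A VI of zero means the two partitions are identical up to relabeling; larger values mean the partitions disagree more.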
Affiliation(s)
- Jules Morand
  - University of Trento, via Sommarive 14, 38123, Trento, Italy
  - INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, 38123, Trento, Italy
- Shoichi Yip
  - University of Trento, via Sommarive 14, 38123, Trento, Italy
- Yannis Velegrakis
  - University of Trento, via Sommarive 14, 38123, Trento, Italy
  - Utrecht University, Princetonplein 5, 3584 CC, Utrecht, The Netherlands
- Gianluca Lattanzi
  - University of Trento, via Sommarive 14, 38123, Trento, Italy
  - INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, 38123, Trento, Italy
- Raffaello Potestio
  - University of Trento, via Sommarive 14, 38123, Trento, Italy
  - INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, 38123, Trento, Italy
- Luca Tubiana
  - University of Trento, via Sommarive 14, 38123, Trento, Italy
  - INFN-TIFPA, Trento Institute for Fundamental Physics and Applications, 38123, Trento, Italy

2. Zhan Y, Liu J, Wu M, Tan CSH, Li X, Ou-Yang L. A partially shared joint clustering framework for detecting protein complexes from multiple state-specific signed interaction networks. Comput Biol Med 2023; 159:106936. [PMID: 37105110 DOI: 10.1016/j.compbiomed.2023.106936] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2022] [Revised: 03/27/2023] [Accepted: 04/13/2023] [Indexed: 04/29/2023]
Abstract
Detecting protein complexes is critical for studying cellular organization and function. The accumulation of protein-protein interaction (PPI) data enables the computational identification of protein complexes. Although a great number of computational methods have been proposed to identify protein complexes from PPI networks, most of them ignore the signs of PPIs, which reflect the ways proteins interact (activation or inhibition). As not all PPIs imply co-complex relationships, taking the signs of PPIs into account can benefit the identification of protein complexes. Moreover, PPI networks are not static but vary with changes in cell state or environment. However, existing methods are primarily designed for single-network clustering and rarely consider joint clustering of multiple PPI networks. In this study, we propose a novel partially shared signed network clustering (PS-SNC) model for jointly identifying protein complexes from multiple state-specific signed PPI networks. PS-SNC not only considers the signs of PPIs but also identifies the common and unique protein complexes in different states. Experimental results on synthetic and real datasets show that our PS-SNC model achieves better performance than other state-of-the-art protein complex detection methods. Extensive analysis on real datasets demonstrates the effectiveness of PS-SNC in revealing novel insights into the underlying patterns of different cell lines.
Affiliation(s)
- Youlin Zhan
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Jiahan Liu
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Min Wu
  - Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), 138632, Singapore
- Chris Soon Heng Tan
  - Department of Chemistry, College of Science, Southern University of Science and Technology, Shenzhen, 518055, China
- Xiaoli Li
  - Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), 138632, Singapore
- Le Ou-Yang
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
  - Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, 518129, China

3. Recognition of Protein Network for Bioinformatics Knowledge Analysis Using Support Vector Machine. Biomed Res Int 2022; 2022:2273648. [PMID: 35502337 PMCID: PMC9056223 DOI: 10.1155/2022/2273648] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/07/2022] [Revised: 03/22/2022] [Accepted: 03/29/2022] [Indexed: 11/18/2022]
Abstract
Proteins are the material foundation of living things and directly take part in and drive life processes. Predicting protein complexes helps us understand the structure and function of complexes, and it is an important foundation for studying how cells work. Genome-wide protein-protein interaction (PPI) data are growing as high-throughput experiments become more common, and many computer-based methods for predicting protein complexes have been developed in the field. This research uses a dual-tree complex wavelet transform to study the structure of proteins and to identify the secondary structure of proteins in the protein network. Identifying the secondary structure of a protein is very important for studying protein characteristics and properties. The scope of this research is that it can confidently and rapidly predict certain protein complexes, which compensates for shortcomings in biological research. To this end, the protein is represented as a distance matrix computed from the three-dimensional coordinates of its C atoms. Based on the texture information in the distance matrix, the dual-tree complex wavelet transform decomposes the matrix into four levels. The subband energy and standard deviation in different directions are extracted, and the resulting two-dimensional feature vector represents the secondary structure features of the protein in an easily interpretable way. KNN and SVM classifiers are then used to classify the extracted features. Experiments show that the new dual-tree complex wavelet feature improves the texture granularity and directionality of traditional secondary-structure feature extraction.

4. Wu Z, Liao Q, Fan S, Liu B. idenPC-CAP: Identify protein complexes from weighted RNA-protein heterogeneous interaction networks using co-assemble partner relation. Brief Bioinform 2020; 22:6041167. [PMID: 33333549 DOI: 10.1093/bib/bbaa372] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/30/2020] [Revised: 11/07/2020] [Accepted: 11/20/2020] [Indexed: 12/18/2022]
Abstract
Protein complexes play important roles in most cellular processes. The available genome-wide protein-protein interaction (PPI) data make it possible for computational methods to identify protein complexes from PPI networks. However, PPI datasets usually contain a high proportion of false-positive noise. Moreover, different types of biomolecules in a living cell cooperate to form a unified interaction network. Because previous computational methods focus only on PPIs and ignore other types of biomolecule interactions, their predicted protein complexes often contain many false-positive proteins. In this study, we develop a novel computational method, idenPC-CAP, to identify protein complexes from the RNA-protein heterogeneous interaction network consisting of RNA-RNA interactions, RNA-protein interactions and PPIs. By considering interactions among proteins and RNAs, the new method reduces the ratio of false-positive proteins in predicted protein complexes. The experimental results demonstrate that idenPC-CAP outperforms the other state-of-the-art methods in this field.
Affiliation(s)
- Zhourun Wu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Qing Liao
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Shixi Fan
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China

5. Wu Z, Liao Q, Liu B. idenPC-MIIP: identify protein complexes from weighted PPI networks using mutual important interacting partner relation. Brief Bioinform 2020; 22:1972-1983. [PMID: 32065215 DOI: 10.1093/bib/bbaa016] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/16/2019] [Revised: 01/15/2020] [Accepted: 01/27/2020] [Indexed: 12/28/2022]
Abstract
Protein complexes are key units for studying a cell system. During the past decades, genome-scale protein-protein interaction (PPI) data have been determined by high-throughput approaches, which enables the identification of protein complexes from PPI networks. However, high-throughput approaches often produce a considerable fraction of false-positive and false-negative samples. In this study, we propose the mutual important interacting partner relation to reflect the co-complex relationship of two proteins based on their interaction neighborhoods. In addition, a new algorithm called idenPC-MIIP is developed to identify protein complexes from weighted PPI networks. The experimental results on two widely used datasets show that idenPC-MIIP outperforms 17 state-of-the-art methods, especially in identifying small protein complexes with only two or three proteins.
Affiliation(s)
- Zhourun Wu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Qing Liao
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China

6. Wu Z, Liao Q, Liu B. A comprehensive review and evaluation of computational methods for identifying protein complexes from protein–protein interaction networks. Brief Bioinform 2019; 21:1531-1548. [DOI: 10.1093/bib/bbz085] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 04/08/2019] [Revised: 06/17/2019] [Accepted: 06/17/2019] [Indexed: 02/04/2023]
Abstract
Protein complexes are the fundamental units for many cellular processes. Identifying protein complexes accurately is critical for understanding the functions and organization of cells. With the growth of genome-scale protein–protein interaction (PPI) data for different species, various computational methods have been developed to identify protein complexes from PPI networks. In this article, we give a comprehensive and updated review of the state-of-the-art computational methods in the field of protein complex identification, focusing especially on newly developed approaches. The computational methods are organized into three categories: cluster-quality-based methods, node-affinity-based methods and ensemble clustering methods. Furthermore, the advantages and disadvantages of different methods are discussed, and the performance of 17 state-of-the-art methods is evaluated on two widely used benchmark data sets. Finally, the bottleneck problems in this important field and their potential solutions are discussed.
Affiliation(s)
- Zhourun Wu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Qing Liao
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China

7. Moth–flame optimization-based algorithm with synthetic dynamic PPI networks for discovering protein complexes. Knowl Based Syst 2019. [DOI: 10.1016/j.knosys.2019.02.011] [Citation(s) in RCA: 36] [Impact Index Per Article: 7.2] [Indexed: 11/22/2022]

8. Liu W, Ma L, Jeon B, Chen L, Chen B. A Network Hierarchy-Based method for functional module detection in protein-protein interaction networks. J Theor Biol 2018; 455:26-38. [PMID: 29981337 DOI: 10.1016/j.jtbi.2018.06.026] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/28/2018] [Revised: 06/27/2018] [Accepted: 06/29/2018] [Indexed: 02/02/2023]
Abstract
In the post-genomic era, one of the important tasks is to identify protein complexes and functional modules from high-throughput protein-protein interaction data, so that we can systematically analyze and understand the molecular functions and biological processes of cells. Although many functional module detection methods have been proposed, how to design correct and efficient functional module detection algorithms is still a challenging and important scientific problem in computational biology. In this paper, we present a novel Network Hierarchy-Based method to detect functional modules in PPI networks (named NHB-FMD). NHB-FMD first constructs the hierarchy tree corresponding to the PPI network and then encodes the tree so that a genetic algorithm can be employed to obtain the hierarchy tree with maximum likelihood. Functional module partitioning is then performed based on this tree, and the best partitioning is selected as the result. Experimental results on real PPI networks show that the proposed algorithm not only significantly outperforms the state-of-the-art methods but also detects protein modules more effectively and accurately.
Affiliation(s)
- Wei Liu
  - College of Information Engineering of Yangzhou University, Yangzhou 225127, China
  - The Laboratory for Internet of Things and Mobile Internet Technology of Jiangsu Province, Huaiyin Institute of Technology, Huaiyin 223002, China
  - School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South Korea
- Liangyu Ma
  - College of Information Engineering of Yangzhou University, Yangzhou 225127, China
- Byeungwoo Jeon
  - School of Electronic and Electrical Engineering, Sungkyunkwan University, Suwon, South Korea
- Ling Chen
  - College of Information Engineering of Yangzhou University, Yangzhou 225127, China
- Bolun Chen
  - The Laboratory for Internet of Things and Mobile Internet Technology of Jiangsu Province, Huaiyin Institute of Technology, Huaiyin 223002, China

9. CPredictor3.0: detecting protein complexes from PPI networks with expression data and functional annotations. BMC Syst Biol 2017; 11:135. [PMID: 29322927 PMCID: PMC5763309 DOI: 10.1186/s12918-017-0504-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Indexed: 11/10/2022]
Abstract
BACKGROUND Effectively predicting protein complexes not only helps to understand the structures and functions of proteins and their complexes, but is also useful for diagnosing disease and developing new drugs. Up to now, many methods have been developed to detect complexes by mining dense subgraphs from static protein-protein interaction (PPI) networks, while ignoring the value of other biological information and the dynamic properties of cellular systems. RESULTS In this paper, based on our previous works CPredictor and CPredictor2.0, we present a new method, called CPredictor3.0, for predicting complexes from PPI networks with both gene expression data and protein functional annotations. This new method follows the viewpoint that proteins in the same complex should have roughly similar functions and be active at the same time and place in cellular systems. We first detect active proteins by using gene expression data at different time points, and cluster proteins by using gene ontology (GO) functional annotations. Then, for each time point, we intersect the set of active proteins generated from expression data with each protein cluster generated from functional annotations. Each resulting unique set indicates a cluster of proteins that have similar function(s) and are active at that time point. Following that, we map each cluster of active proteins of similar function onto a static PPI network and obtain a series of induced connected subgraphs. We treat these subgraphs as candidate complexes. Finally, the predicted complexes are obtained by expanding and merging these candidate complexes. We evaluate CPredictor3.0 and compare it with a number of existing methods on several PPI networks and benchmark complex datasets. The experimental results show that CPredictor3.0 achieves the highest F1-measure, which indicates that CPredictor3.0 outperforms these existing methods overall.
CONCLUSION CPredictor3.0 can serve as a promising tool for protein complex prediction.
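The candidate-generation steps described in this abstract (intersect active proteins with functional clusters, map each intersection onto the static PPI network, keep the connected induced subgraphs) can be sketched in a few lines of Python. This is an illustrative reconstruction from the abstract, not CPredictor3.0 itself: the input format, the protein names and the `min_size` cutoff are hypothetical, and the final expansion and merging steps are omitted.

```python
from itertools import product


def induced_components(nodes, adjacency):
    """Connected components of the PPI subgraph induced by `nodes` (iterative DFS)."""
    nodes = set(nodes)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            u = stack.pop()
            component.add(u)
            for v in adjacency.get(u, ()):
                if v in nodes and v not in seen:   # stay inside the induced subgraph
                    seen.add(v)
                    stack.append(v)
        components.append(component)
    return components


def candidate_complexes(active_by_time, go_clusters, ppi_edges, min_size=2):
    """Candidate complexes before the (omitted) expansion and merging steps."""
    adjacency = {}
    for u, v in ppi_edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    candidates = set()                     # a set of frozensets keeps only unique candidates
    for active, cluster in product(active_by_time.values(), go_clusters):
        shared = active & cluster          # active at this time point AND functionally similar
        for component in induced_components(shared, adjacency):
            if len(component) >= min_size:  # hypothetical cutoff: drop singletons
                candidates.add(frozenset(component))
    return candidates


# Toy data (hypothetical): two time points, two GO-based clusters.
edges = [("A", "B"), ("B", "C"), ("D", "E")]
active = {"t1": {"A", "B", "C", "D"}, "t2": {"D", "E"}}
go = [{"A", "B", "D", "E"}, {"C"}]
print(candidate_complexes(active, go, edges))  # two candidates: {A, B} and {D, E}
```

Here protein C is active at t1 but falls in a different functional cluster from A and B, so it never joins their candidate, which is the kind of filtering the abstract attributes to combining expression and annotation data.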

10. Ou-Yang L, Yan H, Zhang XF. A multi-network clustering method for detecting protein complexes from multiple heterogeneous networks. BMC Bioinformatics 2017; 18:463. [PMID: 29219066 PMCID: PMC5773919 DOI: 10.1186/s12859-017-1877-4] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 08/24/2023]
Abstract
Background The accurate identification of protein complexes is important for understanding cellular organization. Up to now, computational methods for protein complex detection have mostly focused on mining clusters from protein-protein interaction (PPI) networks. However, PPI data collected by high-throughput experimental techniques are known to be quite noisy, and it is hard to achieve reliable prediction results by simply applying computational methods to PPI data. Behind protein interactions, there are protein domains that interact with each other. Therefore, based on domain-protein associations, the joint analysis of PPIs and domain-domain interactions (DDI) has the potential to improve performance in protein complex detection. As traditional computational methods are designed to detect protein complexes from a single PPI network, it is necessary to design a new algorithm that can effectively utilize the information inherent in multiple heterogeneous networks. Results In this paper, we introduce a novel multi-network clustering algorithm to detect protein complexes from multiple heterogeneous networks. Unlike existing protein complex identification algorithms that focus on the analysis of a single PPI network, our model can jointly exploit the information inherent in PPI and DDI data to achieve more reliable prediction results. Extensive experimental results on real-world data sets demonstrate that our method can predict protein complexes more accurately than other state-of-the-art protein complex identification algorithms. Conclusions In this work, we demonstrate that the joint analysis of PPI and DDI networks can help to improve the accuracy of protein complex detection.
Affiliation(s)
- Le Ou-Yang
  - College of Information Engineering & Shenzhen Key Laboratory of Media Security, Shenzhen University, Nanhai Ave 3688, Shenzhen, 518060, China
- Hong Yan
  - College of Information Engineering & Shenzhen Key Laboratory of Media Security, Shenzhen University, Nanhai Ave 3688, Shenzhen, 518060, China
  - Department of Electronic Engineering, City University of Hong Kong, Tat Chee Avenue, Hong Kong, China
- Xiao-Fei Zhang
  - School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China

11. Characterization of 2-Path Product Signed Graphs with Its Properties. Comput Intell Neurosci 2017; 2017:1235715. [PMID: 28761437 PMCID: PMC5518524 DOI: 10.1155/2017/1235715] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/09/2017] [Accepted: 05/22/2017] [Indexed: 11/29/2022]
Abstract
A signed graph is a simple graph in which each edge receives a sign, positive or negative. Such graphs are mainly used in the social sciences, where individuals are represented as vertices, a friendly relation between two individuals as a positive edge, and enmity as a negative edge. In signed graphs, we define these relationships (edges) as friendship (“+” edge) or hostility (“−” edge). A 2-path product signed graph S#^S of a signed graph S is defined as follows: the vertex set is the same as that of S, and two vertices are adjacent if and only if there exists a path of length two between them in S. The sign of an edge is the product of the marks of vertices in S, where the mark of a vertex u in S is the product of the signs of all edges incident to u. In this paper, we give a characterization of 2-path product signed graphs. Other properties, such as sign-compatibility and canonical sign-compatibility of 2-path product signed graphs, are also discussed, along with isomorphism and switching equivalence of this signed graph with the 2-path signed graph.
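Because the 2-path product construction in this abstract is fully constructive, it can be checked on small examples. The Python sketch below reads "the product of the marks of vertices" as the product of the marks of the two endpoints u and v of the new edge; that reading is an assumption, and the paper's formal definition should be consulted before relying on it. Vertex labels and edge signs are arbitrary toy values.

```python
from itertools import combinations


def two_path_product(vertices, signed_edges):
    """Edges and signs of the 2-path product signed graph of a simple signed graph S.

    signed_edges maps frozenset({u, v}) -> +1 or -1.
    Assumption: the sign of a new edge uv is mark(u) * mark(v).
    """
    neighbours = {v: set() for v in vertices}
    mark = {v: 1 for v in vertices}        # empty product = +1 for isolated vertices
    for edge, sign in signed_edges.items():
        u, v = tuple(edge)
        neighbours[u].add(v)
        neighbours[v].add(u)
        mark[u] *= sign                    # mark = product of signs of incident edges
        mark[v] *= sign
    product_graph = {}
    for u, v in combinations(vertices, 2):
        if neighbours[u] & neighbours[v]:  # a common neighbour w gives the 2-path u-w-v
            product_graph[frozenset((u, v))] = mark[u] * mark[v]
    return product_graph


S = {frozenset("ab"): -1, frozenset("bc"): +1}
print(two_path_product("abc", S))  # a single edge {a, c} with sign -1
```

In the example, a and c are joined only through b, so the product graph has the single edge ac; its sign is mark(a) * mark(c) = (-1) * (+1) = -1.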

12. Ou-Yang L, Zhang XF, Dai DQ, Wu MY, Zhu Y, Liu Z, Yan H. Protein complex detection based on partially shared multi-view clustering. BMC Bioinformatics 2016; 17:371. [PMID: 27623844 PMCID: PMC5022186 DOI: 10.1186/s12859-016-1164-9] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 12/11/2015] [Accepted: 07/23/2016] [Indexed: 01/05/2023]
Abstract
Background Protein complexes are the key molecular entities that perform many essential biological functions. In recent years, high-throughput experimental techniques have generated a large amount of protein interaction data. As a consequence, computational analysis of such data for protein complex detection has received increased attention in the literature. However, most existing works focus on predicting protein complexes from a single type of data, either physical interaction data or co-complex interaction data. These two types of data provide compatible and complementary information, so it is necessary to integrate them to discover the underlying structures and obtain better performance in complex detection. Results In this study, we propose a novel multi-view clustering algorithm, called the Partially Shared Multi-View Clustering model (PSMVC), to carry out such an integrated analysis. Unlike traditional multi-view learning algorithms that focus on mining either consistent or complementary information embedded in the multi-view data, PSMVC can jointly explore the shared and specific information inherent in different views. In our experiments, we compare the complexes detected by PSMVC from a single data source with those detected from multiple data sources. We observe that jointly analyzing multi-view data benefits the detection of protein complexes. Furthermore, extensive experimental results demonstrate that PSMVC performs much better than 16 state-of-the-art complex detection techniques, including ensemble clustering and data integration techniques. Conclusions In this work, we demonstrate that when integrating multiple data sources, a partially shared multi-view clustering model can help to identify protein complexes that are not readily identifiable by conventional single-view-based methods and other integrative analysis methods. All results and source code are available at https://github.com/Oyl-CityU/PSMVC.
Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-1164-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Le Ou-Yang
  - College of Information Engineering, Shenzhen University, Nanhai Ave 3688, Shenzhen, 518060, China
  - Department of Electronic Engineering, City University of Hong Kong, Tat Chee Avenue, Hong Kong, China
- Xiao-Fei Zhang
  - School of Mathematics and Statistics and Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China
- Dao-Qing Dai
  - Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Xin Gang Road West, Guangzhou, 510275, China
- Meng-Yun Wu
  - School of Statistics and Management, Shanghai University of Finance and Economics, Guoding Road, Shanghai, 200433, China
- Yuan Zhu
  - School of Automation, China University of Geosciences, Wuhan, China
- Zhiyong Liu
  - Shenzhen Polytechnic, Shenzhen, 518055, China
- Hong Yan
  - Department of Electronic Engineering, City University of Hong Kong, Tat Chee Avenue, Hong Kong, China

13. Shen X, Yi L, Jiang X, Zhao Y, Hu X, He T, Yang J. Neighbor affinity based algorithm for discovering temporal protein complex from dynamic PPI network. Methods 2016; 110:90-96. [PMID: 27320204 DOI: 10.1016/j.ymeth.2016.06.010] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 03/14/2016] [Revised: 05/31/2016] [Accepted: 06/14/2016] [Indexed: 12/13/2022]
Abstract
Detection of temporal protein complexes would greatly aid in furthering our knowledge of the dynamic features and molecular mechanisms of cellular activities. Most existing clustering algorithms for discovering protein complexes are based on static protein interaction networks, in which the inherent dynamics are often overlooked. We propose a novel algorithm, DPC-NADPIN (Discovering Protein Complexes based on Neighbor Affinity and Dynamic Protein Interaction Network), to identify temporal protein complexes from time-course protein interaction networks. Inspired by the idea that the more tightly a protein's neighbors inside a module are connected, the greater the possibility that the protein belongs to the module, the DPC-NADPIN algorithm first merges each protein with a high clustering coefficient and its neighbors into an initial cluster; each initial cluster then grows into a protein complex by appending neighbor proteins according to the relationship between the affinity among neighbors inside the cluster and that outside the cluster. In our experiments, the DPC-NADPIN algorithm proves reasonable and performs better at discovering protein complexes than the following state-of-the-art algorithms: Hunter, MCODE, CFinder, SPICI, and ClusterONE. Meanwhile, it obtains many protein complexes with strong biological significance, which provide helpful biological knowledge to related researchers. Moreover, we find that proteins are assembled coordinately to form protein complexes with temporal and spatial characteristics, thereby performing specific biological functions.
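The seeding step of DPC-NADPIN described in this abstract (each protein with a high clustering coefficient, together with its neighbors, forms an initial cluster) can be sketched as follows. This Python fragment covers only the initial clusters; the affinity-based expansion is not reproduced, and the `threshold` value and the toy network are hypothetical choices, not parameters from the paper.

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """Local clustering coefficient: fraction of neighbour pairs of v that are linked."""
    k = len(adj[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(adj[v], 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def seed_clusters(adj, threshold=0.5):
    """Initial clusters: each protein with a high clustering coefficient plus its neighbours.

    Returned as a set of frozensets, so identical seeds are kept once.
    """
    return {
        frozenset({v} | adj[v])
        for v in adj
        if clustering_coefficient(adj, v) >= threshold
    }


# Toy network (hypothetical): triangle A-B-C with D hanging off A.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"}, "D": {"A"}}
print(seed_clusters(adj))  # one seed containing A, B and C
```

Here A has a clustering coefficient of only 1/3 because its neighbor D is not linked to B or C, while B and C both score 1.0 and yield the same seed {A, B, C}.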
Affiliation(s)
- Xianjun Shen
  - School of Computer, Central China Normal University, Wuhan, China
- Li Yi
  - School of Computer, Central China Normal University, Wuhan, China
- Xingpeng Jiang
  - School of Computer, Central China Normal University, Wuhan, China
- Yanli Zhao
  - School of Computer, Central China Normal University, Wuhan, China
- Xiaohua Hu
  - School of Computer, Central China Normal University, Wuhan, China
  - College of Computing and Informatics, Drexel University, Philadelphia, USA
- Tingting He
  - School of Computer, Central China Normal University, Wuhan, China
- Jincai Yang
  - School of Computer, Central China Normal University, Wuhan, China

14. Ou-Yang L, Wu M, Zhang XF, Dai DQ, Li XL, Yan H. A two-layer integration framework for protein complex detection. BMC Bioinformatics 2016; 17:100. [PMID: 26911324 PMCID: PMC4765032 DOI: 10.1186/s12859-016-0939-3] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 08/12/2015] [Accepted: 01/27/2016] [Indexed: 01/05/2023]
Abstract
Background Protein complexes carry out nearly all signaling and functional processes within cells. The study of protein complexes is an effective strategy to analyze cellular functions and biological processes. With the increasing availability of proteomics data, various computational methods have recently been developed to predict protein complexes. However, different computational methods are based on their own assumptions and designed to work on different data sources, and different biological screening methods have their own experimental conditions and often differ in scale and noise level. Therefore, a single computational method applied to a specific data source is generally unable to generate comprehensive and reliable prediction results. Results In this paper, we develop a novel Two-layer INtegrative Complex Detection (TINCD) model to detect protein complexes, leveraging the information from both clustering results and raw data sources. In particular, we first integrate various clustering results to construct consensus matrices for proteins to measure their overall co-complex propensity. Second, we combine these consensus matrices with the co-complex score matrix derived from Tandem Affinity Purification/Mass Spectrometry (TAP) data and obtain an integrated co-complex similarity network via an unsupervised metric fusion method. Finally, a novel graph regularized doubly stochastic matrix decomposition model is proposed to detect overlapping protein complexes from the integrated similarity network. Conclusions Extensive experimental results demonstrate that TINCD performs much better than 21 state-of-the-art complex detection techniques, including ensemble clustering and data integration techniques.
Electronic supplementary material The online version of this article (doi:10.1186/s12859-016-0939-3) contains supplementary material, which is available to authorized users.
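The first TINCD step in this abstract, turning several clustering results into a consensus matrix of co-complex propensity, can be illustrated with a minimal Python sketch. It assumes the simplest unweighted definition (the fraction of clustering results that place two proteins in a common cluster); the paper's actual construction, the later metric fusion with TAP scores and the doubly stochastic decomposition are not reproduced, and the protein names are hypothetical.

```python
import numpy as np


def consensus_matrix(clusterings, proteins):
    """Fraction of clustering results in which each pair of proteins shares a cluster.

    Each clustering is a list of (possibly overlapping) clusters of protein names.
    Diagonal entries give the fraction of results that assign the protein to any cluster.
    """
    index = {p: k for k, p in enumerate(proteins)}
    n = len(proteins)
    consensus = np.zeros((n, n))
    for clusters in clusterings:
        together = np.zeros((n, n), dtype=bool)   # counts each pair at most once per result
        for cluster in clusters:
            idx = [index[p] for p in cluster]
            together[np.ix_(idx, idx)] = True     # mark every pair inside this cluster
        consensus += together
    return consensus / len(clusterings)


proteins = ["a", "b", "c", "d"]
results = [                                       # three hypothetical clustering outputs
    [{"a", "b"}, {"c", "d"}],
    [{"a", "b", "c"}],
    [{"a", "b"}, {"b", "c"}],                     # overlapping clusters are allowed
]
M = consensus_matrix(results, proteins)
print(M[0, 1], M[1, 2])  # a-b co-clustered in 3 of 3 results, b-c in 2 of 3
```

Using a boolean mask per clustering result means that a pair appearing together in two overlapping clusters of the same result is still counted once, so every entry stays between 0 and 1.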
Affiliation(s)
- Le Ou-Yang
  - College of Information Engineering, Shenzhen University, Shenzhen, 518060, China
  - Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China
  - Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China
- Min Wu
  - Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, Singapore, Singapore
- Xiao-Fei Zhang
  - School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China
- Dao-Qing Dai
  - Intelligent Data Center and Department of Mathematics, Sun Yat-Sen University, Guangzhou, 510275, China
- Xiao-Li Li
  - Institute for Infocomm Research (I2R), A*STAR, 1 Fusionopolis Way, Singapore, Singapore
- Hong Yan
  - Department of Electronic Engineering, City University of Hong Kong, Hong Kong, China