1
Ding Q, Yang W, Xue G, Liu H, Cai Y, Que J, Jin X, Luo M, Pang F, Yang Y, Lin Y, Liu Y, Sun H, Tan R, Wang P, Xu Z, Jiang Q. Dimension reduction, cell clustering, and cell-cell communication inference for single-cell transcriptomics with DcjComm. Genome Biol 2024; 25:241. [PMID: 39252099 PMCID: PMC11382422 DOI: 10.1186/s13059-024-03385-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/09/2024] [Accepted: 08/30/2024] [Indexed: 09/11/2024]
Abstract
Advances in single-cell transcriptomics provide an unprecedented opportunity to explore complex biological processes. However, computational methods for analyzing single-cell transcriptomics still have room for improvement especially in dimension reduction, cell clustering, and cell-cell communication inference. Herein, we propose a versatile method, named DcjComm, for comprehensive analysis of single-cell transcriptomics. DcjComm detects functional modules to explore expression patterns and performs dimension reduction and clustering to discover cellular identities by the non-negative matrix factorization-based joint learning model. DcjComm then infers cell-cell communication by integrating ligand-receptor pairs, transcription factors, and target genes. DcjComm demonstrates superior performance compared to state-of-the-art methods.
Affiliation(s)
- Qian Ding
  - Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Wenyi Yang
  - Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Guangfu Xue
  - Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Hongxin Liu
  - Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Yideng Cai
  - Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Jinhao Que
  - Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Xiyun Jin
  - School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China
- Meng Luo
  - Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Fenglan Pang
  - Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Yuexin Yang
  - Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
- Yi Lin
  - School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China
- Yusong Liu
  - School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China
- Haoxiu Sun
  - School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China
- Renjie Tan
  - School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China
- Pingping Wang
  - School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China
- Zhaochun Xu
  - School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China
- Qinghua Jiang
  - Center for Bioinformatics, School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150000, China
  - School of Interdisciplinary Medicine and Engineering, Harbin Medical University, Harbin, 150076, China
  - State Key Laboratory of Frigid Zone Cardiovascular Diseases (SKLFZCD), Harbin Medical University, Harbin, 150076, China
2
Conway J, Pouryahya M, Gindin Y, Pan DZ, Carrasco-Zevallos OM, Mountain V, Subramanian GM, Montalto MC, Resnick M, Beck AH, Huss RS, Myers RP, Taylor-Weiner A, Wapinski I, Chung C. Integration of deep learning-based histopathology and transcriptomics reveals key genes associated with fibrogenesis in patients with advanced NASH. Cell Rep Med 2023; 4:101016. [PMID: 37075704 PMCID: PMC10140650 DOI: 10.1016/j.xcrm.2023.101016] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 04/14/2022] [Revised: 12/31/2022] [Accepted: 03/21/2023] [Indexed: 04/21/2023]
Abstract
Nonalcoholic steatohepatitis (NASH) is the most common chronic liver disease globally and a leading cause of liver transplantation in the US. Its pathogenesis remains imprecisely defined. We applied two high-resolution modalities, machine learning (ML)-based quantification of histological features and transcriptomics, to tissue samples from NASH clinical trials to identify genes that are associated with disease progression and clinical events. A histopathology-driven 5-gene expression signature predicted disease progression and clinical events in patients with NASH with F3 (pre-cirrhotic) and F4 (cirrhotic) fibrosis. Notably, the Notch signaling pathway and genes implicated in liver-related diseases were enriched in this expression signature. In a validation cohort where pharmacologic intervention improved disease histology, multiple Notch signaling components were suppressed.
3
Xiao G, Guan R, Cao Y, Huang Z, Xu Y. KISL: knowledge-injected semi-supervised learning for biological co-expression network modules. Front Genet 2023; 14:1151962. [PMID: 37205122 PMCID: PMC10185879 DOI: 10.3389/fgene.2023.1151962] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/27/2023] [Accepted: 04/11/2023] [Indexed: 05/21/2023]
Abstract
The exploration of important biomarkers associated with cancer development is crucial for diagnosing cancer, designing therapeutic interventions, and predicting prognoses. The analysis of gene co-expression provides a systemic perspective on gene networks and can be a valuable tool for mining biomarkers. The main objective of co-expression network analysis is to discover highly synergistic sets of genes, and the most widely used method is weighted gene co-expression network analysis (WGCNA). WGCNA measures gene correlation with the Pearson correlation coefficient and uses hierarchical clustering to identify gene modules. The Pearson correlation coefficient reflects only the linear dependence between variables, and the main drawback of hierarchical clustering is that once two objects are clustered together, the process cannot be reversed. Hence, readjusting inappropriate cluster divisions is not possible. Existing co-expression network analysis methods rely on unsupervised methods that do not utilize prior biological knowledge for module delineation. Here we present a method for identifying outstanding modules in a co-expression network using a knowledge-injected semi-supervised learning approach (KISL), which utilizes a priori biological knowledge and a semi-supervised clustering method to address this issue in current GCN-based clustering methods. To measure both the linear and non-linear dependence between genes, we adopt distance correlation, given the complexity of the gene-gene relationship. Eight RNA-seq datasets of cancer samples are used to validate its effectiveness. In all eight datasets, the KISL algorithm outperformed WGCNA on the silhouette coefficient, Calinski-Harabasz index and Davies-Bouldin index evaluation metrics. According to the results, KISL clusters had better cluster evaluation values and better gene module aggregation. Enrichment analysis of the recognized modules demonstrated their effectiveness in discovering modular structures in biological co-expression networks. In addition, as a general method, KISL can be applied to various co-expression network analyses based on similarity metrics. Source code for KISL and the related scripts is available online at https://github.com/Mowonhoo/KISL.git.
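To illustrate why the abstract contrasts the Pearson coefficient with distance correlation, the following is a minimal sketch in plain Python. It is not taken from the KISL repository: the function names and the toy expression vectors `gene_a` and `gene_b` are hypothetical, and the estimator is the standard sample distance correlation built from double-centered pairwise distance matrices.

```python
import math

def centered_distances(values):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_mean = [sum(row) / n for row in dist]
    grand_mean = sum(row_mean) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[dist[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]

def mean_product(p, q):
    """Mean of the element-wise product of two square matrices."""
    n = len(p)
    return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

def distance_correlation(x, y):
    """Sample distance correlation between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a, b = centered_distances(x), centered_distances(y)
    dcov2 = mean_product(a, b)  # squared distance covariance
    denom = math.sqrt(mean_product(a, a) * mean_product(b, b))
    return 0.0 if denom == 0 else math.sqrt(max(dcov2, 0.0) / denom)

def pearson(x, y):
    """Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical profiles: gene_b depends on gene_a quadratically, not linearly.
gene_a = [i / 20 for i in range(-20, 21)]
gene_b = [v * v for v in gene_a]
linear_r = pearson(gene_a, gene_b)
nonlinear_r = distance_correlation(gene_a, gene_b)
```

On this symmetric toy input the Pearson coefficient is essentially zero while the distance correlation is clearly positive, which is the kind of non-linear dependence the abstract says Pearson-based weighting misses.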
Affiliation(s)
- Gangyi Xiao
  - College of Computer Science and Technology, Jilin University, Changchun, China
- Renchu Guan
  - College of Computer Science and Technology, Jilin University, Changchun, China
- Yangkun Cao
  - School of Artificial Intelligence Jilin University, Changchun, China
- Zhenyu Huang (corresponding author)
  - College of Computer Science and Technology, Jilin University, Changchun, China
- Ying Xu (corresponding author)
  - School of Medicine, Southern University of Science and Technology, Shenzhen, Guangdong, China
4
Manipur I, Giordano M, Piccirillo M, Parashuraman S, Maddalena L. Community Detection in Protein-Protein Interaction Networks and Applications. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:217-237. [PMID: 34951849 DOI: 10.1109/tcbb.2021.3138142] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]
Abstract
The ability to identify and characterize not only the protein-protein interactions but also their internal modular organization through network analysis is fundamental for understanding the mechanisms of biological processes at the molecular level. Indeed, the detection of the network communities can enhance our understanding of the molecular basis of disease pathology, and promote drug discovery and disease treatment in personalized medicine. This work gives an overview of recent computational methods for the detection of protein complexes and functional modules in protein-protein interaction networks, also providing a focus on some of its applications. We propose a systematic reformulation of frequently adopted taxonomies for these methods, also proposing new categories to keep up with the most recent research. We review the literature of the last five years (2017-2021) and provide links to existing data and software resources. Finally, we survey recent works exploiting module identification and analysis, in the context of a variety of disease processes for biomarker identification and therapeutic target detection. Our review provides the interested reader with an up-to-date and self-contained view of the existing research, with links to state-of-the-art literature and resources, as well as hints on open issues and future research directions in complex detection and its applications.
5
Sahoo TR, Vipsita S, Patra S. Complex Prediction in Large PPI Networks Using Expansion and Stripe of Core Cliques. Interdiscip Sci 2022:10.1007/s12539-022-00541-z. [PMID: 36306022 DOI: 10.1007/s12539-022-00541-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/11/2022] [Revised: 10/06/2022] [Accepted: 10/07/2022] [Indexed: 06/16/2023]
Abstract
The widespread availability and importance of large-scale protein-protein interaction (PPI) data have prompted a flurry of research efforts to understand the organisation of a cell and its functionality by analysing these data at the network level. In the bioinformatics and data mining fields, network clustering has attracted considerable attention as a way to examine a PPI network's topological and functional aspects. Over the last decade, numerous studies have shown the clustering of PPI networks to be an excellent method for discovering functional modules, disclosing functions of unknown proteins, and other tasks. This research proposes a unique graph mining approach to detect protein complexes using dense neighbourhoods (highly connected regions) in an interaction graph. Our technique first finds size-3 cliques associated with each edge (protein interaction), and then these core cliques are expanded to form high-density subgraphs. Loosely connected proteins are stripped out from these subgraphs to produce a potential protein complex. Finally, the redundancy is removed based on the Jaccard coefficient. Computational results are presented on the yeast and human protein interaction datasets to highlight the efficiency of our proposed technique. Protein complexes predicted by the proposed approach have a significantly higher similarity score to the gold standards in the CYC-2008 and CORUM benchmark databases than those of other existing approaches.
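Two of the steps named in this abstract, finding the size-3 cliques attached to each edge and removing redundant predictions with the Jaccard coefficient, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the expansion and stripping steps are omitted, and the function names, the toy graph, the largest-first ordering and the `threshold` value are all assumptions.

```python
def triangles_per_edge(adj):
    """Map each undirected edge (u, v), u < v, to the size-3 cliques containing it.

    adj is a dict mapping each node to the set of its neighbours.
    """
    cliques = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Any common neighbour w closes the triangle u-v-w.
                common = adj[u] & adj[v]
                cliques[(u, v)] = [frozenset((u, v, w)) for w in sorted(common)]
    return cliques

def jaccard(a, b):
    """Jaccard coefficient of two non-empty sets."""
    return len(a & b) / len(a | b)

def remove_redundant(clusters, threshold=0.8):
    """Keep larger clusters first; drop any cluster too similar to a kept one.

    The threshold is a hypothetical choice, not a value from the paper.
    """
    kept = []
    for cluster in sorted(clusters, key=len, reverse=True):
        if all(jaccard(cluster, other) < threshold for other in kept):
            kept.append(cluster)
    return kept

# Toy interaction graph: A-B-C form a triangle, D hangs off B.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B"},
    "D": {"B"},
}
core = triangles_per_edge(adj)
filtered = remove_redundant(
    [frozenset("ABC"), frozenset("ABCD"), frozenset("XY")], threshold=0.7
)
```

Here the edge B-D has no common neighbour and so yields no core clique, and the candidate {A, B, C} is discarded because its Jaccard coefficient with the larger {A, B, C, D} is 0.75, above the assumed threshold.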
Affiliation(s)
- Swati Vipsita
  - CSE, IIIT Bhubaneswar, Gothapatna, Bhubaneswar, Odisha, 751003, India
- Sabyasachi Patra
  - CSE, IIIT Bhubaneswar, Gothapatna, Bhubaneswar, Odisha, 751003, India
6
Gong J, Peng Y, Yu J, Pei W, Zhang Z, Fan D, Liu L, Xiao X, Liu R, Lu Q, Li P, Shang H, Shi Y, Li J, Ge Q, Liu A, Deng X, Fan S, Pan J, Chen Q, Yuan Y, Gong W. Linkage and association analyses reveal that hub genes in energy-flow and lipid biosynthesis pathways form a cluster in upland cotton. Comput Struct Biotechnol J 2022; 20:1841-1859. [PMID: 35521543 PMCID: PMC9046884 DOI: 10.1016/j.csbj.2022.04.012] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 10/05/2021] [Revised: 04/11/2022] [Accepted: 04/11/2022] [Indexed: 11/25/2022]
Abstract
Upland cotton is an important allotetraploid crop that provides both natural fiber for the textile industry and edible vegetable oil for the food or feed industry. To better understand the genetic mechanism that regulates the biosynthesis of storage oil in cottonseed, we identified the genes harbored in the major quantitative trait loci/nucleotides (QTLs/QTNs) of kernel oil content (KOC) in cottonseed via both multiple linkage analyses and genome-wide association studies (GWAS). In 'CCRI70' RILs, six stable QTLs were simultaneously identified by linkage analysis of CHIP and SLAF-seq strategies. In '0-153' RILs, eight stable QTLs were detected by consensus linkage analysis integrating multiple strategies. In the natural panel, thirteen and eight loci were associated across multiple environments with two algorithms of GWAS. Within the confidence interval of a major common QTL on chromosome 3, six genes were identified as participating in the interaction network highly correlated with cottonseed KOC. Further observations of gene differential expression showed that four of the genes, LtnD, PGK, LPLAT1, and PAH2, formed hub genes and two of them, FER and RAV1, formed the key genes in the interaction network. Sequence variations in the coding regions of LtnD, FER, PGK, LPLAT1, and PAH2 genes may support their regulatory effects on oil accumulation in mature cottonseed. Taken together, clustering of the hub genes in the lipid biosynthesis interaction network provides new insights to understanding the mechanism of fatty acid biosynthesis and TAG assembly and to further genetic improvement projects for the KOC in cottonseeds.
Affiliation(s)
- Juwu Gong
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
  - Engineering Research Centre of Cotton, Ministry of Education, College of Agriculture, Xinjiang Agricultural University, 311 Nongda East Road, Urumqi 830052, Xinjiang, China
  - School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, Henan, China
- Yan Peng
  - Third Division of the Xinjiang Production and Construction Corps Agricultural Research Institute, Tumushuke, Xijiang 843900, China
- Jiwen Yu
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
  - School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, Henan, China
- Wenfeng Pei
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
- Zhen Zhang
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
- Daoran Fan
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
- Linjie Liu
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
  - Engineering Research Centre of Cotton, Ministry of Education, College of Agriculture, Xinjiang Agricultural University, 311 Nongda East Road, Urumqi 830052, Xinjiang, China
- Xianghui Xiao
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
  - Engineering Research Centre of Cotton, Ministry of Education, College of Agriculture, Xinjiang Agricultural University, 311 Nongda East Road, Urumqi 830052, Xinjiang, China
- Ruixian Liu
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
  - Engineering Research Centre of Cotton, Ministry of Education, College of Agriculture, Xinjiang Agricultural University, 311 Nongda East Road, Urumqi 830052, Xinjiang, China
- Quanwei Lu
  - College of Biotechnology and Food Engineering, Anyang Institute of Technology, Anyang, China
- Pengtao Li
  - College of Biotechnology and Food Engineering, Anyang Institute of Technology, Anyang, China
- Haihong Shang
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
  - School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, Henan, China
- Yuzhen Shi
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
  - School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, Henan, China
- Junwen Li
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
  - School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, Henan, China
- Qun Ge
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
- Aiying Liu
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
- Xiaoying Deng
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
- Senmiao Fan
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
- Jingtao Pan
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
- Quanjia Chen
  - Engineering Research Centre of Cotton, Ministry of Education, College of Agriculture, Xinjiang Agricultural University, 311 Nongda East Road, Urumqi 830052, Xinjiang, China
- Youlu Yuan
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
  - Engineering Research Centre of Cotton, Ministry of Education, College of Agriculture, Xinjiang Agricultural University, 311 Nongda East Road, Urumqi 830052, Xinjiang, China
  - School of Agricultural Sciences, Zhengzhou University, Zhengzhou 450001, Henan, China
- Wankui Gong
  - State Key Laboratory of Cotton Biology, Institute of Cotton Research, Chinese Academy of Agricultural Sciences, Anyang 455000, Henan, China
7
Heat flow random walks in biomolecular systems using symbolic transfer entropy and graph theory. J Mol Graph Model 2021; 104:107838. [PMID: 33529933 DOI: 10.1016/j.jmgm.2021.107838] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/28/2020] [Revised: 12/19/2020] [Accepted: 01/04/2021] [Indexed: 11/23/2022]
Abstract
This study combines information- and graph-theoretic measures to investigate the cluster modulation of amino acid residues and nucleotides at complex biomolecular interfaces. Symbolic transfer entropy is used as the information-theoretic measure. I also used graph theory to obtain information- and heat-flow-weighted digraph models, which are used to study the topology of information and heat flow paths at complex biomolecular interfaces. I introduce graph-theoretic measures, such as the influence score and betweenness centrality, to identify the most influential amino acid and nucleotide sequences as sources of information and as absorbing centers of the structure's heat flow. A PageRank-like random walk algorithm, combined with the weighted digraph models, is used to analyze the network of amino acid and nucleotide sequences at the protein-RNA interface. The cluster analysis using graph-theoretic measures revealed the modular molecular structure and the mechanism of the binding interface. In this study, the first benchmark system is an intuitive directed information flow network used to test the algorithms, and the second benchmark is a protein-RNA complex system. The approach was able to identify the most influential amino acid residues and nucleotides. Furthermore, the statistical cluster analysis using graph-theoretic measures revealed the modular molecular structure and the binding mechanism at the interface.
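The abstract does not specify the random walk beyond calling it PageRank-like, so the following is only a generic sketch of PageRank by power iteration on a weighted digraph. The edge weights stand in for transfer-entropy values and are invented for illustration; the damping factor, tolerance, and node names are likewise assumptions.

```python
def weighted_pagerank(weights, damping=0.85, tol=1e-12, max_iter=1000):
    """PageRank by power iteration on a weighted directed graph.

    weights maps each source node to a dict {target: positive edge weight}.
    A walker follows an outgoing edge with probability proportional to its
    weight, or teleports to a uniformly random node with probability
    1 - damping. Nodes without outgoing edges spread their rank uniformly.
    """
    nodes = sorted(set(weights) | {v for nbrs in weights.values() for v in nbrs})
    n = len(nodes)
    out_sum = {u: sum(weights.get(u, {}).values()) for u in nodes}
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in nodes if out_sum[u] == 0)
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            if out_sum[u] > 0:
                for v, w in weights[u].items():
                    new_rank[v] += damping * rank[u] * w / out_sum[u]
        change = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank

# Hypothetical flow network: residues "a" and "b" both feed nucleotide "c",
# which feeds back to "a". The weights are made-up stand-ins.
flow = {
    "a": {"c": 1.0},
    "b": {"c": 2.0},
    "c": {"a": 1.0},
}
scores = weighted_pagerank(flow)
most_influential = max(scores, key=scores.get)
```

In this toy network node "c" receives flow from both other nodes and ends up with the highest score, while "b", which has no incoming edges, keeps only the teleportation share.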
8
Patra S, Mohapatra A. Protein complex prediction in interaction network based on network motif. Comput Biol Chem 2020; 89:107399. [PMID: 33152665 DOI: 10.1016/j.compbiolchem.2020.107399] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/15/2020] [Revised: 08/07/2020] [Accepted: 10/01/2020] [Indexed: 11/28/2022]
Abstract
The enormous size of Protein-Protein Interaction (PPI) networks demands efficient computational methods to extract biologically significant protein complexes. A wide variety of algorithms have been proposed to predict protein complexes from PPI networks. However, it is still a challenging task to detect protein complexes with high accuracy and manageable sensitivity. In this manuscript, a novel complex prediction algorithm based on Network Motif (CPNM) is proposed. This algorithm addresses the role of proteins in the embeddings of network motifs. These roles are used to define feature vectors and feature weights of proteins. Based on these features, a neighborhood search technique predicts the protein complexes, considering both the inherent organization of proteins and the dense regions in PPI networks. The performance of the proposed algorithm is evaluated using various evaluation metrics such as Precision, Recall, F-measure, Sensitivity, PPV, and Accuracy. The findings indicate that the proposed algorithm outperforms most of the competing algorithms, such as MCODE, DPClus, RNSC, COACH, ClusterONE, CMC and PROCODE, on the PPI networks of Saccharomyces cerevisiae and Homo sapiens.
Affiliation(s)
- Sabyasachi Patra
  - Bioinformatics Lab, Department of Computer Science, IIIT, Bhubaneswar, India
- Anjali Mohapatra
  - Bioinformatics Lab, Department of Computer Science, IIIT, Bhubaneswar, India
9
Ying KC, Lin SW. Maximizing cohesion and separation for detecting protein functional modules in protein-protein interaction networks. PLoS One 2020; 15:e0240628. [PMID: 33048996 PMCID: PMC7553341 DOI: 10.1371/journal.pone.0240628] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/09/2020] [Accepted: 09/29/2020] [Indexed: 12/26/2022]
Abstract
Protein Function Module (PFM) identification in Protein-Protein Interaction Networks (PPINs) is one of the most important and challenging tasks in computational biology. The quick and accurate detection of PFMs in PPINs can contribute greatly to the understanding of the functions, properties, and biological mechanisms in research on various diseases and the development of new medicines. Despite the performance of existing detection approaches being improved to some extent, there are still opportunities for further enhancements in the efficiency, accuracy, and robustness of such detection methods. Based on the uniqueness of the network-clustering problem in the context of PPINs, this study proposed a very effective and efficient model based on the Lin-Kernighan-Helsgaun algorithm for detecting PFMs in PPINs. To demonstrate the effectiveness and efficiency of the proposed model, computational experiments are performed using three different categories of species datasets. The computational results reveal that the proposed model outperforms existing detection techniques in terms of two key performance indices, i.e., the degree of polymerization inside PFMs (cohesion) and the deviation degree between PFMs (separation), while being very fast and robust. The proposed model can be used to help researchers decide whether to conduct further expensive and time-consuming biological experiments and to select target proteins from large-scale PPI data for further detailed research.
Affiliation(s)
- Kuo-Ching Ying
  - Department of Industrial Engineering and Management, National Taipei University of Technology, Taipei, Taiwan
- Shih-Wei Lin
  - Department of Information Management, Chang Gung University, Taoyuan, Taiwan
  - Linkou Chang Gung Memorial Hospital, Taoyuan, Taiwan
  - Ming Chi University of Technology, Taipei, Taiwan
10
Wu Z, Liao Q, Liu B. A comprehensive review and evaluation of computational methods for identifying protein complexes from protein–protein interaction networks. Brief Bioinform 2019; 21:1531-1548. [DOI: 10.1093/bib/bbz085] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 04/08/2019] [Revised: 06/17/2019] [Accepted: 06/17/2019] [Indexed: 02/04/2023]
Abstract
Protein complexes are the fundamental units for many cellular processes. Identifying protein complexes accurately is critical for understanding the functions and organizations of cells. With the growth of genome-scale protein–protein interaction (PPI) data for different species, various computational methods focus on identifying protein complexes from PPI networks. In this article, we give a comprehensive and updated review of the state-of-the-art computational methods in the field of protein complex identification, focusing especially on newly developed approaches. The computational methods are organized into three categories: cluster-quality-based methods, node-affinity-based methods and ensemble clustering methods. Furthermore, the advantages and disadvantages of the different methods are discussed, and the performance of 17 state-of-the-art methods is evaluated on two widely used benchmark data sets. Finally, the bottleneck problems and their potential solutions in this important field are discussed.
Affiliation(s)
- Zhourun Wu
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Qing Liao
  - School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong, China
- Bin Liu
  - School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
  - Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
11
Wang T, Peng J, Peng Q, Wang Y, Chen J. FSM: Fast and scalable network motif discovery for exploring higher-order network organizations. Methods 2019; 173:83-93. [PMID: 31306744 DOI: 10.1016/j.ymeth.2019.07.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/01/2019] [Revised: 06/30/2019] [Accepted: 07/09/2019] [Indexed: 01/06/2023]
Abstract
Networks exhibit rich and diverse higher-order organizational structures. Network motifs, which are recurring significant patterns of inter-connections, are recognized as fundamental units for studying the higher-order organization of networks. However, the principle of selecting representative network motifs for local motif-based clustering remains largely unexplored. We present a scalable algorithm called FSM for network motif discovery. FSM has two advantages. First, it accelerates the motif discovery process by effectively reducing the number of subgraph isomorphism labeling operations. Second, FSM adopts multiple heuristic optimizations for subgraph enumeration and classification to further improve its performance. Experimental results on biological networks show that, compared with the existing network motif discovery algorithm, FSM is more efficient in both computation time and memory usage. Furthermore, with the large, frequent, and sparse network motifs discovered by FSM, the higher-order organizational structures of biological networks were successfully revealed, indicating that FSM is suitable for selecting representative network motifs for exploring higher-order network organizations.
Affiliation(s)
- Tao Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
- Qidi Peng
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jin Chen
  - Institute for Biomedical Informatics, University of Kentucky, Lexington, KY, USA
12
Lu X, Wu Z, Zhao XY, Li CF, Kan SF. Systematic tracking of altered modules identifies the key biomarkers involved in chronic lymphocytic leukemia. Oncol Lett 2019; 17:2351-2355. [PMID: 30675301 PMCID: PMC6341787 DOI: 10.3892/ol.2018.9812] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2018] [Accepted: 11/27/2018] [Indexed: 11/26/2022]
Abstract
Key genes in chronic lymphocytic leukemia (CLL) were investigated by systematically tracking the dysregulated modules from protein-protein interaction (PPI) networks. Microarray data of normal subjects and CLL patients retrieved from the ArrayExpress database were used to extract differentially expressed genes (DEGs). Additionally, we re-weighted the PPI networks of the normal and CLL conditions by means of Pearson's correlation coefficient (PCC). Furthermore, a clique-merging method was applied to extract the modules, and the altered modules were then screened out. The intersection genes were selected from the miss and add genes in the altered modules. The common genes were screened from the intersection genes and DEGs in CLL. A total of 734 DEGs were identified by statistical analysis. There were 1,805 and 703 modules in the normal and disease PPI networks, respectively. In addition, 875 altered modules were obtained, which included 145 miss genes, 353 add genes and 85 intersection genes. Finally, in-depth analysis revealed 9 mutual genes between the intersection genes and DEGs in CLL. Our analysis revealed several key genes associated with CLL by systematically tracking the dysregulated modules, which might be candidate targets for diagnosis and management of CLL.
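The re-weighting step mentioned in this abstract can be pictured as assigning each PPI edge the Pearson correlation of its two genes' expression profiles, computed separately for each condition. The sketch below assumes exactly that and nothing more; the gene names, expression values and function names are hypothetical, and the abstract does not say whether signed or absolute correlations were used (signed values are kept here).

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def reweight_edges(edges, expression):
    """Assign each PPI edge the PCC of its endpoints' expression profiles.

    edges is an iterable of (gene_u, gene_v) pairs; expression maps a gene
    name to its expression values across samples of one condition.
    """
    return {(u, v): pearson(expression[u], expression[v]) for u, v in edges}

# Hypothetical PPI edges and expression profiles for one condition.
ppi_edges = [("G1", "G2"), ("G1", "G3")]
normal_expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],   # rises with G1
    "G3": [4.0, 3.0, 2.0, 1.0],   # falls as G1 rises
}
normal_weights = reweight_edges(ppi_edges, normal_expr)
```

Running the same function on the disease-condition expression matrix would give a second weighted network, and comparing modules between the two is what the abstract refers to as tracking altered modules.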
Affiliation(s)
- Xin Lu
- Department of Blood Transfusion, Second Hospital of Shandong University, Jinan, Shandong 250033, P.R. China
- Zhen Wu
- Department of Blood Transfusion, Second Hospital of Shandong University, Jinan, Shandong 250033, P.R. China
- Xue-Ying Zhao
- Department of Blood Transfusion, Second Hospital of Shandong University, Jinan, Shandong 250033, P.R. China
- Chun-Feng Li
- Department of Blood Transfusion, Second Hospital of Shandong University, Jinan, Shandong 250033, P.R. China
- Shi-Feng Kan
- Department of Laboratory Medicine, Qilu Hospital of Shandong University, Jinan, Shandong 250012, P.R. China
13
Sun H, Shen Y, Luo G, Cai Y, Xiang Z. An integrated strategy for identifying new targets and inferring the mechanism of action: taking rhein as an example. BMC Bioinformatics 2018; 19:315. [PMID: 30189851 PMCID: PMC6127921 DOI: 10.1186/s12859-018-2346-4] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 04/24/2018] [Accepted: 08/29/2018] [Indexed: 02/19/2023]
Abstract
BACKGROUND: Target identification is necessary for the comprehensive inference of the mechanism of action of a compound. The application of computational methods to predict the targets of bioactive compounds saves cost and time in drug research and development. Therefore, we designed an integrated strategy consisting of ligand-protein docking, network analysis, enrichment analysis, and an experimental surface plasmon resonance (SPR) method to identify and validate new targets, and then used enriched pathways to elucidate the underlying pharmacological mechanisms. Here, we used rhein, a compound with various pharmacological activities, as an example to find some of its previously unknown targets and to determine its pharmacological activity. RESULTS: A total of nine candidate targets were discovered, including LCK, HSP90AA1, RAB5A, EGFR, CDK2, CDK6, GSK3B, p38, and JNK. LCK was confirmed through SPR experiments, and HSP90AA1, EGFR, CDK6, p38, and JNK were validated through previous reports. Rhein network regulations are complex and interconnected. The therapeutic effect of rhein is the synergistic and comprehensive result of this vast and complex network, and the perturbation of multiple targets gives rhein its various pharmacological activities. CONCLUSIONS: This study provided a new integrated strategy to identify new targets of bioactive compounds and reveal their molecular mechanisms of action.
Affiliation(s)
- Hao Sun
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, 325035, China; Pharmacy Department, Women's Hospital, Zhejiang University School of Medicine, Hangzhou, 310006, Zhejiang, China
- Yiting Shen
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, 325035, China
- Guangwen Luo
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, 325035, China
- Yuepiao Cai
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, 325035, China
- Zheng Xiang
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, 325035, China
14
Peng J, Wang T, Hu J, Wang Y, Chen J. Constructing Networks of Organelle Functional Modules in Arabidopsis. Curr Genomics 2016; 17:427-438. [PMID: 28479871 PMCID: PMC5320545 DOI: 10.2174/1389202917666160726151048] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Received: 03/01/2015] [Revised: 05/30/2015] [Accepted: 06/05/2015] [Indexed: 11/22/2022]
Abstract
With the rapid accumulation of gene expression data, gene functional module identification has become a widely used approach in functional analysis. However, tools to identify organelle functional modules and analyze their relationships are still missing. We present a soft thresholding approach to construct networks of functional modules using gene expression datasets, in which nodes are strongly co-expressed genes that encode proteins residing in the same subcellular localization, and links represent strong inter-module connections. Our algorithm has three steps. First, we identify functional modules by analyzing gene expression data. Next, we use a self-adaptive approach to construct a mixed network of functional modules and genes. Finally, we link functional modules that are tightly connected in the mixed network. Analysis of experimental data from Arabidopsis demonstrates that our approach is effective in improving the interpretability of high-throughput transcriptomic data and inferring function of unknown genes.
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, P.R. China; Department of Energy Plant Research Lab, Michigan State University, East Lansing, USA
- Tao Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, P.R. China
- Jianping Hu
- Department of Energy Plant Research Lab, Michigan State University, East Lansing, USA; Department of Plant Biology, Michigan State University, East Lansing, USA
- Yadong Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, P.R. China
- Jin Chen
- Department of Energy Plant Research Lab, Michigan State University, East Lansing, USA; Department of Computer Science and Engineering, Michigan State University, East Lansing, USA
15
Zhang C, Wang J, Zhang C, Liu J, Xu D, Chen L. Network stratification analysis for identifying function-specific network layers. Mol Biosyst 2016; 12:1232-40. [PMID: 26879865 DOI: 10.1039/c5mb00782h] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 02/04/2023]
Abstract
A major challenge of systems biology is to capture the rewiring of biological functions (e.g. signaling pathways) in a molecular network. To address this problem, we proposed a novel computational framework, namely network stratification analysis (NetSA), to stratify the whole biological network into various function-specific network layers corresponding to particular functions (e.g. KEGG pathways), which transform the network analysis from the gene level to the functional level by integrating expression data, the gene/protein network and gene ontology information altogether. The application of NetSA in yeast and its comparison with a traditional network-partition both suggest that NetSA can more effectively reveal functional implications of network rewiring and extract significant phenotype-related biological processes. Furthermore, for time-series or stage-wise data, the function-specific network layer obtained by NetSA is also shown to be able to characterize the disease progression in a dynamic manner. In particular, when applying NetSA to hepatocellular carcinoma and type 1 diabetes, we can derive functional spectra regarding the progression of the disease, and capture active biological functions (i.e. active pathways) in different disease stages. The additional comparison between NetSA and SPIA illustrates again that NetSA could discover more complete biological functions during disease progression. Overall, NetSA provides a general framework to stratify a network into various layers of function-specific sub-networks, which can not only analyze a biological network on the functional level but also investigate gene rewiring patterns in biological processes.
16
Pan A, Lahiri C, Rajendiran A, Shanmugham B. Computational analysis of protein interaction networks for infectious diseases. Brief Bioinform 2015; 17:517-26. [PMID: 26261187 PMCID: PMC7110031 DOI: 10.1093/bib/bbv059] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 04/01/2015] [Indexed: 12/13/2022]
Abstract
Infectious diseases caused by pathogens, including viruses, bacteria and parasites, pose a serious threat to human health worldwide. Frequent changes in the pattern of infection mechanisms and the emergence of multidrug-resistant strains among pathogens have weakened current treatment regimens. This necessitates the development of new therapeutic interventions to prevent and control such diseases. To meet this need, analysis of protein interaction networks (PINs) has gained importance as a promising strategy. The present review discusses various computational approaches to analysing PINs in the context of infectious diseases. Topology and modularity analysis of the networks, their biological relevance, and the current state of host–pathogen and intra-pathogenic protein interaction studies are delineated. This should provide useful insights to the research community, enabling the design of novel biomedicine against such infectious diseases.
17
Pizzuti C, Rombo SE. An evolutionary restricted neighborhood search clustering approach for PPI networks. Neurocomputing 2014. [DOI: 10.1016/j.neucom.2014.06.061] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/03/2023]
18
Diao CY, Guo HB, Ouyang YR, Zhang HC, Liu LH, Bu J, Wang ZH, Xiao T. Screening for metastatic osteosarcoma biomarkers with a DNA microarray. Asian Pac J Cancer Prev 2014; 15:1817-22. [PMID: 24641415 DOI: 10.7314/apjcp.2014.15.4.1817] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.6] [Indexed: 11/10/2022]
Abstract
OBJECTIVE: The aim of this study was to screen for possible biomarkers of metastatic osteosarcoma (OS) using a DNA microarray. METHODS: We downloaded the gene expression profile GSE49003 from the Gene Expression Omnibus database, which included 6 gene chips from metastatic and 6 from non-metastatic OS patients. The R package was used to screen and identify differentially expressed genes (DEGs) between metastatic and non-metastatic OS patients. Then we compared the expression of DEGs in the two groups and sub-grouped them into up-regulated and down-regulated, followed by functional enrichment analysis using the DAVID system. Subsequently, we constructed an miRNA-DEG regulatory network with the help of WebGestalt software. RESULTS: A total of 323 DEGs, including 134 up-regulated and 189 down-regulated, were screened out. The up-regulated DEGs were enriched in 14 subcategories and most significantly in cytoskeleton organization, while the down-regulated DEGs were prevalent in 13 subcategories, especially wound healing. In addition, we identified two important miRNAs (miR-202 and miR-9) pivotal for OS metastasis, and their relevant genes, CALD1 and STX1A. CONCLUSIONS: MiR-202 and miR-9 are potential key factors affecting the metastasis of OS, and CALD1 and STX1A may be possible targets beneficial for the treatment of metastatic OS. However, further experimental studies are needed to confirm our results.
Affiliation(s)
- Chun-Yu Diao
- Traumatic Orthopedic Research, Department of Orthopaedics, The Second Xiangya Hospital of Central South University, Changsha, China
19
Xie R, Huang H, Li W, Chen B, Jiang J, He Y, Lv J, Ma B, Zhou Y, Feng C, Chen L, He W. Identifying progression related disease risk modules based on the human subcellular signaling networks. Mol Biosyst 2014; 10:3298-309. [PMID: 25315201 DOI: 10.1039/c4mb00482e] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 11/21/2022]
Abstract
Many studies have shown that the structure and dynamics of the human signaling network are disturbed in complex diseases such as coronary artery disease, and gene expression profiles can distinguish variations in diseases since they can accurately reflect the status of cells. Integration of subcellular localization and the human signaling network holds promise for providing insight into human diseases. In this study, we applied a novel algorithm to identify progression-related-disease-risk modules (PRDRMs) among patients of different disease states within eleven subcellular sub-networks from a human signaling network. Functional annotation and literature retrieval showed that the PRDRMs were strongly associated with disease pathogenesis. The results indicated that PRDRM expression values, used as classification features, performed well in distinguishing patients of different disease states. Compared with the PageRank method, our approach achieved better classification performance. The identification of the PRDRMs in response to dynamic gene expression changes could facilitate our understanding of the pathological basis of complex diseases. Our strategy could provide new insights into the potential use of prognostic biomarkers and the effective guidance of clinical therapy from the human subcellular signaling network perspective.
Affiliation(s)
- Ruiqiang Xie
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province 150081, China.
20
Qu X, Xie R, Chen L, Feng C, Zhou Y, Li W, Huang H, Jia X, Lv J, He Y, Du Y, Li W, Shi Y, He W. Identifying colon cancer risk modules with better classification performance based on human signaling network. Genomics 2014; 104:242-8. [DOI: 10.1016/j.ygeno.2013.11.002] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 01/31/2013] [Revised: 09/29/2013] [Accepted: 11/01/2013] [Indexed: 11/26/2022]
21
Li YP, Gao CY. Identification of sevoflurane and propofol induced differentially expressed genes in gastric mucosa with DNA microarray. Shijie Huaren Xiaohua Zazhi 2014; 22:3649-3653. [DOI: 10.11569/wcjd.v22.i24.3649] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/06/2023]
Abstract
AIM: To identify differentially expressed genes (DEGs) induced by sevoflurane and propofol with DNA microarray.
METHODS: The expression data of GSE4386 which contained atrial samples collected from patients receiving anesthetic gas sevoflurane or intravenous anesthetic propofol in gastrointestinal endoscopy were downloaded from Gene Expression Omnibus (GEO). The DEGs in the sevoflurane group and propofol group were identified and compared. Then, the functions significantly related to the DEGs were enriched. The interactive functional modules for common, sevoflurane specific and propofol specific DEGs were constructed to perform analysis of biological processes.
RESULTS: The percentages of DEGs were 31.3% (275/879) and 94.8% (275/290) in the sevoflurane group and propofol group, respectively. Functional categories for the common, sevoflurane specific and propofol specific DEGs were very similar. Function modules, such as regulation of transcription and regulation of cellular process, for the common, propofol specific and sevoflurane specific DEGs were identified.
CONCLUSION: Sevoflurane and propofol may synergistically reduce gastric mucosal tissue injury in patients undergoing gastrointestinal endoscopy.
22
Shi SH, Cai YP, Cai XJ, Zheng XY, Cao DS, Ye FQ, Xiang Z. A network pharmacology approach to understanding the mechanisms of action of traditional medicine: Bushenhuoxue formula for treatment of chronic kidney disease. PLoS One 2014; 9:e89123. [PMID: 24598793 PMCID: PMC3943740 DOI: 10.1371/journal.pone.0089123] [Citation(s) in RCA: 60] [Impact Index Per Article: 6.0] [Received: 08/01/2013] [Accepted: 01/20/2014] [Indexed: 12/17/2022]
Abstract
Traditional Chinese medicine (TCM) has unique therapeutic effects for complex chronic diseases. However, owing to the lack of an effective systematic approach, research progress on its effective substances and pharmacological mechanism of action has been very slow. In this paper, by incorporating network biology, bioinformatics and chemoinformatics methods, an integrated approach is proposed to systematically investigate and explain the pharmacological mechanism of action and effective substances of TCM. This approach includes the following main steps: first, based on the known drug targets, network biology was used to screen out putative drug targets; second, the molecular docking method was used to calculate whether the molecules from TCM and drug targets related to chronic kidney disease (CKD) interact or not; third, according to the results of molecular docking, a natural product-target network, a main component-target network and a compound-target network were constructed; finally, through analysis of network characteristics and literature mining, potential effective multi-components and their synergistic mechanism were putatively identified and uncovered. Bu-shen-Huo-xue formula (BSHX), which is frequently used for treating CKD, was used as a case study to demonstrate the reliability of the proposed approach. The results show that BSHX exerts its therapeutic effect through multi-channel network regulation, such as regulating the coagulation and fibrinolytic balance and the expression of inflammatory factors, and inhibiting abnormal ECM accumulation. Tanshinone IIA, rhein, curcumin, calycosin and quercetin may be potential effective ingredients of BSHX. This research shows that the integrated approach can be an effective means of discovering active substances of TCM and revealing their pharmacological mechanisms.
Affiliation(s)
- Shao-hua Shi
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
- Yue-piao Cai
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
- Xiao-jun Cai
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
- Xiao-yong Zheng
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
- Dong-sheng Cao
- School of Pharmaceutical Sciences, Central South University, Changsha, China
- Fa-qing Ye
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
- Zheng Xiang
- School of Pharmaceutical Sciences, Wenzhou Medical University, Wenzhou, China
23
Chen B, Fan W, Liu J, Wu FX. Identifying protein complexes and functional modules--from static PPI networks to dynamic PPI networks. Brief Bioinform 2014; 15:177-194. [PMID: 23780996 DOI: 10.1093/bib/bbt039] [Citation(s) in RCA: 93] [Impact Index Per Article: 9.3] [Indexed: 09/09/2024]
Abstract
Cellular processes are typically carried out by protein complexes and functional modules. Identifying them plays an important role in our attempt to reveal principles of cellular organization and function. In this article, we review computational algorithms for identifying protein complexes and/or functional modules from protein-protein interaction (PPI) networks. We first describe issues and pitfalls when interpreting PPI networks. Then, based on the types of data used and the main ideas involved, we briefly describe protein complex and/or functional module identification algorithms in four categories: (i) those based on topological structures of unweighted PPI networks; (ii) those based on characteristics of weighted PPI networks; (iii) those based on multiple data integrations; and (iv) those based on dynamic PPI networks. PPI networks are modelled increasingly precisely as more types of data are integrated, and the study of protein complexes would benefit from shifting from static to dynamic PPI networks.
Affiliation(s)
- Bolin Chen
- School of Computer, Wuhan University, Wuhan 430072, China. Tel.: +86-27-6877-5711; Fax: +86-27-6877-5711. Fang-Xiang Wu, College of Engineering, University of Saskatchewan, 57 Campus Drive, Saskatoon, SK S7N 5A9, Canada. Tel.: +1-306-966-5280; Fax: +1-306-966-5427.
24
Pizzuti C, Rombo SE. Algorithms and tools for protein-protein interaction networks clustering, with a special focus on population-based stochastic methods. Bioinformatics 2014; 30:1343-52. [PMID: 24458952 DOI: 10.1093/bioinformatics/btu034] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.2] [Indexed: 11/12/2022]
Abstract
MOTIVATION: Protein-protein interaction (PPI) networks are powerful models to represent the pairwise protein interactions of organisms. Clustering PPI networks can be useful for isolating groups of interacting proteins that participate in the same biological processes or that together perform specific biological functions. Evolutionary orthologies can be inferred this way, as well as functions and properties of yet uncharacterized proteins. RESULTS: We present an overview of the main state-of-the-art clustering methods that have been applied to PPI networks over the past decade. We distinguish five specific categories of approaches, describe and compare their main features and then focus on one of them, i.e. population-based stochastic search. We provide an experimental evaluation, based on some validation measures widely used in the literature, of techniques in this class, which are as yet less explored than the others. In particular, we study how the capability of Genetic Algorithms (GAs) to extract clusters in PPI networks varies when different topology-based fitness functions are used, and we compare GAs with the main techniques in the other categories. The experimental campaign shows that predictions returned by GAs are often more accurate than those produced by the competing methods. Interesting issues still remain open about possible generalizations of GAs allowing for cluster overlapping. AVAILABILITY AND IMPLEMENTATION: We point out which methods and tools described here are publicly available. CONTACT: simona.rombo@math.unipa.it SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Clara Pizzuti
- Institute for High Performance Computing and Networking (ICAR), National Research Council of Italy (CNR), Via P. Bucci 41C, 87036 Rende (CS) and Department of Mathematics and Computer Science, University of Palermo, Via Archirafi 34, 90123 Palermo (PA), Italy
25
de Oliveira GP, Maximino JR, Maschietto M, Zanoteli E, Puga RD, Lima L, Carraro DM, Chadi G. Early gene expression changes in skeletal muscle from SOD1(G93A) amyotrophic lateral sclerosis animal model. Cell Mol Neurobiol 2014; 34:451-62. [PMID: 24442855 DOI: 10.1007/s10571-014-0029-x] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 11/27/2013] [Accepted: 01/07/2014] [Indexed: 02/07/2023]
Abstract
Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease characterized by loss of motor neurons. Familial ALS is strongly associated with dominant mutations in the gene for Cu/Zn superoxide dismutase (SOD1). Recent evidence points to skeletal muscle as a primary target in the ALS mouse model. Wnt/PI3K signaling pathways and epithelial-mesenchymal transition (EMT) have important roles in maintenance and repair of skeletal muscle. Wnt/PI3K pathway and EMT gene expression profiles were investigated in gastrocnemius muscle from the SOD1(G93A) mouse model and age-paired wild-type controls at the presymptomatic ages of 40 and 80 days, aiming to identify the early neuromuscular abnormalities that precede motor neuron death in ALS. A customized cDNA microarray platform containing 326 genes of Wnt/PI3K and EMT was used, and the results revealed eight up-regulated (Loxl2, Pik4ca, Fzd9, Cul1, Ctnnd1, Snf1lk, Prkx, Dner) and nine down-regulated (Pik3c2a, Ripk4, Id2, C1qdc1, Eif2ak2, Rac3, Cds1, Inppl1, Tbl1x) genes at 40 days, and also one up-regulated (Pik3ca) and five down-regulated (Cd44, Eef2k, Fzd2, Crebbp, Piki3r1) genes at 80 days. Also, protein-protein interaction networks grown from the differentially expressed genes of 40- and 80-day-old mice identified the Grb2 and Src genes at both presymptomatic ages, suggesting a potential central role in the disease mechanisms. mRNA and protein levels for Grb2 and Src were found to be increased in 80-day-old ALS mice. Gene expression changes in the skeletal muscle of transgenic ALS mice at presymptomatic periods of disease gave further evidence of early neuromuscular abnormalities that precede motor neuron death. The results are discussed in terms of the initial triggering of neuronal degeneration and muscle adaptation to maintain function before the onset of symptoms.
Affiliation(s)
- Gabriela P de Oliveira
- Neuroregeneration Center, Department of Neurology, University of São Paulo School of Medicine, Av. Dr. Arnaldo, 455, 2nd Floor, Room 2119, São Paulo, 01246-903, Brazil
26
Azad AKM, Lee H. Voting-based cancer module identification by combining topological and data-driven properties. PLoS One 2013; 8:e70498. [PMID: 23940583 PMCID: PMC3734239 DOI: 10.1371/journal.pone.0070498] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 11/09/2012] [Accepted: 06/19/2013] [Indexed: 12/19/2022]
Abstract
Recently, computational approaches integrating copy number aberrations (CNAs) and gene expression (GE) have been extensively studied to identify cancer-related genes and pathways. In this work, we integrate these two data sets with protein-protein interaction (PPI) information to find cancer-related functional modules. To integrate CNA and GE data, we first built a gene-gene relationship network from a set of seed genes by enumerating all types of pairwise correlations, e.g. GE-GE, CNA-GE, and CNA-CNA, over multiple patients. Next, we propose a voting-based cancer module identification algorithm that combines topological and data-driven properties (VToD algorithm), using the gene-gene relationship network as a source of data-driven information and the PPI data as topological information. We applied the VToD algorithm to 266 glioblastoma multiforme (GBM) and 96 ovarian carcinoma (OVC) samples that have both expression and copy number measurements, and identified 22 GBM modules and 23 OVC modules. Among 22 GBM modules, 15, 12, and 20 modules were significantly enriched with cancer-related KEGG, BioCarta pathways, and GO terms, respectively. Among 23 OVC modules, 19, 18, and 23 modules were significantly enriched with cancer-related KEGG, BioCarta pathways, and GO terms, respectively. Similarly, we also observed that 9 and 2 GBM modules and 15 and 18 OVC modules were enriched with cancer gene census (CGC) and specific cancer driver genes, respectively. Our proposed module-detection algorithm significantly outperformed other existing methods in terms of both functional and cancer gene set enrichments. Most of the cancer-related pathways from both cancer data sets found by our algorithm contained more than two types of gene-gene relationships, showing strong positive correlations between the number of different types of relationship and CGC enrichment p-values (0.64 for GBM and 0.49 for OVC).
This study suggests that identified modules containing both expression changes and CNAs can explain cancer-related activities with greater insights.
Affiliation(s)
- A. K. M. Azad
- School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju, South Korea
- Hyunju Lee
- School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju, South Korea
27
Nassiri I, Masoudi-Nejad A, Jalili M, Moeini A. Discovering dominant pathways and signal-response relationships in signaling networks through nonparametric approaches. Genomics 2013; 102:195-201. [PMID: 23912059 DOI: 10.1016/j.ygeno.2013.07.012] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Received: 06/10/2013] [Revised: 07/22/2013] [Accepted: 07/26/2013] [Indexed: 11/25/2022]
Abstract
A signaling pathway is a sequence of proteins and passenger molecules that transmits information from the cell surface to target molecules. Understanding the signal transduction process requires a detailed description of the pathways involved. Several methods and tools have addressed this problem by incorporating genomic and proteomic data. However, the difficulty of obtaining prior knowledge of complex signaling networks has limited the applicability of these tools. In this study, based on the simulation of signal flow in a signaling network, we introduce a method for determining dominant pathways and signal response to stimulations. The model uses a topology-weighted transit compartment approach and comprises four main steps: weighting the edges, simulating signal transduction in the network (weighting the nodes), finding paths between initial and target nodes, and assigning a significance score to each path. We applied the proposed model to eighty-three signaling networks by using biologically derived source and sink molecules. The recovered dominant paths matched many known signaling pathways, suggesting a promising index for analyzing the phenotype essentiality of molecule encoding paths. We also modeled the stimulus-response relations in long- and short-term synaptic plasticity based on the dominant signaling pathway concept. We showed that the proposed method not only accurately determines dominant signaling pathways, but also identifies effective points of intervention in signal transduction.
Affiliation(s)
- Isar Nassiri
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
28
Ficklin SP, Feltus FA. A systems-genetics approach and data mining tool to assist in the discovery of genes underlying complex traits in Oryza sativa. PLoS One 2013; 8:e68551. [PMID: 23874666 PMCID: PMC3713027 DOI: 10.1371/journal.pone.0068551] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 02/11/2013] [Accepted: 05/30/2013] [Indexed: 12/13/2022]
Abstract
Many traits of biological and agronomic significance in plants are controlled in a complex manner where multiple genes and environmental signals affect the expression of the phenotype. In Oryza sativa (rice), thousands of quantitative genetic signals have been mapped to the rice genome. In parallel, thousands of gene expression profiles have been generated across many experimental conditions. Through the discovery of networks with real gene co-expression relationships, it is possible to identify co-localized genetic and gene expression signals that implicate complex genotype-phenotype relationships. In this work, we used a knowledge-independent, systems genetics approach to discover a high-quality set of co-expression networks, termed Gene Interaction Layers (GILs). Twenty-two GILs were constructed from 1,306 Affymetrix microarray rice expression profiles that were pre-clustered to allow for improved capture of gene co-expression relationships. Functional genomic and genetic data, including over 8,000 QTLs and 766 phenotype-tagged SNPs (p-value ≤ 0.001) from genome-wide association studies, both covering over 230 different rice traits, were integrated with the GILs. An online systems genetics data-mining resource, the GeneNet Engine, was constructed to enable dynamic discovery of gene sets (i.e. network modules) that overlap with genetic traits. GeneNet Engine does not provide the exact set of genes underlying a given complex trait, but through the evidence of gene-marker correspondence, co-expression, and functional enrichment, site visitors can identify genes with potential shared causality for a trait, which could then be used for experimental validation. A set of 2 million SNPs was incorporated into the database and serves as a potential set of testable biomarkers for genes in modules that overlap with genetic traits.
Herein, we describe two modules found using GeneNet Engine, one with significant overlap with the trait amylose content and another with significant overlap with blast disease resistance.
Affiliation(s)
- Stephen P Ficklin
- Plant and Environmental Sciences, Clemson University, Clemson, South Carolina, United States of America
|
29
|
Feltus FA, Ficklin SP, Gibson SM, Smith MC. Maximizing capture of gene co-expression relationships through pre-clustering of input expression samples: an Arabidopsis case study. BMC SYSTEMS BIOLOGY 2013; 7:44. [PMID: 23738693 PMCID: PMC3679940 DOI: 10.1186/1752-0509-7-44] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 10/18/2012] [Accepted: 05/14/2013] [Indexed: 12/11/2022]
Abstract
Background In genomics, highly relevant gene interaction (co-expression) networks have been constructed by finding significant pair-wise correlations between genes in expression datasets. These networks are then mined to elucidate biological function at the polygenic level. In some cases networks may be constructed from input samples that measure gene expression under a variety of different conditions, such as for different genotypes, environments, disease states and tissues. When large sets of samples are obtained from public repositories it is often unmanageable to associate samples into condition-specific groups, and combining samples from various conditions has a negative effect on network size. A fixed significance threshold is often applied, which also limits the size of the final network. Therefore, we propose pre-clustering of input expression samples to approximate condition-specific grouping of samples and individual network construction of each group as a means for dynamic significance thresholding. The net effect is increased sensitivity, thus maximizing the total co-expression relationships in the final co-expression network compendium. Results A total of 86 Arabidopsis thaliana co-expression networks were constructed after k-means partitioning of 7,105 publicly available ATH1 Affymetrix microarray samples. We term each pre-sorted network a Gene Interaction Layer (GIL). Random Matrix Theory (RMT), an un-supervised thresholding method, was used to threshold each of the 86 networks independently, effectively providing a dynamic (non-global) threshold for the network. The overall gene count across all GILs reached 19,588 genes (94.7% measured gene coverage) and 558,022 unique co-expression relationships. In comparison, network construction without pre-sorting of input samples yielded only 3,297 genes (15.9%) and 129,134 relationships in the global network.
Conclusions Here we show that pre-clustering of microarray samples helps approximate condition-specific networks and allows for dynamic thresholding using un-supervised methods. Because RMT ensures only highly significant interactions are kept, the GIL compendium consists of 558,022 unique high quality A. thaliana co-expression relationships across almost all of the measurable genes on the ATH1 array. For A. thaliana, these networks represent the largest compendium to date of significant gene co-expression relationships, and are a means to explore complex pathway, polygenic, and pleiotropic relationships for this focal model plant. The networks can be explored at sysbio.genome.clemson.edu. Finally, this method is applicable to any large expression profile collection for any organism and is best suited where a knowledge-independent network construction method is desired.
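As a rough illustration of why pooling samples from different conditions can shrink a co-expression network, the following minimal Python sketch builds a network once from all samples and once per sample group. It is not the authors' pipeline: the toy expression values, the gene and cluster names, the hand-assigned sample groups (standing in for k-means partitioning), and the fixed |r| ≥ 0.8 cutoff (standing in for RMT thresholding) are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def coexpression_edges(expr, sample_idx, threshold=0.8):
    """Return gene pairs whose |r| over the chosen samples meets the threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expr), 2):
        x = [expr[g1][i] for i in sample_idx]
        y = [expr[g2][i] for i in sample_idx]
        if abs(pearson(x, y)) >= threshold:
            edges.add((g1, g2))
    return edges


# Hypothetical expression profiles: 2 genes measured in 8 samples.
expr = {
    "geneA": [1, 2, 3, 4, 1, 2, 3, 4],
    "geneB": [1, 2, 3, 4, 4, 3, 2, 1],
}
# Hypothetical sample groups; in the paper these would come from k-means.
groups = {"cluster1": [0, 1, 2, 3], "cluster2": [4, 5, 6, 7]}

pooled = coexpression_edges(expr, range(8))
layers = {name: coexpression_edges(expr, idx) for name, idx in groups.items()}
```

In this toy case the two genes are perfectly correlated within each group but with opposite sign, so the pooled correlation is zero and the relationship appears only in the per-group layers.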
Affiliation(s)
- F Alex Feltus
- Department of Genetics & Biochemistry, Clemson University, 105 Collings Street, Clemson, SC 29634, USA.
|
30
|
Villa-Vialaneix N, Liaubet L, Laurent T, Cherel P, Gamot A, SanCristobal M. The structure of a gene co-expression network reveals biological functions underlying eQTLs. PLoS One 2013; 8:e60045. [PMID: 23577081 PMCID: PMC3618335 DOI: 10.1371/journal.pone.0060045] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.4] [Received: 11/09/2012] [Accepted: 02/20/2013] [Indexed: 11/18/2022] Open
Abstract
What are the commonalities between genes whose expression level is partially controlled by eQTL, especially with regard to biological functions? Moreover, how are these genes related to a phenotype of interest? These issues are particularly difficult to address when the genome annotation is incomplete, as is the case for mammalian species. Moreover, the direct link between gene expression and a phenotype of interest may be weak, and thus difficult to handle. In this framework, the use of a co-expression network has proven useful: it is a robust approach for modeling a complex system of genetic regulations and for inferring knowledge about as yet unknown genes. In this article, a case study was conducted with a mammalian species. It showed that the use of a co-expression network based on partial correlation, combined with a relevant clustering of nodes, leads to an enrichment of biological functions of around 83%. Moreover, the use of a spatial statistics approach allowed us to superimpose additional information related to a phenotype; this led to highlighting specific genes or gene clusters that are related to the network structure and the phenotype. Three main results are worth noting: first, key genes were highlighted as a potential focus for forthcoming biological experiments; second, a set of biological functions, which support a list of genes under partial eQTL control, was set up by an overview of the global structure of the gene expression network; third, pH was found to be correlated with gene clusters, and then with related biological functions, as a result of a spatial analysis of the network topology.
|
31
|
Restricted Neighborhood Search Clustering Revisited: An Evolutionary Computation Perspective. PATTERN RECOGNITION IN BIOINFORMATICS 2013. [DOI: 10.1007/978-3-642-39159-0_6] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 01/06/2023]
|
32
|
Abstract
Background Phenotypes exhibited by microorganisms can be useful for several purposes, e.g., ethanol as an alternate fuel. Sometimes, the target phenotype may be required in combination with other phenotypes in order to be useful; e.g., an industrial process may require that the organism survive in an anaerobic, alcohol-rich environment and be able to feed on both hexose and pentose sugars to produce ethanol. This combination of traits may not be available in any existing organism, or if it does exist, the mechanisms involved in the phenotype expression may not be efficient enough to be useful. Thus, it may be necessary to genetically modify microorganisms. However, before any genetic modification can take place, it is important to identify the underlying cellular subsystems responsible for the expression of the target phenotype. Results In this paper, we develop a method to identify statistically significant and phenotypically-biased functional modules. The method can compare the organismal network information from hundreds of phenotype expressing and phenotype non-expressing organisms to identify cellular subsystems that are more prone to occur in phenotype-expressing organisms than in phenotype non-expressing organisms. We have provided literature evidence that the phenotype-biased modules identified for phenotypes such as hydrogen production (dark and light fermentation), respiration, gram-positive, gram-negative and motility, are indeed phenotype-related. Conclusion Thus we have proposed a methodology to identify phenotype-biased cellular subsystems. We have shown the effectiveness of our methodology by applying it to several target phenotypes. The code and all supplemental files can be downloaded from (http://freescience.org/cs/phenotype-biased-biclusters/).
|
33
|
Pizzuti C, Rombo SE. A coclustering approach for mining large protein-protein interaction networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2012; 9:717-730. [PMID: 22201069 DOI: 10.1109/tcbb.2011.158] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 05/31/2023]
Abstract
Several approaches have been presented in the literature to cluster Protein-Protein Interaction (PPI) networks. They can be grouped in two main categories: those allowing a protein to participate in different clusters and those generating only nonoverlapping clusters. In both cases, a challenging task is to find a suitable compromise between the biological relevance of the results and a comprehensive coverage of the analyzed networks. Indeed, methods returning highly accurate results are often able to cover only small parts of the input PPI network, especially when low-characterized networks are considered. We present a coclustering-based technique able to generate both overlapping and nonoverlapping clusters. The density of the clusters to search for can also be set by the user. We tested our method on the two networks of yeast and human, and compared it to five other well-known techniques on the same interaction data sets. The results showed that, for all the examples considered, our approach always reaches a good compromise between accuracy and network coverage. Furthermore, the behavior of our algorithm is not influenced by the structure of the input network, unlike all the techniques considered in the comparison, which returned very good results on the yeast network but rather poor outcomes on the human network.
Affiliation(s)
- Clara Pizzuti
- Institute for High Performance Computing and Networking-ICAR, National Research Council of Italy-CNR, Via P. Bucci 41C, 87036 Rende-CS, Italy.
|
34
|
Pizzuti C, Rombo SE, Marchiori E. Complex Detection in Protein-Protein Interaction Networks: A Compact Overview for Researchers and Practitioners. EVOLUTIONARY COMPUTATION, MACHINE LEARNING AND DATA MINING IN BIOINFORMATICS 2012. [DOI: 10.1007/978-3-642-29066-4_19] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Indexed: 12/24/2022]
|
35
|
Wang J, Chen G, Li M, Pan Y. Integration of breast cancer gene signatures based on graph centrality. BMC SYSTEMS BIOLOGY 2011; 5 Suppl 3:S10. [PMID: 22784616 PMCID: PMC3287565 DOI: 10.1186/1752-0509-5-s3-s10] [Citation(s) in RCA: 37] [Impact Index Per Article: 2.8] [Indexed: 12/18/2022]
Abstract
BACKGROUND Various gene-expression signatures for breast cancer are available for the prediction of clinical outcome. However, due to the small overlap between different signatures, it is challenging to integrate existing disjoint signatures to provide a unified insight on the association between gene expression and clinical outcome. RESULTS In this paper, we propose a method to integrate different breast cancer gene signatures by using graph centrality in a context-constrained protein interaction network (PIN). The context-constrained PIN for breast cancer is built by integrating the complete PIN and various gene signatures reported in the literature. Then, we use graph centralities to quantify the importance of genes to breast cancer. Finally, we obtain reliable gene signatures that consist of the genes with high graph centrality. Well-known breast cancer genes, such as TP53 and BRCA1, are ranked extremely high in our results. Compared with previous results by functional enrichment analysis, gene signatures based on graph centralities, especially the eigenvector centrality and subgraph centrality, are more tightly related to breast cancer. We validated these signatures on a genome-wide microarray dataset and found a strong association between the expression of these signature genes and pathologic parameters. CONCLUSIONS In summary, graph centralities provide a novel way to connect different cancer signatures and to understand the mechanism of the relationship between gene expression and clinical outcome of breast cancer. Moreover, this method can be used not only on breast cancer but also on other gene expression related diseases and drug studies.
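To make the idea of ranking genes by graph centrality concrete, here is a minimal Python sketch that computes degree centrality and eigenvector centrality (by power iteration) on a small undirected network. It is an illustrative sketch only, not the authors' method: the node names and edges are hypothetical, the context-constrained PIN construction is not reproduced, and subgraph centrality is omitted.

```python
from math import sqrt


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree_centrality(adj):
    """Number of interaction partners per node."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def eigenvector_centrality(adj, iterations=100):
    """Power iteration on the adjacency matrix, L2-normalised each step.

    Assumes a connected, non-bipartite graph so the iteration converges.
    """
    score = {node: 1.0 for node in adj}
    for _ in range(iterations):
        new = {node: sum(score[n] for n in adj[node]) for node in adj}
        norm = sqrt(sum(v * v for v in new.values()))
        score = {node: v / norm for node, v in new.items()}
    return score


# Hypothetical toy network: one hub, a triangle (hub, a, b), and a pendant node c.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
adj = build_adjacency(edges)
degree = degree_centrality(adj)
eigen = eigenvector_centrality(adj)

# Nodes ordered from most to least central by eigenvector centrality.
ranking = sorted(adj, key=eigen.get, reverse=True)
```

Eigenvector centrality rewards nodes whose neighbors are themselves well connected, which is why the pendant node attached only to the hub still ranks below the triangle members.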
Affiliation(s)
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha, 410083, China.
|
36
|
Jiao QJ, Zhang YK, Li LN, Shen HB. BinTree seeking: a novel approach to mine both bi-sparse and cohesive modules in protein interaction networks. PLoS One 2011; 6:e27646. [PMID: 22140454 PMCID: PMC3225364 DOI: 10.1371/journal.pone.0027646] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Received: 05/11/2011] [Accepted: 10/21/2011] [Indexed: 01/01/2023] Open
Abstract
Modern science of networks has brought significant advances to our understanding of complex systems biology. As a representative model of systems biology, Protein Interaction Networks (PINs) are characterized by remarkably modular structures, reflecting functional associations between their components. Many methods have been proposed to capture cohesive modules, in which the density of edges within modules is higher than that across them. Recent studies reveal that cohesively interacting modules of proteins are not a universal organizing principle in PINs, which has opened up new avenues for revisiting functional modules in PINs. In this paper, functional clusters in PINs are found to be able to form unorthodox structures defined as bi-sparse modules. In contrast to the traditional cohesive module, the nodes in the bi-sparse module are sparsely connected internally and densely connected with other bi-sparse or cohesive modules. We present a novel protocol called BinTree Seeking (BTS) for mining both bi-sparse and cohesive modules in PINs based on Edge Density of Module (EDM) and matrix theory. BTS detects modules by depicting links and nodes rather than nodes alone, and its derivation procedure is performed entirely on the adjacency matrix of the network. The number of modules in a PIN can be automatically determined in the proposed BTS approach. BTS is tested on three real PINs and the results demonstrate that functional modules in PINs are not dominantly cohesive but can be sparse. BTS software and the supporting information are available at: www.csbio.sjtu.edu.cn/bioinf/BTS/.
Affiliation(s)
- Qing-Ju Jiao
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
- Yan-Kai Zhang
- Department of Physics, Shanghai Jiao Tong University, Shanghai, China
- Lu-Ning Li
- Department of Physics, Shanghai Jiao Tong University, Shanghai, China
- Hong-Bin Shen
- Department of Automation, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
|
37
|
Shi L, Lei X, Zhang A. Protein complex detection with semi-supervised learning in protein interaction networks. Proteome Sci 2011; 9 Suppl 1:S5. [PMID: 22165896 PMCID: PMC3289084 DOI: 10.1186/1477-5956-9-s1-s5] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.2] [Indexed: 12/02/2022] Open
Abstract
Background Protein-protein interactions (PPIs) play fundamental roles in nearly all biological processes. The systematic analysis of PPI networks can enable a better understanding of cellular organization, processes and function. In this paper, we investigate the problem of protein complex detection from noisy protein interaction data, i.e., finding the subsets of proteins that are closely coupled via protein interactions. However, protein complexes are likely to overlap and the interaction data are very noisy. It is a great challenge to effectively analyze the massive data for biologically meaningful protein complex detection. Results Many people try to solve the problem by using traditional unsupervised graph clustering methods. Here, we take a different point of view, redefining the properties and features for protein complexes and designing a “semi-supervised” method to analyze the problem. In this paper, we utilize a neural network with the “semi-supervised” mechanism to detect the protein complexes. By retraining the neural network model recursively, we could find the optimized parameters for the model, and in this way successfully detect the protein complexes. The comparison results show that our algorithm could identify protein complexes that are missed by other methods. We have also shown that our method achieves better precision and recall rates for the identified protein complexes than other existing methods. In addition, the framework we propose is easy to extend in the future. Conclusions Using a weighted network to represent the protein interaction network is more appropriate than using a traditional unweighted network. In addition, integrating biological features and topological features to represent protein complexes is more meaningful than using dense subgraphs. Last, the “semi-supervised” learning model is a promising model to detect protein complexes with more biological and topological features available.
Affiliation(s)
- Lei Shi
- Computer Science & Engineering Department, State University of New York at Buffalo, Buffalo, NY, USA.
|
38
|
Wang J, Li M, Chen J, Pan Y. A fast hierarchical clustering algorithm for functional modules discovery in protein interaction networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2011; 8:607-620. [PMID: 20733244 DOI: 10.1109/tcbb.2010.75] [Citation(s) in RCA: 104] [Impact Index Per Article: 8.0] [Indexed: 05/29/2023]
Abstract
With advances in technologies for predicting protein interactions, huge data sets portrayed as networks have become available. Identification of functional modules from such networks is crucial for understanding principles of cellular organization and functions. However, protein interaction data produced by high-throughput experiments are generally associated with high false-positive rates, which makes it difficult to identify functional modules accurately. In this paper, we propose a fast hierarchical clustering algorithm, HC-PIN, based on the local metric of edge clustering value, which can be used in both unweighted and weighted networks. HC-PIN is applied to the yeast protein interaction network, and the identified modules are validated by all three types of Gene Ontology (GO) terms: Biological Process, Molecular Function, and Cellular Component. The experimental results show that HC-PIN is not only robust to false positives, but can also discover functional modules with low density. The identified modules are statistically significant in terms of the three types of GO annotations. Moreover, HC-PIN can uncover the hierarchical organization of functional modules as its parameter value varies, which approximately corresponds to the hierarchical structure of GO annotations. Compared to previous competing algorithms, HC-PIN is faster and more accurate.
Affiliation(s)
- Jianxin Wang
- Department of Computer Science, School of Information Science and Engineering, Central South University, Changsha 410083, China.
|
39
|
Abstract
The increasing availability of large-scale protein-protein interaction data has made it possible to understand the basic components and organization of cell machinery at the network level. The resulting challenge is how to analyze such complex interaction data to reveal the principles of cellular organization, processes and functions. Many studies have shown that clustering protein interaction networks is an effective approach for identifying protein complexes or functional modules, which has become a major research topic in systems biology. In this review, recent advances in clustering methods for protein interaction networks will be presented in detail. The predictions of protein functions and interactions based on modules will be covered. Finally, the performance of different clustering methods will be compared and the directions for future research will be discussed.
Affiliation(s)
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha 410083, China
- Department of Computer Science, Georgia State University, Atlanta, GA30303, USA
- Min Li
- School of Information Science and Engineering, Central South University, Changsha 410083, China
- Youping Deng
- Rush University Cancer Center, Rush University Medical Center, Chicago, IL 60612, USA
- Yi Pan
- Department of Computer Science, Georgia State University, Atlanta, GA30303, USA
|
40
|
Wang J, Liu B, Li M, Pan Y. Identifying protein complexes from interaction networks based on clique percolation and distance restriction. BMC Genomics 2010; 11 Suppl 2:S10. [PMID: 21047377 PMCID: PMC2975417 DOI: 10.1186/1471-2164-11-s2-s10] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Identification of protein complexes in large interaction networks is crucial to understand principles of cellular organization and predict protein functions, which is one of the most important issues in the post-genomic era. Each protein might belong to multiple protein complexes in real protein-protein interaction networks. Identifying overlapping protein complexes from protein-protein interaction networks is an important research topic. RESULT As an effective algorithm for identifying overlapping module structures, the clique percolation method (CPM) has a wide range of applications in social networks and biological networks. However, the recognition accuracy of CPM is low. Furthermore, CPM is unsuited to identifying meso-scale protein complexes when it is applied to protein-protein interaction networks. In this paper, we propose a new topological model by extending the definition of the k-clique community of CPM and introducing a distance restriction, and develop a novel algorithm called CP-DR based on the new topological model for identifying protein complexes. In this new algorithm, the protein complex size is restricted by a distance constraint to overcome the shortcomings of CPM. CP-DR is applied to the protein interaction network of Saccharomyces cerevisiae and identifies many well-known complexes. CONCLUSION The proposed algorithm CP-DR, based on clique percolation and distance restriction, makes it possible to identify dense subgraphs in protein interaction networks, a large number of which correspond to known protein complexes. Compared to CPM, CP-DR performs better.
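For readers unfamiliar with k-clique communities, the following minimal Python sketch implements plain clique percolation on a toy graph: it enumerates all k-cliques and merges those that share k-1 nodes. It illustrates only the baseline CPM idea; the distance restriction that defines CP-DR is not implemented here, and the node names and edges are hypothetical. The brute-force clique enumeration is suitable for tiny examples only.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_cliques(adj, k):
    """Enumerate all fully connected node sets of size k (brute force)."""
    return [
        frozenset(candidate)
        for candidate in combinations(sorted(adj), k)
        if all(v in adj[u] for u, v in combinations(candidate, 2))
    ]


def clique_percolation(adj, k):
    """Merge k-cliques that share k-1 nodes; return the resulting node sets."""
    cliques = k_cliques(adj, k)
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return sorted(communities.values(), key=sorted)


# Hypothetical toy network: triangles abc and bcd share an edge,
# while triangle def touches the first group only at node d.
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),
    ("b", "d"), ("c", "d"),
    ("d", "e"), ("d", "f"), ("e", "f"),
]
communities = clique_percolation(build_adjacency(edges), k=3)
```

With k = 3 the sketch yields two communities, {a, b, c, d} and {d, e, f}; node d belongs to both, which is the overlapping behavior the abstract refers to.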
Affiliation(s)
- Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha 410083, China.
|
41
|
Zhang SH, Wu C, Li X, Chen X, Jiang W, Gong BS, Li J, Yan YQ. From phenotype to gene: Detecting disease-specific gene functional modules via a text-based human disease phenotype network construction. FEBS Lett 2010; 584:3635-43. [DOI: 10.1016/j.febslet.2010.07.038] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Received: 05/05/2010] [Revised: 07/17/2010] [Accepted: 07/21/2010] [Indexed: 10/19/2022]
|
42
|
Li X, Wu M, Kwoh CK, Ng SK. Computational approaches for detecting protein complexes from protein interaction networks: a survey. BMC Genomics 2010; 11 Suppl 1:S3. [PMID: 20158874 PMCID: PMC2822531 DOI: 10.1186/1471-2164-11-s1-s3] [Citation(s) in RCA: 167] [Impact Index Per Article: 11.9] [Indexed: 11/10/2022] Open
Abstract
Background Most proteins form macromolecular complexes to perform their biological functions. However, experimentally determined protein complex data, especially of those involving more than two protein partners, are relatively limited in the current state-of-the-art high-throughput experimental techniques. Nevertheless, many techniques (such as yeast-two-hybrid) have enabled systematic screening of pairwise protein-protein interactions en masse. Thus computational approaches for detecting protein complexes from protein interaction data are useful complements to the limited experimental methods. They can be used together with the experimental methods for mapping the interactions of proteins to understand how different proteins are organized into higher-level substructures to perform various cellular functions. Results Given the abundance of pairwise protein interaction data from high-throughput genome-wide experimental screenings, a protein interaction network can be constructed from protein interaction data by considering individual proteins as the nodes, and the existence of a physical interaction between a pair of proteins as a link. This binary protein interaction graph can then be used for detecting protein complexes using graph clustering techniques. In this paper, we review and evaluate the state-of-the-art techniques for computational detection of protein complexes, and discuss some promising research directions in this field. Conclusions Experimental results with yeast protein interaction data show that the interaction subgraphs discovered by various computational methods matched well with actual protein complexes. In addition, the computational approaches have also improved in performance over the years. Further improvements could be achieved if the quality of the underlying protein interaction data can be considered adequately to minimize the undesirable effects from the irrelevant and noisy sources, and the various biological evidences can be better incorporated into the detection process to maximize the exploitation of the increasing wealth of biological knowledge available.
Affiliation(s)
- Xiaoli Li
- Institute for Infocomm Research, 1 Fusionopolis Way, Singapore.
|
43
|
Pinkert S, Schultz J, Reichardt J. Protein interaction networks--more than mere modules. PLoS Comput Biol 2010; 6:e1000659. [PMID: 20126533 PMCID: PMC2813263 DOI: 10.1371/journal.pcbi.1000659] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Received: 12/12/2008] [Accepted: 12/22/2009] [Indexed: 11/26/2022] Open
Abstract
It is widely believed that the modular organization of cellular function is reflected in a modular structure of molecular networks. A common view is that a “module” in a network is a cohesively linked group of nodes, densely connected internally and sparsely interacting with the rest of the network. Many algorithms try to identify functional modules in protein-interaction networks (PIN) by searching for such cohesive groups of proteins. Here, we present an alternative approach independent of any prior definition of what actually constitutes a “module”. In a self-consistent manner, proteins are grouped into “functional roles” if they interact in similar ways with other proteins according to their functional roles. Such grouping may well result in cohesive modules again, but only if the network structure actually supports this. We applied our method to the PIN from the Human Protein Reference Database (HPRD) and found that a representation of the network in terms of cohesive modules, at least on a global scale, does not optimally represent the network's structure because it focuses on finding independent groups of proteins. In contrast, a decomposition into functional roles is able to depict the structure much better as it also takes into account the interdependencies between roles and even allows groupings based on the absence of interactions between proteins in the same functional role. This, for example, is the case for transmembrane proteins, which could never be recognized as a cohesive group of nodes in a PIN. When mapping experimental methods onto the groups, we identified profound differences in the coverage suggesting that our method is able to capture experimental bias in the data, too. For example yeast-two-hybrid data were highly overrepresented in one particular group. Thus, there is more structure in protein-interaction networks than cohesive modules alone and we believe this finding can significantly improve automated function prediction algorithms. 
Cellular function is widely believed to be organized in a modular fashion. On all scales and at all levels of complexity, relatively independent sub-units perform relatively independent sub-tasks. This functional modularity must be reflected in the topology of molecular networks. But how a functional module should be represented in an interaction network is an open question. On a small scale, one can identify a protein-complex as a module in protein-interaction networks (PIN), i.e., modules are understood as densely linked (interacting) groups of proteins, that are only sparsely interacting with the rest of the network. In this contribution, we show that extrapolating this concept of cohesively linked clusters of proteins as modules to the scale of the entire PIN inevitably misses important and functionally relevant structure inherent in the network. As an alternative, we introduce a novel way of decomposing a network into functional roles and show that this represents network structure and function more efficiently. This finding should have a profound impact on all module assisted methods of protein function prediction and should shed new light on how functional modules can be represented in molecular interaction networks in general.
Affiliation(s)
- Stefan Pinkert
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Department of Cellular Biochemistry, Max Planck Institute of Biochemistry, Martinsried, Germany
- Jörg Schultz
- Department of Bioinformatics, Biocenter, University of Würzburg, Würzburg, Germany
- Jörg Reichardt
- Institute for Theoretical Physics and Astrophysics, University of Würzburg, Würzburg, Germany
- Complexity Sciences Center, University of California at Davis, Davis, California, United States of America
|
44
|
Wu Z, Zhao X, Chen L. Identifying responsive functional modules from protein-protein interaction network. Mol Cells 2009; 27:271-7. [PMID: 19326072 DOI: 10.1007/s10059-009-0035-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 01/23/2009] [Accepted: 01/26/2009] [Indexed: 10/21/2022] Open
Abstract
Proteins interact with each other within a cell, and those interactions give rise to the biological function and dynamical behavior of cellular systems. Generally, protein interactions are temporal, spatial, or condition dependent in a specific cell, where only a small fraction of interactions usually take place under certain conditions. Although a large amount of protein interaction data has recently been collected by high-throughput technologies, the interactions are recorded or summarized under various conditions and therefore cannot be directly used to identify signaling pathways or active networks, which are believed to work in specific cells under specific conditions. However, protein interactions activated under specific conditions may give hints to the biological process underlying corresponding phenotypes. In particular, responsive functional modules consisting of protein interactions activated under specific conditions can provide insight into the mechanism underlying biological systems, e.g. protein interaction subnetworks found for certain diseases rather than normal conditions may help to discover potential biomarkers. From a computational viewpoint, identifying responsive functional modules can be formulated as an optimization problem. Because this combinatorial problem is NP-hard, efficient computational methods for extracting responsive functional modules are strongly demanded. In this review, we first report recent advances in the development of computational methods for extracting responsive functional modules or active pathways from protein interaction network and microarray data. Then, from a computational perspective, we discuss remaining obstacles and perspectives for this attractive and challenging topic in the area of systems biology.
Affiliation(s)
- Zikai Wu
- Institute of Systems Biology, Shanghai University, Shanghai 200444, China
|
45
|
Li M, Chen JE, Wang JX, Hu B, Chen G. Modifying the DPClus algorithm for identifying protein complexes based on new topological structures. BMC Bioinformatics 2008; 9:398. [PMID: 18816408 PMCID: PMC2570695 DOI: 10.1186/1471-2105-9-398] [Citation(s) in RCA: 137] [Impact Index Per Article: 8.6] [Received: 02/23/2008] [Accepted: 09/25/2008] [Indexed: 11/29/2022] Open
Abstract
Background Identification of protein complexes is crucial for understanding principles of cellular organization and functions. As the size of protein-protein interaction sets increases, a general trend is to represent the interactions as a network and to develop effective algorithms to detect significant complexes in such networks. Results Based on the study of known complexes in protein networks, this paper proposes a new topological structure for protein complexes, which is a combination of subgraph diameter (or average vertex distance) and subgraph density. Following the approach of the previously proposed clustering algorithm DPClus, which expands clusters starting from seeded vertices, we present a clustering algorithm IPCA based on the new topological structure for identifying complexes in large protein interaction networks. The algorithm IPCA is applied to the protein interaction network of Saccharomyces cerevisiae and identifies many well known complexes. Experimental results show that the algorithm IPCA recalls more known complexes than previously proposed clustering algorithms, including DPClus, CFinder, LCMA, MCODE, RNSC and STM. Conclusion The proposed algorithm based on the new topological structure makes it possible to identify dense subgraphs in protein interaction networks, many of which correspond to known protein complexes. The algorithm is robust to the known high rate of false positives and false negatives in data from high-throughput interaction techniques. The program is available at .
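As an illustration of the two quantities named in the abstract above, subgraph density and subgraph diameter, the following minimal Python sketch computes both for a candidate vertex set. It is not the IPCA or DPClus implementation; the function names, the adjacency-set representation, and the toy graph are assumptions made only for this example.

```python
from collections import deque
from itertools import combinations


def subgraph_density(adj, nodes):
    """Fraction of possible edges present among `nodes` (undirected graph)."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(nodes, 2) if v in adj.get(u, ()))
    return 2.0 * edges / (n * (n - 1))


def subgraph_diameter(adj, nodes):
    """Longest shortest path inside the induced subgraph; inf if disconnected."""
    nodes = set(nodes)
    best = 0
    for source in nodes:
        # Breadth-first search restricted to the candidate vertex set.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj.get(u, ()):
                if v in nodes and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(nodes):
            return float("inf")
        best = max(best, max(dist.values()))
    return best


# Hypothetical toy network as adjacency sets (not data from the paper).
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
```

A seed-expansion method in this spirit could admit a new vertex only while the density stays above a chosen threshold and the diameter stays below a chosen bound; the thresholds themselves are not given in the abstract.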
Affiliation(s)
- Min Li
- School of Information Science and Engineering, Central South University, Changsha, Hunan 410083, PR China.
|
46
|
Hwang W, Cho YR, Zhang A, Ramanathan M. CASCADE: a novel quasi all paths-based network analysis algorithm for clustering biological interactions. BMC Bioinformatics 2008; 9:64. [PMID: 18230159 PMCID: PMC2253513 DOI: 10.1186/1471-2105-9-64] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.1] [Received: 02/09/2007] [Accepted: 01/29/2008] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Quantitative characterization of the topological characteristics of protein-protein interaction (PPI) networks can enable the elucidation of biological functional modules. Here, we present a novel clustering methodology for PPI networks wherein the biological and topological influence of each protein on other proteins is modeled using the probability distribution that the series of interactions necessary to link a pair of distant proteins in the network occur within a time constant (the occurrence probability). RESULTS CASCADE selects representative nodes for each cluster and iteratively refines clusters based on a combination of the occurrence probability and graph topology between every protein pair. The CASCADE approach is compared to nine competing approaches. The clusters obtained by each technique are compared for enrichment of biological function. CASCADE generates larger clusters and the clusters identified have p-values for biological function that are approximately 1000-fold better than the other methods on the yeast PPI network dataset. An important strength of CASCADE is that the percentage of proteins that are discarded to create clusters is much lower than the other approaches which have an average discard rate of 45% on the yeast protein-protein interaction network. CONCLUSION CASCADE is effective at detecting biologically relevant clusters of interactions.
Affiliation(s)
- Woochang Hwang
- Department of Computer Science and Engineering, State University of New York, Buffalo, NY 14260, USA.
|
47
|
Wang ZH, Liu QJ, Zhu YP. [Research on modular organization of gene regulatory network]. YI CHUAN = HEREDITAS 2008; 30:20-27. [PMID: 18244898 DOI: 10.3724/sp.j.1005.2008.00020] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/25/2023]
Abstract
Gene regulatory networks are very important for understanding biological processes and gene functions. They convey complex information about how large numbers of genes are regulated by transcription factors and translated into proteins that carry out biological functions. Generally, knowledge of network topology and organization can be used to uncover the regulatory mechanisms of genes in the network. It can illuminate the local characteristics of the network and reveal how the regulatory network is constructed; moreover, it allows regulatory pathways to be analyzed completely and systematically. More and more researchers now accept the hierarchical structure of gene regulatory networks: regulatory component, motif, module, and the whole network. Here, we discuss the middle two levels: motif and module. We compare various research results on network organization from recent years, explain their biological significance, and point out existing shortcomings and problems. In light of these problems, we also suggest some possible research trends. Finally, we discuss the prospects for research on the modular organization of gene regulatory networks.
Affiliation(s)
- Zheng-Hua Wang
- National Laboratory for Parallel & Distributed Processing, National University of Defense Technology, Changsha 410073, China.
|
48
|
Abstract
Interaction networks, consisting of agents linked by their interactions, are ubiquitous across many disciplines of modern science. Many methods of analysis of interaction networks have been proposed, mainly concentrating on node degree distribution or aiming to discover clusters of agents that are very strongly connected between themselves. These methods are principally based on graph-theory or machine learning. We present a mathematically simple formalism for modelling context-specific information propagation in interaction networks based on random walks. The context is provided by selection of sources and destinations of information and by use of potential functions that direct the flow towards the destinations. We also use the concept of dissipation to model the aging of information as it diffuses from its source. Using examples from yeast protein-protein interaction networks and some of the histone acetyltransferases involved in control of transcription, we demonstrate the utility of the concepts and the mathematical constructs introduced in this paper.
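To make the idea of a random walk with dissipation concrete, here is a minimal Python sketch. It is not the formalism of the paper summarized above: the uniform-neighbour transition rule, the `survival` parameter, the fixed step count, and the toy path graph are all assumptions chosen for illustration, and potential functions that steer the walk towards destinations are left out.

```python
def expected_visits(adj, source, survival=0.8, steps=200):
    """Expected number of visits to each node for a walker that starts at
    `source`, moves to a uniformly chosen neighbour at each step, and
    survives each step with probability `survival`; the remainder is
    treated as dissipated information."""
    mass = {node: 0.0 for node in adj}
    mass[source] = 1.0
    visits = dict(mass)  # the start itself counts as one visit
    for _ in range(steps):
        nxt = {node: 0.0 for node in adj}
        for u, m in mass.items():
            if m == 0.0 or not adj[u]:
                continue
            share = survival * m / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        mass = nxt
        for node, m in mass.items():
            visits[node] += m
    return visits


# Hypothetical three-node path a - b - c (not data from the paper).
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
visits = expected_visits(path, "a")
```

With a survival probability below one, nodes far from the source receive fewer expected visits, which is one simple way to model information aging as it diffuses.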
Affiliation(s)
- Aleksandar Stojmirović
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
49
|
Huang Y, Li H, Hu H, Yan X, Waterman MS, Huang H, Zhou XJ. Systematic discovery of functional modules and context-specific functional annotation of human genome. Bioinformatics 2007; 23:i222-9. [PMID: 17646300 DOI: 10.1093/bioinformatics/btm222] [Citation(s) in RCA: 45] [Impact Index Per Article: 2.6] [Indexed: 01/26/2023]
Abstract
MOTIVATION The rapid accumulation of microarray datasets provides unique opportunities to perform systematic functional characterization of the human genome. We designed a graph-based approach to integrate cross-platform microarray data and extract recurrent expression patterns. A series of microarray datasets can be modeled as a series of co-expression networks, in which we search for frequently occurring network patterns. The integrative approach provides three major advantages over the commonly used microarray analysis methods: (1) it enhances signal-to-noise separation, (2) it identifies functionally related genes without co-expression, and (3) it provides a way to predict gene functions in a context-specific way. RESULTS We integrate 65 human microarray datasets, comprising 1105 experiments and over 11 million expression measurements. We develop a data mining procedure based on frequent itemset mining and biclustering to systematically discover network patterns that recur in at least five datasets. This resulted in 143,401 potential functional modules. Subsequently, we design a network topology statistic based on graph random walk that effectively captures characteristics of a gene's local functional environment. Function annotations based on this statistic are then assessed using the random forest method, combining six other attributes of the network modules. We assign 1126 functions to 895 genes, 779 known and 116 unknown, with a validation accuracy of 70%. Among our assignments, 20% of genes are assigned multiple functions based on different network environments. AVAILABILITY http://zhoulab.usc.edu/ContextAnnotation.
Affiliation(s)
- Yu Huang
- Molecular and Computational Biology, University of Southern California, Los Angeles, CA, USA
|
50
|
Cho YR, Hwang W, Ramanathan M, Zhang A. Semantic integration to identify overlapping functional modules in protein interaction networks. BMC Bioinformatics 2007; 8:265. [PMID: 17650343 PMCID: PMC1971074 DOI: 10.1186/1471-2105-8-265] [Citation(s) in RCA: 119] [Impact Index Per Article: 7.0] [Received: 04/06/2007] [Accepted: 07/24/2007] [Indexed: 12/05/2022] Open
Abstract
Background The systematic analysis of protein-protein interactions can enable a better understanding of cellular organization, processes and functions. Functional modules can be identified from the protein interaction networks derived from experimental data sets. However, these analyses are challenging because of the presence of unreliable interactions and the complex connectivity of the network. The integration of protein-protein interactions with the data from other sources can be leveraged for improving the effectiveness of functional module detection algorithms. Results We have developed novel metrics, called semantic similarity and semantic interactivity, which use Gene Ontology (GO) annotations to measure the reliability of protein-protein interactions. The protein interaction networks can be converted into a weighted graph representation by assigning the reliability values to each interaction as a weight. We presented a flow-based modularization algorithm to efficiently identify overlapping modules in the weighted interaction networks. The experimental results show that the semantic similarity and semantic interactivity of interacting pairs were positively correlated with functional co-occurrence. The effectiveness of the algorithm for identifying modules was evaluated using functional categories from the MIPS database. We demonstrated that our algorithm had higher accuracy compared to other competing approaches. Conclusion The integration of protein interaction networks with GO annotation data and the capability of detecting overlapping modules substantially improve the accuracy of module identification.
Affiliation(s)
- Young-Rae Cho
- Department of Computer Science and Engineering, State University of New York, Buffalo, NY, USA
| | - Woochang Hwang
- Department of Computer Science and Engineering, State University of New York, Buffalo, NY, USA
| | - Murali Ramanathan
- Department of Pharmaceutical Science, State University of New York, Buffalo, NY, USA
| | - Aidong Zhang
- Department of Computer Science and Engineering, State University of New York, Buffalo, NY, USA
|