1
Xing Z, Zhang Y, Tian Z, Wang M, Xiao W, Zhu C, Zhao S, Zhu Y, Hu L, Kong X. Escaping but not the inactive X-linked protein complex coding genes may achieve X-chromosome dosage compensation and underlie X chromosome inactivation-related diseases. Heliyon 2023; 9:e17721. [PMID: 37449161 PMCID: PMC10336589 DOI: 10.1016/j.heliyon.2023.e17721] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/19/2022] [Revised: 06/05/2023] [Accepted: 06/26/2023] [Indexed: 07/18/2023]
Abstract
X chromosome dosage compensation (XDC) refers to the process by which X-linked genes acquire expression equivalence between the two sexes. Ohno proposed that XDC is achieved by two-fold upregulation of X-linked genes in both sexes and by silencing one X chromosome (X chromosome inactivation, XCI) in females. However, the genes subject to two-fold upregulation, as well as the underlying mechanism, remain unclear. It has been reported that gene dosage changes may only affect X-linked dosage-sensitive genes, such as protein complex coding genes (PCGs). Our results showed that, in humans, PCGs are more likely to escape XCI, and escaping PCGs (EsP) show two-fold higher expression than inactivated PCGs (InP) or other X-linked genes at the RNA and protein levels in both sexes, which suggests that EsP may achieve upregulation and XDC. The higher expression of EsP possibly results from upregulation of the single active X chromosome (Xa), rather than from escape expression from the inactive X chromosome (Xi). EsP genes have relatively high expression levels in humans and lower dN/dS ratios, suggesting that they are likely under stronger selection pressure over evolutionary time. Our study also suggests that the SP1 transcription factor is significantly enriched in EsP and may be involved in the upregulation of EsP on the active X. Finally, human EsP genes in this study are enriched in the toll-like receptor pathway, NF-κB pathway, apoptotic pathway, and abnormal mental, developmental and reproductive phenotypes. These findings suggest that misregulation of EsP may be involved in autoimmune, reproductive, and neurological diseases, providing insight for the diagnosis and treatment of these diseases.
Affiliation(s)
- Zhihao Xing
- Clinical Laboratory, Institute of Pediatrics, Shenzhen Children's Hospital, Shenzhen, China
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
- Yuchao Zhang
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
- Zhongyuan Tian
- Zhoukou Traditional Chinese Medicine Hospital, Zhoukou, Henan, China
- Meng Wang
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
- Weiwei Xiao
- Clinical Laboratory, Institute of Pediatrics, Shenzhen Children's Hospital, Shenzhen, China
- Chunqing Zhu
- Clinical Laboratory, Institute of Pediatrics, Shenzhen Children's Hospital, Shenzhen, China
- Songhui Zhao
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
- Yufei Zhu
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
- Landian Hu
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
- Xiangyin Kong
- CAS Key Laboratory of Tissue Microenvironment and Tumor, Shanghai Institute of Nutrition and Health, University of Chinese Academy of Sciences, Chinese Academy of Sciences, China
2
Miller RA, Kutmon M, Bohler A, Waagmeester A, Evelo CT, Willighagen EL. Understanding signaling and metabolic paths using semantified and harmonized information about biological interactions. PLoS One 2022; 17:e0263057. [PMID: 35436299 PMCID: PMC9015122 DOI: 10.1371/journal.pone.0263057] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 05/19/2021] [Accepted: 01/11/2022] [Indexed: 11/22/2022]
Abstract
To grasp the complexity of biological processes, the biological knowledge is often translated into schematic diagrams of, for example, signalling and metabolic pathways. These pathway diagrams describe relevant connections between biological entities and incorporate domain knowledge in a visual format making it easier for humans to interpret. Still, these diagrams can be represented in machine readable formats, as done in the KEGG, Reactome, and WikiPathways databases. However, while humans are good at interpreting the message of the creators of diagrams, algorithms struggle when the diversity in drawing approaches increases. WikiPathways supports multiple drawing styles which need harmonizing to offer semantically enriched access. Particularly challenging, here, are the interactions between the biological entities that underlie the biological causality. These interactions provide information about the biological process (metabolic conversion, inhibition, etc.), the direction, and the participating entities. Availability of the interactions in a semantic and harmonized format is essential for searching the full network of biological interactions. We here study how the graphically-modelled biological knowledge in diagrams can be semantified and harmonized, and exemplify how the resulting data is used to programmatically answer biological questions. We find that we can translate graphically modelled knowledge to a sufficient degree into a semantic model and discuss some of the current limitations. We then use this to show that reproducible notebooks can be used to explore up- and downstream targets of MECP2 and to analyse the sphingolipid metabolism. Our results demonstrate that most of the graphical biological knowledge from WikiPathways is modelled into the semantic layer with the semantic information intact and connectivity information preserved. 
Being able to evaluate how biological elements affect each other is useful and allows, for example, the identification of up or downstream targets that will have a similar effect when modified.
Affiliation(s)
- Ryan A. Miller
- Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
- Martina Kutmon
- Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Anwesha Bohler
- Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
- Andra Waagmeester
- Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
- Micellio, Antwerp, Belgium
- Chris T. Evelo
- Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
- Maastricht Centre for Systems Biology (MaCSBio), Maastricht University, Maastricht, The Netherlands
- Egon L. Willighagen
- Department of Bioinformatics (BiGCaT), NUTRIM, Maastricht University, Maastricht, The Netherlands
3
APAL: Adjacency Propagation Algorithm for overlapping community detection in biological networks. Inf Sci (N Y) 2021. [DOI: 10.1016/j.ins.2021.08.031] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 02/02/2023]
4
Jeong H, Kim Y, Jung YS, Kang DR, Cho YR. Entropy-Based Graph Clustering of PPI Networks for Predicting Overlapping Functional Modules of Proteins. Entropy 2021; 23:e23101271. [PMID: 34681995 PMCID: PMC8534328 DOI: 10.3390/e23101271] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/10/2021] [Revised: 09/25/2021] [Accepted: 09/25/2021] [Indexed: 12/26/2022]
Abstract
Functional modules can be predicted using genome-wide protein-protein interactions (PPIs) from a systematic perspective. Various graph clustering algorithms have been applied to PPI networks for this task. In particular, the detection of overlapping clusters is necessary because a protein is involved in multiple functions under different conditions. Graph entropy (GE) is a novel metric to assess the quality of clusters in a large, complex network. In this study, the unweighted and weighted GE algorithms are evaluated to assess their validity in predicting functional modules. To measure clustering accuracy, the clustering results are compared to protein complexes and Gene Ontology (GO) annotations as references. We demonstrate that the GE algorithm detects overlapping clusters more accurately than other competitive methods. Moreover, we confirm the biological feasibility of the proteins that occur most frequently in the set of identified clusters. Finally, novel proteins for the additional annotation of GO terms are revealed.
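The abstract names graph entropy as a cluster-quality metric without defining it. The sketch below assumes the commonly cited formulation, in which each vertex contributes a binary entropy computed from the fraction of its neighbours inside versus outside the candidate cluster, and the graph entropy is the sum over all vertices. The function names and the toy network are illustrative and are not taken from the paper.

```python
import math
from typing import Dict, Set

Graph = Dict[str, Set[str]]  # undirected graph as an adjacency mapping


def vertex_entropy(graph: Graph, cluster: Set[str], v: str) -> float:
    """Binary entropy of v's links split into inside/outside the cluster."""
    neighbors = graph[v]
    if not neighbors:
        return 0.0
    p_in = len(neighbors & cluster) / len(neighbors)
    p_out = 1.0 - p_in
    # Terms with p == 0 contribute nothing (limit of p * log p is 0).
    return -sum(p * math.log2(p) for p in (p_in, p_out) if p > 0)


def graph_entropy(graph: Graph, cluster: Set[str]) -> float:
    """Sum of vertex entropies over the whole graph (assumed definition)."""
    return sum(vertex_entropy(graph, cluster, v) for v in graph)


# Toy network: a triangle a-b-c with a short tail c-d-e.
toy: Graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}

if __name__ == "__main__":
    for candidate in ({"a", "b"}, {"a", "b", "c"}):
        print(sorted(candidate), round(graph_entropy(toy, candidate), 4))
```

Under this definition a lower value indicates a cleaner boundary between the cluster and the rest of the network, so a greedy search would add or remove vertices while the entropy keeps decreasing.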
Affiliation(s)
- Hoyeon Jeong
- Department of Biostatistics, Wonju College of Medicine, Yonsei University, Wonju-si 26426, Gangwon-do, Korea
- National Health Big Data Clinical Research Institute, Wonju College of Medicine, Yonsei University, Wonju-si 26426, Gangwon-do, Korea
- Yoonbee Kim
- Division of Software, Yonsei University Mirae Campus, Wonju-si 26493, Gangwon-do, Korea
- Yi-Sue Jung
- Division of Software, Yonsei University Mirae Campus, Wonju-si 26493, Gangwon-do, Korea
- Dae Ryong Kang
- Department of Biostatistics, Wonju College of Medicine, Yonsei University, Wonju-si 26426, Gangwon-do, Korea
- National Health Big Data Clinical Research Institute, Wonju College of Medicine, Yonsei University, Wonju-si 26426, Gangwon-do, Korea
- Young-Rae Cho
- Division of Software, Yonsei University Mirae Campus, Wonju-si 26493, Gangwon-do, Korea
- Division of Digital Healthcare, Yonsei University Mirae Campus, Wonju-si 26493, Gangwon-do, Korea
- Correspondence: Tel.: +82-33-760-2245
5
Liu G, Liu B, Li A, Wang X, Yu J, Zhou X. Identifying Protein Complexes With Clear Module Structure Using Pairwise Constraints in Protein Interaction Networks. Front Genet 2021; 12:664786. [PMID: 34512712 PMCID: PMC8430217 DOI: 10.3389/fgene.2021.664786] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 02/06/2021] [Accepted: 06/23/2021] [Indexed: 02/02/2023]
Abstract
Protein-protein interaction (PPI) networks can be regarded as powerful platforms for elucidating the principles and mechanisms of cellular organization. Uncovering protein complexes from PPI networks will lead to a better understanding of biological function in cellular systems. In recent decades, numerous computational algorithms have been developed to identify protein complexes. However, the majority of them primarily concern the topological structure of PPI networks and lack consideration of the native organized structure among protein complexes. The PPI networks generated by high-throughput technology include a fraction of false protein interactions, which makes it difficult to identify protein complexes efficiently. To tackle these challenges, we propose a novel semi-supervised protein complex detection model based on non-negative matrix tri-factorization, which not only considers the topological structure of a PPI network but also makes full use of available high-quality known protein pairs with must-link constraints. We propose non-overlapping (NSSNMTF) and overlapping (OSSNMTF) protein complex detection algorithms to identify significant protein complexes with clear module structures from PPI networks. In addition, the two proposed protein complex detection algorithms outperform a diverse range of state-of-the-art protein complex identification algorithms on both synthetic networks and human-related PPI networks.
Affiliation(s)
- Guangming Liu
- School of Computer Science & Engineering, Xi'an University of Technology, Xi'an, China
- Bo Liu
- Hebei Key Laboratory of Agricultural Big Data, College of Information Science and Technology, Hebei Agricultural University, Baoding, China
- Aimin Li
- School of Computer Science & Engineering, Xi'an University of Technology, Xi'an, China
- Xiaofan Wang
- School of Computer Science & Engineering, Xi'an University of Technology, Xi'an, China
- Jian Yu
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
- Xuezhong Zhou
- Beijing Key Lab of Traffic Data Analysis and Mining, School of Computer and Information Technology, Beijing Jiaotong University, Beijing, China
6
Zhou Y, Chen H, Li S, Chen M. mPPI: a database extension to visualize structural interactome in a one-to-many manner. Database: The Journal of Biological Databases and Curation 2021; 2021:6307707. [PMID: 34156447 DOI: 10.1093/database/baab036] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/26/2021] [Revised: 05/10/2021] [Accepted: 05/28/2021] [Indexed: 01/02/2023]
Abstract
Protein-protein interaction (PPI) databases with structural information are useful for investigating biological functions at both systematic and atomic levels. However, most existing PPI databases only curate binary interactomes. We summarize the related databases and resources from the perspective of PPI display and function, as well as the structural binding interface. We developed a database extension, named mPPI, for PPI structural visualization. Compared with existing structural interactomes, which curate resolved PPI conformations in pairs, mPPI can visualize a target protein and its multiple interactors simultaneously, which facilitates multi-target drug discovery and structure prediction of protein macro-complexes. By employing a protein-protein docking algorithm, mPPI largely extends the coverage of the structural interactome beyond experimentally resolved complexes. mPPI is designed to be a customizable and convenient plugin for PPI databases. It has wide potential applications for various PPI databases, and it has been applied to a neurodegenerative disease-related PPI database as a demonstration. Scripts and implementation guidelines for mPPI are documented at the database tool website. Database URL: http://bis.zju.edu.cn/mppi/.
Affiliation(s)
- Yekai Zhou
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China; Department of Computer Science, The University of Hong Kong, Hong Kong 999077, China
- Hongjun Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Sida Li
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Ming Chen
- Department of Bioinformatics, College of Life Sciences, Zhejiang University, Hangzhou 310058, China; Bioinformatics Center, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 310058, China
7
CMG2Vec: A composite meta-graph based heterogeneous information network embedding approach. Knowl Based Syst 2021. [DOI: 10.1016/j.knosys.2020.106661] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 11/23/2022]
8
Cheng L, Zeng Y, Hu S, Zhang N, Cheung KCP, Li B, Leung KS, Jiang L. Systematic prediction of autophagy-related proteins using Arabidopsis thaliana interactome data. The Plant Journal: For Cell and Molecular Biology 2021; 105:708-720. [PMID: 33128829 DOI: 10.1111/tpj.15065] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 07/01/2020] [Revised: 10/09/2020] [Accepted: 10/21/2020] [Indexed: 06/11/2023]
Abstract
Autophagy is a self-degradative process that is crucial for maintaining cellular homeostasis by removing damaged cytoplasmic components and recycling nutrients. This evolutionarily conserved proteolysis process is regulated by the autophagy-related (Atg) proteins. The incomplete understanding of the plant autophagy proteome and the importance of a proteome-wide understanding of the autophagy pathway prompted us to predict Atg proteins and regulators in Arabidopsis. Here, we developed a systems-level algorithm to identify autophagy-related modules (ARMs) based on protein subcellular localization, protein-protein interactions, and known Atg proteins. This generates a detailed landscape of the autophagic modules in Arabidopsis. We found that the newly identified genes in each ARM tend to be upregulated and coexpressed during the senescence stage of Arabidopsis. We also demonstrated, by module clustering and functional analysis, that the Golgi apparatus ARM, ARM13, functions in the autophagy process. To verify the in silico analysis, the Atg candidates in ARM13 that are functionally similar to the core Atg proteins were selected for experimental validation. Interestingly, two of the previously uncharacterized proteins identified from the ARM analysis, AGD1 and Sec14, exhibited bona fide association with the autophagy protein complex in plant cells, which provides evidence for cross-talk between intracellular pathways and autophagy. Thus, the computational framework has facilitated the identification and characterization of plant-specific autophagy-related proteins and novel autophagy proteins/regulators in higher eukaryotes.
Affiliation(s)
- Lixin Cheng
- School of Life Sciences, Centre for Cell & Developmental Biology and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
- Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen, China
- Yonglun Zeng
- School of Life Sciences, Centre for Cell & Developmental Biology and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
- Shuai Hu
- School of Life Sciences, Centre for Cell & Developmental Biology and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
- Ning Zhang
- Shenzhen People's Hospital, First Affiliated Hospital of Southern University of Science and Technology, Second Clinical Medicine College of Jinan University, Shenzhen, China
- Kenneth C P Cheung
- School of Life Sciences, Centre for Cell & Developmental Biology and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
- Baiying Li
- School of Life Sciences, Centre for Cell & Developmental Biology and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
- Kwong-Sak Leung
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
- Liwen Jiang
- School of Life Sciences, Centre for Cell & Developmental Biology and State Key Laboratory of Agrobiotechnology, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, China
- CUHK Shenzhen Research Institute, Shenzhen, China
9
HFADE-FMD: a hybrid approach of fireworks algorithm and differential evolution strategies for functional module detection in protein-protein interaction networks. Appl Intell 2021. [DOI: 10.1007/s10489-020-01791-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 02/02/2023]
10
Li M, Meng X, Zheng R, Wu FX, Li Y, Pan Y, Wang J. Identification of Protein Complexes by Using a Spatial and Temporal Active Protein Interaction Network. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:817-827. [PMID: 28885159 DOI: 10.1109/tcbb.2017.2749571] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 06/07/2023]
Abstract
The rapid development of proteomics and high-throughput technologies has produced a large amount of protein-protein interaction (PPI) data, which makes it possible to consider the dynamic properties of protein interaction networks (PINs) instead of only their static properties. Identifying protein complexes from dynamic PINs has become a vital scientific problem for understanding cellular life in the post-genome era. To date, many models and methods have been proposed for constructing dynamic PINs to identify protein complexes. However, most of the constructed dynamic PINs focus only on temporal dynamic information and thus overlook the spatial dynamic information of complex biological systems. To address this limitation of existing dynamic PIN analysis approaches, we propose a new model-based scheme for constructing the Spatial and Temporal Active Protein Interaction Network (ST-APIN) by integrating time-course gene expression data and subcellular location information. To evaluate the efficiency of ST-APIN, the commonly used classical clustering algorithm MCL is adopted to identify protein complexes from ST-APIN and three other dynamic PINs: NF-APIN, DPIN, and TC-PIN. The experimental results show that MCL performs better on ST-APIN than on the other three dynamic PINs in terms of matching with known complexes, sensitivity, specificity, and F-measure. Furthermore, we evaluate the identified protein complexes by Gene Ontology (GO) function enrichment analysis. The validation shows that the protein complexes identified from ST-APIN are more biologically significant. This study provides a general paradigm for constructing ST-APINs, which is essential for further understanding of molecular systems and the biomedical mechanisms of complex diseases.
11
Liu X, Xu Y, Wang R, Liu S, Wang J, Luo Y, Leung KS, Cheng L. A network-based algorithm for the identification of moonlighting noncoding RNAs and its application in sepsis. Brief Bioinform 2020; 22:581-588. [PMID: 32003790 DOI: 10.1093/bib/bbz154] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 08/19/2019] [Revised: 10/26/2019] [Accepted: 11/01/2019] [Indexed: 12/26/2022]
Abstract
Moonlighting proteins provide more options for cells to execute multiple functions without increasing genome and transcriptome complexity. Although there have long been calls for computational methods to predict moonlighting proteins, no method has been designed for determining moonlighting long noncoding ribonucleic acids (RNAs) (mlncRNAs). Previously, we developed an algorithm, MoonFinder, for the identification of mlncRNAs at the genome level based on the functional annotation and interactome data of lncRNAs and proteins. Here, we update MoonFinder to MoonFinder v2.0 by providing an extensive framework for the detection of protein modules and the establishment of RNA-module associations in humans. A novel measure, the moonlighting coefficient, is also proposed to assess the confidence of an ncRNA acting in a moonlighting manner. Moreover, we explored the expression characteristics of mlncRNAs in sepsis and found that mlncRNAs tend to be upregulated and differentially expressed. Interestingly, the mlncRNAs are mutually exclusive in terms of coexpression when compared to the other lncRNAs. Overall, MoonFinder v2.0 is dedicated to the prediction of human mlncRNAs and thus bears great promise to serve as a valuable R package for worldwide research communities (https://cran.r-project.org/web/packages/MoonFinder/index.html). Our analyses also provide the first attempt to characterize mlncRNA expression and coexpression properties in adult sepsis patients, which will facilitate the understanding of the interaction and expression patterns of mlncRNAs.
Affiliation(s)
- Xueyan Liu
- Critical Care Medicine at Shenzhen People's Hospital
- Ran Wang
- Computer Science at The Chinese University of Hong Kong
- Kwong-Sak Leung
- Computer Science at the Chinese University of Hong Kong, Hong Kong, China
- Lixin Cheng
- Bioinformatics at Shenzhen People's Hospital, China
12
Maskey S, Cho YR. LePrimAlign: local entropy-based alignment of PPI networks to predict conserved modules. BMC Genomics 2019; 20:964. [PMID: 31874635 PMCID: PMC6929407 DOI: 10.1186/s12864-019-6271-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Indexed: 11/30/2022]
Abstract
Background: Cross-species analysis of protein-protein interaction (PPI) networks provides an effective means of detecting conserved interaction patterns. Identifying such conserved substructures between PPI networks of different species increases our understanding of the principles underlying the evolution of cellular organizations and their functions at a system level. In recent years, network alignment techniques have been applied to genome-scale PPI networks to predict evolutionarily conserved modules. Although a wide variety of network alignment algorithms have been introduced, developing a scalable local network alignment algorithm with high accuracy is still challenging. Results: We present a novel pairwise local network alignment algorithm, called LePrimAlign, to predict conserved modules between PPI networks of three different species. The proposed algorithm exploits the results of a pairwise global alignment algorithm with many-to-many node mapping. It also applies the concept of graph entropy to detect initial cluster pairs from two networks. Finally, the initial clusters are expanded to increase the local alignment score, which is formulated as a combination of intra-network and inter-network scores. A performance comparison with state-of-the-art approaches demonstrates that the proposed algorithm outperforms them in terms of the accuracy of identified protein complexes and the quality of alignments. Conclusion: The proposed method produces local network alignments of higher accuracy in predicting conserved modules, even with large biological networks, at a reduced computational cost.
Affiliation(s)
- Sawal Maskey
- Department of Computer Science, Baylor University, One Bear Place #97141, Waco, 76798, TX, USA
- Young-Rae Cho
- Department of Computer Science, Baylor University, One Bear Place #97141, Waco, 76798, TX, USA; Bioinformatics Program, Baylor University, One Bear Place #97141, Waco, 76798, TX, USA
13
McMillan EA, Kwon G, Clemenceau JR, Fisher KW, Vaden RM, Shaikh AF, Neilsen BK, Kelly D, Potts MB, Sung YJ, Mendiratta S, Hight SK, Lee Y, MacMillan JB, Lewis RE, Kim HS, White MA. A Genome-wide Functional Signature Ontology Map and Applications to Natural Product Mechanism of Action Discovery. Cell Chem Biol 2019; 26:1380-1392.e6. [PMID: 31378711 PMCID: PMC9161285 DOI: 10.1016/j.chembiol.2019.07.008] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 09/26/2018] [Revised: 05/30/2019] [Accepted: 07/17/2019] [Indexed: 12/29/2022]
Abstract
Gene expression signature-based inference of functional connectivity within and between genetic perturbations, chemical perturbations, and disease status can lead to the development of actionable hypotheses for gene function, chemical modes of action, and disease treatment strategies. Here, we report a FuSiOn-based genome-wide integration of hypomorphic cellular phenotypes that enables functional annotation of gene network topology, assignment of mechanistic hypotheses to genes of unknown function, and detection of cooperativity among cell regulatory systems. Dovetailing genetic perturbation data with chemical perturbation phenotypes allowed simultaneous generation of mechanism-of-action hypotheses for thousands of uncharacterized natural product fractions (NPFs). The predicted mechanisms of action span a broad spectrum of cellular mechanisms, many of which are not currently recognized as "druggable." To enable use of FuSiOn as a hypothesis generation resource, all associations and analyses are available within an open-source web-based GUI (http://fusion.yuhs.ac).
Affiliation(s)
- Elizabeth A McMillan
- Department of Cell Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Gino Kwon
- Graduate Program for Nanomedical Science, Yonsei University, Seoul, Korea
- Jean R Clemenceau
- Department of Cell Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Kurt W Fisher
- Department of Pathology and Microbiology, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Rachel M Vaden
- Department of Cell Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Anam F Shaikh
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Beth K Neilsen
- Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE 68198, USA
- David Kelly
- Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE 68198, USA
- Malia B Potts
- Department of Cell Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Yeo-Jin Sung
- Severance Biomedical Science Institute, Brain Korea 21 Plus Project for Medical Science, Yonsei University College of Medicine, Seoul, Korea
- Saurabh Mendiratta
- Department of Cell Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Suzie K Hight
- Department of Cell Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Yunji Lee
- Severance Biomedical Science Institute, Brain Korea 21 Plus Project for Medical Science, Yonsei University College of Medicine, Seoul, Korea
- John B MacMillan
- Department of Biochemistry, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA; Department of Chemistry and Biochemistry, University of California, Santa Cruz, Santa Cruz, CA 95064, USA
- Robert E Lewis
- Eppley Institute for Research in Cancer and Allied Diseases, University of Nebraska Medical Center, Omaha, NE 68198, USA
| | - Hyun Seok Kim
- Severance Biomedical Science Institute, Brain Korea 21 Plus Project for Medical Science, Yonsei University College of Medicine, Seoul, Korea; Graduate Program for Nanomedical Science, Yonsei University, Seoul, Korea.
| | - Michael A White
- Department of Cell Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA.
|
14
|
Rani RR, Ramyachitra D, Brindhadevi A. Detection of dynamic protein complexes through Markov Clustering based on Elephant Herd Optimization Approach. Sci Rep 2019; 9:11106. [PMID: 31366992 PMCID: PMC6668483 DOI: 10.1038/s41598-019-47468-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 07/17/2018] [Accepted: 07/11/2019] [Indexed: 11/19/2022]
Abstract
The accessibility of a huge amount of protein-protein interaction (PPI) data has enabled research on biological networks that reveals the structure of protein complexes, pathways and their cellular organization. A key demand in computational biology is to recognize the modular structure of such biological networks. The detection of protein complexes from the PPI network is one of the most challenging and significant problems in the post-genomic era. In bioinformatics, the frequently employed approach for clustering the networks is Markov Clustering (MCL). Much of the research on protein complex detection has been done on the static PPI network, which suffers from a few drawbacks. To resolve this problem, this paper proposes an approach to detect dynamic protein complexes through Markov Clustering based on the Elephant Herd Optimization Approach (DMCL-EHO). First, the proposed method divides the PPI network into a set of dynamic subnetworks at various time points by incorporating gene expression data; second, it applies clustering analysis to every subnetwork using MCL together with the Elephant Herd Optimization approach. The experimental analysis was carried out on different PPI network datasets, and the proposed method surpasses various existing approaches in terms of accuracy measures. This paper identifies the common protein complexes that are significantly enriched in gold-standard datasets and also the pathway annotations of the detected protein complexes using the KEGG database.
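For readers unfamiliar with the clustering step named in this abstract, the following is a minimal sketch of plain Markov Clustering (alternating expansion and inflation on a column-stochastic matrix). It is not the DMCL-EHO method of the paper: the function name `markov_cluster`, its parameters and default values, and the toy network are all illustrative assumptions, and the Elephant Herd Optimization step and the time-point subnetworks are not modelled.

```python
import numpy as np


def markov_cluster(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Plain MCL sketch: returns clusters as sorted lists of node indices."""
    m = np.asarray(adjacency, dtype=float)
    m = m + np.eye(m.shape[0])                # self-loops (a common MCL convention)
    m = m / m.sum(axis=0, keepdims=True)      # make every column sum to 1
    for _ in range(max_iter):
        previous = m
        m = np.linalg.matrix_power(m, expansion)   # expansion: spread random-walk flow
        m = np.power(m, inflation)                 # inflation: strengthen strong flows
        m = m / m.sum(axis=0, keepdims=True)       # renormalise the columns
        if np.abs(m - previous).max() < tol:       # stop once the matrix is stable
            break
    # Each row that still carries flow lists the members of one cluster.
    clusters = set()
    for row in m:
        members = tuple(np.flatnonzero(row > 1e-8).tolist())
        if members:
            clusters.add(members)
    return sorted(list(c) for c in clusters)


# Hypothetical toy network: two disconnected triangles (nodes 0-2 and 3-5).
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(toy))  # [[0, 1, 2], [3, 4, 5]]
```

A larger inflation value generally yields finer clusters; a tuned parameter of this kind is what an optimization layer such as the one described in the abstract might adjust, although the abstract does not specify this.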
Affiliation(s)
- R Ranjani Rani
- Department of Computer Science, Bharathiar University, Tamilnadu, India
| | - D Ramyachitra
- Department of Computer Science, Bharathiar University, Tamilnadu, India.
| | - A Brindhadevi
- Department of Computer Science, Bharathiar University, Tamilnadu, India
|
15
|
Zhou Y, Huang J, Li H, Sun H, Peng Y, Xu Y. A semantic-rich similarity measure in heterogeneous information networks. Knowl Based Syst 2018. [DOI: 10.1016/j.knosys.2018.05.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 11/26/2022]
|
16
|
Estrogen receptor-1 is a key regulator of HIV-1 latency that imparts gender-specific restrictions on the latent reservoir. Proc Natl Acad Sci U S A 2018; 115:E7795-E7804. [PMID: 30061382 PMCID: PMC6099847 DOI: 10.1073/pnas.1803468115] [Citation(s) in RCA: 108] [Impact Index Per Article: 18.0] [Indexed: 01/01/2023]
Abstract
The molecular mechanisms leading to the creation and maintenance of the latent HIV reservoir remain incompletely understood. Unbiased shRNA screens showed that the estrogen receptor acts as a potent repressor of proviral reactivation in T cells. Antagonists of ESR-1 activate latent HIV-1 proviruses while agonists, including β-estradiol, potently block HIV reactivation. Using a well-matched set of male and female donors, we found that ESR-1 plays an important role in regulating HIV transcription in both sexes. However, women are much more responsive to estrogen and appear to harbor smaller inducible RNA reservoirs. Accounting for the impact of estrogen on HIV viral reservoirs will therefore be critical for devising curative therapies for women, a group representing 51% of global HIV infections. Unbiased shRNA library screens revealed that the estrogen receptor-1 (ESR-1) is a key factor regulating HIV-1 latency. In both Jurkat T cells and a Th17 primary cell model for HIV-1 latency, selective estrogen receptor modulators (SERMs, i.e., fulvestrant, raloxifene, and tamoxifen) are weak proviral activators and sensitize cells to latency-reversing agents (LRAs) including low doses of TNF-α (an NF-κB inducer), the histone deacetylase inhibitor vorinostat (suberoylanilide hydroxamic acid, SAHA), and IL-15. To probe the physiologic relevance of these observations, leukapheresis samples from a cohort of 12 well-matched reproductive-age women and men on fully suppressive antiretroviral therapy were evaluated by an assay measuring the production of spliced envelope (env) mRNA (the EDITS assay) by next-generation sequencing. The cells were activated by T cell receptor (TCR) stimulation, IL-15, or SAHA in the presence of either β-estradiol or an SERM. β-Estradiol potently inhibited TCR activation of HIV-1 transcription, while SERMs enhanced the activity of most LRAs. 
Although both sexes responded to SERMs and β-estradiol, females showed much higher levels of inhibition in response to the hormone and higher reactivity in response to ESR-1 modulators than males. Importantly, the total inducible RNA reservoir, as measured by the EDITS assay, was significantly smaller in the women than in the men. We conclude that concurrent exposure to estrogen is likely to limit the efficacy of viral emergence from latency and that ESR-1 is a pharmacologically attractive target that can be exploited in the design of therapeutic strategies for latency reversal.
|
17
|
Sharma P, Bhattacharyya D, Kalita J. Detecting protein complexes based on a combination of topological and biological properties in protein-protein interaction network. J Genet Eng Biotechnol 2018; 16:217-226. [PMID: 30647725 PMCID: PMC6296571 DOI: 10.1016/j.jgeb.2017.11.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 08/01/2017] [Revised: 11/01/2017] [Accepted: 11/17/2017] [Indexed: 01/04/2023]
Abstract
Protein complexes are known to play a major role in controlling cellular activity in a living being. Identifying complexes from raw protein-protein interactions (PPIs) is an important area of research. Earlier work has been limited mostly to yeast. Such protein complex identification methods, when applied to large human PPIs, often give poor performance. We introduce a novel method called CSC to detect protein complexes. The method is evaluated in terms of positive predictive value, sensitivity and accuracy using datasets from the model organism yeast and from humans. CSC outperforms several other competing algorithms for both organisms. Further, we present a framework to establish the usefulness of CSC in analyzing the influence of a given disease gene in a complex, both topologically and biologically, considering eight major association factors.
Affiliation(s)
- Pooja Sharma
- Department of Computer Science & Engineering, Tezpur University Napaam, Tezpur 784028, Assam, India
| | - D.K. Bhattacharyya
- Department of Computer Science & Engineering, Tezpur University Napaam, Tezpur 784028, Assam, India
| | - J.K. Kalita
- Department of Computer Science, University of Colorado at Colorado Springs, CO 80933-7150, USA
|
18
|
Cheng L, Liu P, Leung K. SMILE: a novel procedure for subcellular module identification with localisation expansion. IET Syst Biol 2018. [PMID: 29533218 PMCID: PMC8687326 DOI: 10.1049/iet-syb.2017.0085] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 11/20/2022]
Abstract
Computational clustering methods help identify functional modules in protein–protein interaction (PPI) network, in which proteins participate in the same biological pathways or specific functions. Subcellular localisation is crucial for proteins to implement biological functions and each compartment accommodates specific portions of the protein interaction structure. However, the importance of protein subcellular localisation is often neglected in the studies of module identification. In this study, the authors propose a novel procedure, subcellular module identification with localisation expansion (SMILE), to identify super modules that consist of several subcellular modules performing specific biological functions among cell compartments. These super modules identified by SMILE are more functionally diverse and have been verified to be more associated with known protein complexes and biological pathways compared with the modules identified from the global PPI networks in both the compartmentalised PPI and InWeb_InBioMap datasets. The authors’ results reveal that subcellular localisation is a principal feature of functional modules and offers important guidance in detecting biologically meaningful results.
Affiliation(s)
- Lixin Cheng
- Department of Computer Science & Engineering, Chinese University of Hong Kong, Ma Liu Shui, Hong Kong
| | - Pengfei Liu
- Department of Computer Science & Engineering, Chinese University of Hong Kong, Ma Liu Shui, Hong Kong
| | - Kwong‐Sak Leung
- Department of Computer Science & Engineering, Chinese University of Hong Kong, Ma Liu Shui, Hong Kong
|
19
|
Liu G, Chai B, Yang K, Yu J, Zhou X. Overlapping functional modules detection in PPI network with pair-wise constrained non-negative matrix tri-factorisation. IET Syst Biol 2018. [PMID: 29533217 PMCID: PMC8687432 DOI: 10.1049/iet-syb.2017.0084] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Indexed: 12/15/2022]
Abstract
A large amount of available protein–protein interaction (PPI) data has been generated by high‐throughput experimental techniques. Uncovering functional modules from PPI networks will help us better understand the underlying mechanisms of cellular functions. Numerous computational algorithms have been designed to identify functional modules automatically in the past decades. However, most community detection methods (non‐overlapping or overlapping types) are unsupervised models, which cannot incorporate the well‐known protein complexes as prior knowledge. The authors propose a novel semi‐supervised model named pairwise constrained non‐negative matrix tri‐factorisation (PCNMTF), which takes full advantage of the well‐known protein complexes to find overlapping functional modules based on a protein module indicator matrix and a module correlation matrix simultaneously from PPI networks. PCNMTF determinately models and learns the mixed module memberships of each protein by considering the correlation among modules simultaneously based on the non‐negative matrix tri‐factorisation. The experimental results on both synthetic and real‐world biological networks demonstrate that PCNMTF obtains more precise functional modules than state‐of‐the‐art methods.
Affiliation(s)
- Guangming Liu
- Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, No. 3 Shangyuancun Haidian District, Beijing, People's Republic of China
| | - Bianfang Chai
- Department of Information Engineering, Hebei GEO University, Shijiazhuang, People's Republic of China
| | - Kuo Yang
- Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, No. 3 Shangyuancun Haidian District, Beijing, People's Republic of China
| | - Jian Yu
- Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, No. 3 Shangyuancun Haidian District, Beijing, People's Republic of China
| | - Xuezhong Zhou
- Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, No. 3 Shangyuancun Haidian District, Beijing, People's Republic of China.
|
20
|
Abstract
Protein complexes are known to play a major role in controlling cellular activity in a living being. Identifying complexes from raw protein-protein interactions (PPIs) is an important area of research. Earlier work has been limited mostly to yeast and a few other model organisms. Such protein complex identification methods, when applied to large human PPIs often give poor performance. We introduce a novel method called ComFiR to detect such protein complexes and further rank diseased complexes based on a query disease. We have shown that it has better performance in identifying protein complexes from human PPI data. This method is evaluated in terms of positive predictive value, sensitivity and accuracy. We have introduced a ranking approach and showed its application on Alzheimer's disease.
|
21
|
Hernandez C, Mella C, Navarro G, Olivera-Nappa A, Araya J. Protein complex prediction via dense subgraphs and false positive analysis. PLoS One 2017; 12:e0183460. [PMID: 28937982 PMCID: PMC5609739 DOI: 10.1371/journal.pone.0183460] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.0] [Received: 10/14/2016] [Accepted: 08/04/2017] [Indexed: 01/04/2023]
Abstract
Many proteins work together with others in groups called complexes in order to achieve a specific function. Discovering protein complexes is important for understanding biological processes and predicting protein functions in living organisms. Large-scale, high-throughput techniques have made it possible to compile protein-protein interaction networks (PPI networks), which have been used in several computational approaches for detecting protein complexes. Those predictions might guide future biologic experimental research. Some approaches are topology-based, where highly connected proteins are predicted to be complexes; some propose different clustering algorithms using partitioning, overlaps among clusters for networks modeled with unweighted or weighted graphs; and others use density of clusters and information based on protein functionality. However, some schemes still require much processing time or the quality of their results can be improved. Furthermore, most of the results obtained with computational tools are not accompanied by an analysis of false positives. We propose an effective and efficient mining algorithm for discovering highly connected subgraphs, which is our base for defining protein complexes. Our representation is based on transforming the PPI network into a directed acyclic graph that reduces the number of represented edges and the search space for discovering subgraphs. Our approach considers weighted and unweighted PPI networks. We compare our best alternative using PPI networks from Saccharomyces cerevisiae (yeast) and Homo sapiens (human) with state-of-the-art approaches in terms of clustering, biological metrics and execution times, as well as three gold standards for yeast and two for human. Furthermore, we analyze false positive predicted complexes searching the PDBe (Protein Data Bank in Europe) database in order to identify matching protein complexes that have been purified and structurally characterized. 
Our analysis shows that more than 50 yeast protein complexes and more than 300 human protein complexes found to be false positives according to our prediction method, i.e., not described in the gold standard complex databases, in fact contain protein complexes that have been characterized structurally and documented in PDBe. We also found that some of these protein complexes have recently been classified as part of a Periodic Table of Protein Complexes. The latest version of our software is publicly available at http://doi.org/10.6084/m9.figshare.5297314.v1.
Affiliation(s)
- Cecilia Hernandez
- Computer Science, University of Concepción, Concepción, Chile
- Center for Biotechnology and Bioengineering (CeBiB), Department of Computer Science, University of Chile, Santiago, Chile
| | - Carlos Mella
- Computer Science, University of Concepción, Concepción, Chile
| | - Gonzalo Navarro
- Center for Biotechnology and Bioengineering (CeBiB), Department of Computer Science, University of Chile, Santiago, Chile
| | - Alvaro Olivera-Nappa
- Center for Biotechnology and Bioengineering (CeBiB), Department of Chemical Engineering and Biotechnology, University of Chile, Santiago, Chile
| | - Jaime Araya
- Computer Science, University of Concepción, Concepción, Chile
|
22
|
Chang AYF, Liao BY. Recruitment of histone modifications to assist mRNA dosage maintenance after degeneration of cytosine DNA methylation during animal evolution. Genome Res 2017; 27:1513-1524. [PMID: 28720579 PMCID: PMC5580711 DOI: 10.1101/gr.221739.117] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 02/13/2017] [Accepted: 07/05/2017] [Indexed: 12/24/2022]
Abstract
Following gene duplication, mRNA expression of the duplicated gene is reduced to maintain mRNA dosage. In mammals, this process is achieved with increased cytosine DNA methylation of the promoters of duplicated genes to suppress transcriptional initiation. However, not all animal species possess a full apparatus for cytosine DNA methylation. For such species, such as the roundworm (Caenorhabditis elegans, "worm" hereafter) or fruit fly (Drosophila melanogaster, "fly" hereafter), it is unclear how reduced expression of duplicated genes has been achieved evolutionarily. Here, we hypothesize that in the absence of a classical cytosine DNA methylation pathway, histone modifications play an increasing role in maintaining mRNA dosage following gene duplication. We initially verified that reduced gene expression of duplicated genes had occurred in the worm, fly, and mouse (Mus musculus). Next, several histone marks, with the capacity to control mRNA abundance in the models studied, were examined. In the worm and fly, but not in the mouse, multiple histone modifications were found to assist mRNA dosage maintenance following gene duplication events and the possible involvement of adenine DNA methylation in this process was excluded. Furthermore, the histone marks and acting regions that mediated the reduction in duplicated gene expression were found to be largely organism specific. Thus, it appears that many of the histone marks that maintain mRNA dosage were independently recruited during the evolution of worms and flies to compensate for the loss of cytosine DNA methylation machinery from their genomes.
Affiliation(s)
- Andrew Ying-Fei Chang
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County 350, Taiwan, Republic of China
| | - Ben-Yang Liao
- Institute of Population Health Sciences, National Health Research Institutes, Zhunan, Miaoli County 350, Taiwan, Republic of China
|
23
|
Rieckmann JC, Geiger R, Hornburg D, Wolf T, Kveler K, Jarrossay D, Sallusto F, Shen-Orr SS, Lanzavecchia A, Mann M, Meissner F. Social network architecture of human immune cells unveiled by quantitative proteomics. Nat Immunol 2017; 18:583-593. [DOI: 10.1038/ni.3693] [Citation(s) in RCA: 229] [Impact Index Per Article: 32.7] [Received: 08/31/2016] [Accepted: 01/26/2017] [Indexed: 02/08/2023]
|
24
|
Chen JY, Pandey R, Nguyen TM. HAPPI-2: a Comprehensive and High-quality Map of Human Annotated and Predicted Protein Interactions. BMC Genomics 2017; 18:182. [PMID: 28212602 PMCID: PMC5314692 DOI: 10.1186/s12864-017-3512-1] [Citation(s) in RCA: 31] [Impact Index Per Article: 4.4] [Received: 05/08/2015] [Accepted: 01/24/2017] [Indexed: 01/07/2023]
Abstract
BACKGROUND Human protein-protein interaction (PPI) data is essential to network and systems biology studies. PPI data can help biochemists hypothesize how proteins form complexes by binding to each other, how extracellular signals propagate through post-translational modification of de-activated signaling molecules, and how chemical reactions are coupled by enzymes involved in a complex biological process. Our capability to develop good public database resources for human PPI data has a direct impact on the quality of future research on genome biology and medicine. RESULTS The database of Human Annotated and Predicted Protein Interactions (HAPPI) version 2.0 is a major update to the original HAPPI 1.0 database. It contains 2,922,202 unique protein-protein interactions (PPI) linked by 23,060 human proteins, making it the most comprehensive database covering human PPI data today. These PPIs contain both physical/direct interactions and high-quality functional/indirect interactions. Compared with the HAPPI 1.0 database release, HAPPI database version 2.0 (HAPPI-2) represents a 485% increase in human PPI data coverage and a 73% increase in protein coverage. The revamped HAPPI web portal provides users with a friendly search, curation, and data retrieval interface, allowing them to retrieve human PPIs and available annotation information on the interaction type, interaction quality, interacting partner drug targeting data, and disease information. The updated HAPPI-2 can be freely accessed by academic users at http://discovery.informatics.uab.edu/HAPPI. CONCLUSIONS While the underlying data for HAPPI-2 are integrated from diverse data sources, the new HAPPI-2 release represents a good balance between data coverage and data quality of human PPIs, making it ideally suited for network biology.
Affiliation(s)
- Jake Y Chen
- Wenzhou Medical University First Affiliate Hospital, Wenzhou, Zhejiang Province, China; Medeolinx, LLC, Indianapolis, IN, 46280, USA; The Informatics Institute, University of Alabama at Birmingham School of Medicine, Birmingham, AL, 35294, USA; Indiana Center for Systems Biology and Personalized Medicine, Indiana University School of Informatics and Computing, Indianapolis, IN, 46202, USA.
| | | | - Thanh M Nguyen
- Indiana Center for Systems Biology and Personalized Medicine, Indiana University School of Informatics and Computing, Indianapolis, IN, 46202, USA
|
25
|
Shui Y, Cho YR. Alignment of PPI Networks Using Semantic Similarity for Conserved Protein Complex Prediction. IEEE Trans Nanobioscience 2017; 15:380-389. [PMID: 28113907 DOI: 10.1109/tnb.2016.2555802] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]
Abstract
Network alignment is a computational technique to identify topological similarity of graph data by mapping link patterns. In bioinformatics, network alignment algorithms have been applied to protein-protein interaction (PPI) networks to discover evolutionarily conserved substructures at the system level. In particular, local network alignment of PPI networks searches for conserved functional components between species and predicts unknown protein complexes and signaling pathways. In this article, we present a novel approach of local network alignment by semantic mapping. While most previous methods find protein matches between species by sequence homology, our approach uses semantic similarity. Given Gene Ontology (GO) and its annotation data, we estimate functional closeness between two proteins by measuring their semantic similarity. We adopted a new semantic similarity measure, simVICD, which has the best performance for PPI validation and functional match. We tested alignment between the PPI networks of well-studied yeast protein complexes and the genome-wide PPI network of human in order to predict human protein complexes. The experimental results demonstrate that our approach has higher accuracy in protein complex prediction than graph clustering algorithms, and higher efficiency than previous network alignment algorithms.
|
26
|
Rudashevskaya EL, Sickmann A, Markoutsa S. Global profiling of protein complexes: current approaches and their perspective in biomedical research. Expert Rev Proteomics 2016; 13:951-964. [PMID: 27602509 DOI: 10.1080/14789450.2016.1233064] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Indexed: 12/18/2022]
Abstract
INTRODUCTION Despite the rapid evolution of proteomic methods, protein interactions and their participation in protein complexes - an important aspect of their function - has rarely been investigated on the proteome-wide level. Disease states, such as muscular dystrophy or viral infection, are induced by interference in protein-protein interactions within complexes. The purpose of this review is to describe the current methods for global complexome analysis and to critically discuss the challenges and opportunities for the application of these methods in biomedical research. Areas covered: We discuss advancements in experimental techniques and computational tools that facilitate profiling of the complexome. The main focus is on the separation of native protein complexes via size exclusion chromatography and gel electrophoresis, which has recently been combined with quantitative mass spectrometry, for a global protein-complex profiling. The development of this approach has been supported by advanced bioinformatics strategies and fast and sensitive mass spectrometers that have allowed the analysis of whole cell lysates. The application of this technique to biomedical research is assessed, and future directions are anticipated. Expert commentary: The methodology is quite new, and has already shown great potential when combined with complementary methods for detection of protein complexes.
Affiliation(s)
- Elena L Rudashevskaya
- Department of Bioanalytics, Leibniz-Institut für Analytische Wissenschaften - ISAS eV, Dortmund, Germany
| | - Albert Sickmann
- Department of Bioanalytics, Leibniz-Institut für Analytische Wissenschaften - ISAS eV, Dortmund, Germany; Medizinisches Proteom-Center, Ruhr-Universität Bochum, Bochum, Germany; School of Natural & Computing Sciences, Department of Chemistry, University of Aberdeen, Aberdeen, UK
| | - Stavroula Markoutsa
- Department of Bioanalytics, Leibniz-Institut für Analytische Wissenschaften - ISAS eV, Dortmund, Germany
|
27
|
Li M, Tang Y, Wu X, Wang J, Wu FX, Pan Y. C-DEVA: Detection, evaluation, visualization and annotation of clusters from biological networks. Biosystems 2016; 150:78-86. [PMID: 27530307 DOI: 10.1016/j.biosystems.2016.08.004] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/30/2015] [Revised: 07/21/2016] [Accepted: 08/08/2016] [Indexed: 10/21/2022]
Abstract
With progress in research on biological networks, plenty of excellent clustering algorithms have been proposed. Nevertheless, not only different algorithms but also the same algorithms with different characteristics result in different performances on the same biological networks. Therefore, it might be difficult for researchers to choose an appropriate clustering algorithm to use for a specific network. Here we present C-DEVA, a comprehensive platform for Detecting clusters from biological networks and for their Evaluation, Visualization and Annotation analysis. Ten clustering methods are provided in C-DEVA, covering different types of clustering algorithms, with each type based on a different principle. For the identified complexes, there are over ten popular and traditional bio-statistical measurements to assess them. Multi-source biological information has also been integrated in C-DEVA, such as biological functional annotations and gold standard complex sets, which are collected from the latest datasets in major databases or related papers. Furthermore, visualization analyses are available throughout the whole workflow, which endows C-DEVA with good usability and simple manipulation. To assure extensibility, development interfaces are offered in C-DEVA for integrating new clustering as well as evaluation methods. Additionally, operations on the network, such as network randomization, are also supported. C-DEVA provides a complete tool for identifying clusters from biological networks. Multiple options are offered during the analysis process, including detection methods, evaluation metrics and visualization modules. In addition, researchers can customize the C-DEVA workflow according to the properties of their networks to find the best results. C-DEVA is released under the GNU General Public License (GPL), and the source code and binaries are freely available at https://github.com/cici333/c-deva.
Affiliation(s)
- Min Li
- School of Information Science and Engineering, Central South University, Changsha, 410083, China.
| | - Yu Tang
- School of Information Science and Engineering, Central South University, Changsha, 410083, China.
| | - Xuehong Wu
- School of Information Science and Engineering, Central South University, Changsha, 410083, China.
| | - Jianxin Wang
- School of Information Science and Engineering, Central South University, Changsha, 410083, China.
| | - Fang-Xiang Wu
- School of Information Science and Engineering, Central South University, Changsha, 410083, China; Division of Biomedical Engineering and Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, S7N 5A9, Canada.
| | - Yi Pan
- School of Information Science and Engineering, Central South University, Changsha, 410083, China; Department of Computer Science, Georgia State University, Atlanta, GA, 30302-4110, USA.
|
28
|
Predicting overlapping protein complexes from weighted protein interaction graphs by gradually expanding dense neighborhoods. Artif Intell Med 2016; 71:62-9. [DOI: 10.1016/j.artmed.2016.05.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/17/2015] [Revised: 05/30/2016] [Accepted: 05/30/2016] [Indexed: 01/20/2023]
|
29
|
Chen B, Shang X, Li M, Wang J, Wu FX. Identifying Individual-Cancer-Related Genes by Rebalancing the Training Samples. IEEE Trans Nanobioscience 2016; 15:309-315. [PMID: 27093705 DOI: 10.1109/tnb.2016.2553119] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Indexed: 11/07/2022]
Abstract
The identification of individual-cancer-related genes is typically an imbalanced classification problem. The number of known cancer-related genes is far smaller than the number of all unknown genes, which makes it very hard to detect novel predictions from such imbalanced training samples. A regular machine learning method can either only detect genes related to all cancers or add clinical knowledge to circumvent this issue. In this study, we introduce a training sample rebalancing strategy to overcome this issue by using a two-step logistic regression and a random resampling method. The two-step logistic regression selects a set of genes that are related to all cancers, while the random resampling method is performed to further classify those genes associated with individual cancers. The issue of imbalanced classification is circumvented by randomly adding positive instances related to other cancers at first, and then excluding those unrelated predictions according to the overall performance at the following step. Numerical experiments show that the proposed resampling method is able to identify cancer-related genes even when the number of known genes related to a given cancer is small. The final predictions for all individual cancers achieve AUC values around 0.93 by using the leave-one-out cross validation method, which is very promising compared with existing methods.
|
30
|
Chen B, Li M, Wang J, Shang X, Wu FX. A fast and high performance multiple data integration algorithm for identifying human disease genes. BMC Med Genomics 2015; 8 Suppl 3:S2. [PMID: 26399620 PMCID: PMC4582601 DOI: 10.1186/1755-8794-8-s3-s2] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND Integrating multiple data sources is indispensable for improving disease gene identification. This is not only because disease genes associated with similar genetic diseases tend to lie close to each other in various biological networks, but also because gene-disease associations are complex. Although various algorithms have been proposed to identify disease genes, their prediction performance and computational time still need to be improved. RESULTS In this study, we propose a fast and high performance multiple data integration algorithm for identifying human disease genes. A posterior probability of each candidate gene being associated with individual diseases is calculated by using a Bayesian analysis method and a binary logistic regression model. Two prior probability estimation strategies and two feature vector construction methods are developed to test the performance of the proposed algorithm. CONCLUSIONS The proposed algorithm not only generates predictions with high AUC scores, but also runs very fast. When only a single PPI network is employed, the AUC score is 0.769 using F2 as feature vectors. The average running time for each leave-one-out experiment is only around 1.5 seconds. When three biological networks are integrated, the AUC score using F3 as feature vectors increases to 0.830, and the average running time for each leave-one-out experiment is only about 12.54 seconds. It is better than many existing algorithms.
Affiliation(s)
- Bolin Chen: School of Computer Science, Northwestern Polytechnical University, 127 Youyi West Road, 710072, Xi'an, P.R. China
- Min Li: School of Information Science and Engineering, Central South University, 410083, Changsha, P.R. China
- Jianxin Wang: School of Information Science and Engineering, Central South University, 410083, Changsha, P.R. China
- Xuequn Shang: School of Computer Science, Northwestern Polytechnical University, 127 Youyi West Road, 710072, Xi'an, P.R. China
- Fang-Xiang Wu: Division of Biomedical Engineering, University of Saskatchewan, 57 Campus Dr., S7N 5A9, Saskatoon, Canada; Department of Mechanical Engineering, University of Saskatchewan, 57 Campus Dr., S7N 5A9, Saskatoon, Canada
|
31
|
Bandyopadhyay S, Ray S, Mukhopadhyay A, Maulik U. A multiobjective approach for identifying protein complexes and studying their association in multiple disorders. Algorithms Mol Biol 2015; 10:24. [PMID: 26257820 PMCID: PMC4529733 DOI: 10.1186/s13015-015-0056-2] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Received: 12/03/2014] [Accepted: 07/28/2015] [Indexed: 11/17/2022] Open
Abstract
Background Detecting protein complexes within protein–protein interaction (PPI) networks is a major step toward the analysis of biological processes and pathways. Identification and characterization of protein complexes in PPI networks is an ongoing challenge. Several high-throughput experimental techniques provide a substantial number of PPIs, which are widely utilized for compiling the PPI network of a species. Results Here we focus on detecting human protein complexes by developing a multiobjective framework. For this, a large human PPI network is partitioned into modules that serve as protein complexes. For building the objective functions we have utilized topological properties of the PPI network and biological properties based on Gene Ontology semantic similarity. The proposed method is compared with some state-of-the-art algorithms in the context of different performance metrics. For the purpose of biological validation of our predicted complexes we have also employed a Gene Ontology and pathway based analysis. Additionally, we have performed an analysis to associate the resulting protein complexes with 22 key disease classes. Two bipartite networks are created to clearly visualize the association of identified protein complexes with the disorder classes. Conclusions Here, we present the task of identifying protein complexes as a multiobjective optimization problem. Identified protein complexes are found to be associated with several disorder classes like ‘Cancer’, ‘Endocrine’ and ‘Multiple’. This analysis uncovers some new relationships between disorders and predicted complexes that may play a potential role in the prediction of multi-target drugs.
|
32
|
Li W, Freudenberg J, Oswald M. Principles for the organization of gene-sets. Comput Biol Chem 2015; 59 Pt B:139-49. [PMID: 26188561 DOI: 10.1016/j.compbiolchem.2015.04.005] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 02/24/2015] [Accepted: 04/08/2015] [Indexed: 12/23/2022]
Abstract
A gene-set, an important concept in microarray expression analysis and systems biology, is a collection of genes and/or their products (i.e. proteins) that have some features in common. There are many different ways to construct gene-sets, but a systematic organization of these ways is lacking. Gene-sets are mainly organized ad hoc in current public-domain databases, with group header names often determined by practical reasons (such as the types of technology in obtaining the gene-sets or a balanced number of gene-sets under a header). Here we aim at providing a gene-set organization principle according to the level at which genes are connected: homology, physical map proximity, chemical interaction, biological, and phenotypic-medical levels. We also distinguish two types of connections between genes: actual connection versus sharing of a label. Actual connections denote direct biological interactions, whereas shared label connection denotes shared membership in a group. Some extensions of the framework are also addressed such as overlapping of gene-sets, modules, and the incorporation of other non-protein-coding entities such as microRNAs.
Affiliation(s)
- Wentian Li: The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY, USA
- Jan Freudenberg: The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY, USA
- Michaela Oswald: The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY, USA
|
33
|
Arisi I, D'Onofrio M, Brandi R, Cattaneo A, Bertolazzi P, Cumbo F, Felici G, Guerra C. Time dynamics of protein complexes in the AD11 transgenic mouse model for Alzheimer's disease like pathology. BMC Neurosci 2015; 16:28. [PMID: 25925689 PMCID: PMC4436769 DOI: 10.1186/s12868-015-0155-5] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 12/01/2014] [Accepted: 03/11/2015] [Indexed: 11/20/2022] Open
Abstract
Background Many approaches exist to integrate protein-protein interaction data with other sources of information, most notably with gene co-expression data, to obtain information on network dynamics. It is of interest to look at groups of interacting gene products that form a protein complex. We were interested in applying new tools to the characterization of pathogenesis and dynamic events of an Alzheimer’s-like neurodegenerative model, the AD11 mice, expressing an anti-NGF monoclonal antibody. The goal was to quantify the impact of neurodegeneration on protein complexes, by measuring the correlation between gene expression data by different metrics. Results Data were extracted from the gene expression profile of AD11 brain, obtained by Agilent microarray, at 1, 3, 6, 15 months of age. For genes coding proteins in complexes, the correlation matrix of pairwise expression was computed. The dynamics between correlation matrices at different time points was evaluated by two measures: a paired t-test between average correlation levels and a normalized Euclidean distance with z-score. We unveiled a differential wiring of interactions in a set of complexes, whose network structure discriminates between transgenic and control mice. Furthermore, we analyzed the dynamics of gene expression values, by looking at changes in gene-to-gene correlation over time, and identified those complexes that exhibit a different time-dependent behaviour between transgenic and controls. The most significant changes in correlation dynamics are concentrated in the early stage of disease, with higher correlation in AD11 mice compared to controls. Many complexes go through dynamic changes over time, showing the role of the dysfunctional immunoproteasome as an early neurodegenerative disease event. Furthermore, this analysis shows key events in the neurodegeneration process of the AD11 model, by identifying significant differences in co-expression values of other complexes, such as the parvulin complex, with a role in protein misfolding and proteostasis, and of complexes involved in transcriptional mechanisms. Conclusions We have proposed a novel approach to analyze the network structure of protein complexes, by two different measures to evaluate the dynamics of gene-gene correlation matrices from gene expression profiles. The methodology was able to investigate the re-organization of interactions within protein complexes in the AD11 model of neurodegeneration.
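To make the correlation-matrix comparison described in this abstract concrete, here is a minimal sketch under stated assumptions. It builds a pairwise Pearson correlation matrix for the genes of one complex in each of two conditions and then measures how far apart the two matrices are. The function names, the toy expression values, and the particular normalization (root mean squared difference over gene pairs) are hypothetical; the abstract does not specify the exact formula, and the z-score step is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_matrix(expression):
    """expression: dict mapping gene name -> list of values across samples.

    Genes are ordered alphabetically so matrices from different
    conditions are comparable entry by entry.
    """
    genes = sorted(expression)
    return [[pearson(expression[g], expression[h]) for h in genes] for g in genes]


def normalized_distance(corr_a, corr_b):
    """Euclidean distance between the upper triangles of two correlation
    matrices, divided by sqrt(number of gene pairs). This normalization
    is an assumption made for illustration."""
    n = len(corr_a)
    diffs = [corr_a[i][j] - corr_b[i][j] for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical three-gene complex measured in four samples per condition.
control = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [4, 3, 2, 1]}
transgenic = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8], "g3": [1, 2, 3, 4]}
distance = normalized_distance(correlation_matrix(control),
                               correlation_matrix(transgenic))
```

A large distance indicates that co-expression within the complex has been rewired between the two conditions; assessing significance would require a null distribution, for example from randomly drawn gene sets.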
Affiliation(s)
- Ivan Arisi: Genomics Facility, European Brain Research Institute (EBRI) Rita Levi-Montalcini, Via del Fosso di Fiorano, 64, 00143, Rome, Italy
- Mara D'Onofrio: Genomics Facility, European Brain Research Institute (EBRI) Rita Levi-Montalcini, Via del Fosso di Fiorano, 64, 00143, Rome, Italy
- Rossella Brandi: Genomics Facility, European Brain Research Institute (EBRI) Rita Levi-Montalcini, Via del Fosso di Fiorano, 64, 00143, Rome, Italy
- Antonino Cattaneo: Neurotrophic Factors and Neurodegenerative Diseases Unit, EBRI, Rome, Italy; Scuola Normale Superiore, Piazza dei Cavalieri 7, 56126, Pisa, Italy
- Paola Bertolazzi: Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti" (IASI-CNR), Rome, Italy
- Fabio Cumbo: Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti" (IASI-CNR), Rome, Italy
- Giovanni Felici: Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti" (IASI-CNR), Rome, Italy
- Concettina Guerra: Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti" (IASI-CNR), Rome, Italy; College of Computing, Georgia Institute of Technology, Atlanta, GA, USA
|
34
|
Theofilatos K, Pavlopoulou N, Papasavvas C, Likothanassis S, Dimitrakopoulos C, Georgopoulos E, Moschopoulos C, Mavroudi S. Predicting protein complexes from weighted protein-protein interaction graphs with a novel unsupervised methodology: Evolutionary enhanced Markov clustering. Artif Intell Med 2015; 63:181-9. [PMID: 25765008 DOI: 10.1016/j.artmed.2014.12.012] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 02/15/2013] [Revised: 12/23/2014] [Accepted: 12/26/2014] [Indexed: 12/13/2022]
Abstract
OBJECTIVE Proteins are considered to be the most important individual components of biological systems and they combine to form physical protein complexes which are responsible for certain molecular functions. Despite the large availability of protein-protein interaction (PPI) information, not much information is available about protein complexes. Experimental methods are limited in terms of time, efficiency, cost and performance constraints. Existing computational methods have provided encouraging preliminary results, but they face certain disadvantages: they require parameter tuning, some of them cannot handle weighted PPI data, and others do not allow a protein to participate in more than one protein complex. In the present paper, we propose a new fully unsupervised methodology for predicting protein complexes from weighted PPI graphs. METHODS AND MATERIALS The proposed methodology is called evolutionary enhanced Markov clustering (EE-MC) and it is a hybrid combination of an adaptive evolutionary algorithm and a state-of-the-art clustering algorithm named enhanced Markov clustering. EE-MC was compared with state-of-the-art methodologies when applied to datasets from the human and the yeast Saccharomyces cerevisiae organisms. RESULTS Using publicly available datasets, EE-MC outperformed existing methodologies (in some datasets the separation metric was increased by 10-20%). Moreover, when applied to new human datasets its performance was encouraging in the prediction of protein complexes which consist of proteins with high functional similarity. Specifically, 5737 protein complexes were predicted and 72.58% of them are enriched for at least one gene ontology (GO) function term. CONCLUSIONS EE-MC is by design able to overcome intrinsic limitations of existing methodologies, such as their inability to handle weighted PPI networks, their constraint to assign every protein to exactly one cluster, and the difficulties they face concerning parameter tuning. This was experimentally validated; moreover, new potentially true human protein complexes were suggested as candidates for further validation using experimental techniques.
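For readers unfamiliar with the clustering step this abstract builds on, the following is a minimal sketch of the basic Markov clustering (MCL) iteration on a weighted interaction graph: alternate expansion (squaring the column-stochastic matrix) with inflation (raising entries to a power and renormalizing). This is plain MCL only, not the enhanced or evolutionary variant the authors describe; the function names, the self-loop weight of 1.0, and the default parameters are assumptions for illustration.

```python
def normalize_columns(matrix):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(matrix):
    """Expansion step: square the matrix (two-step random walk)."""
    n = len(matrix)
    return [[sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(matrix, power):
    """Inflation step: raise entries to a power, then renormalize columns.
    This strengthens strong flows and weakens weak ones."""
    return normalize_columns([[value ** power for value in row] for row in matrix])


def markov_cluster(weights, inflation=2.0, iterations=50):
    """weights: symmetric matrix of non-negative interaction weights.
    Returns clusters as sorted lists of node indices."""
    n = len(weights)
    # Self-loops (assumed weight 1.0) are a common stabilizing convention.
    matrix = [[weights[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    matrix = normalize_columns(matrix)
    for _ in range(iterations):
        matrix = inflate(expand(matrix), inflation)
    # Each row with remaining flow lists the nodes attracted to that row's node.
    clusters = set()
    for row in matrix:
        members = frozenset(j for j, value in enumerate(row) if value > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)
```

The inflation parameter controls granularity, with larger values tending to give smaller clusters. Needing to tune it by hand is the kind of limitation the abstract says an adaptive search is meant to remove.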
Affiliation(s)
- Konstantinos Theofilatos: Department of Computer Engineering and Informatics, University of Patras, Building B University Campus Rio, Zip Code: 26500, Patras, Greece
- Niki Pavlopoulou: Department of Computer Engineering and Informatics, University of Patras, Building B University Campus Rio, Zip Code: 26500, Patras, Greece
- Christoforos Papasavvas: Department of Computer Engineering and Informatics, University of Patras, Building B University Campus Rio, Zip Code: 26500, Patras, Greece
- Spiros Likothanassis: Department of Computer Engineering and Informatics, University of Patras, Building B University Campus Rio, Zip Code: 26500, Patras, Greece
- Christos Dimitrakopoulos: Department of Computer Engineering and Informatics, University of Patras, Building B University Campus Rio, Zip Code: 26500, Patras, Greece
- Efstratios Georgopoulos: Department of Agricultural Technology, Technological Educational Institute of Kalamata, Antikalamos, Zip Code: 24100, Kalamata, Greece
- Charalampos Moschopoulos: Department of Electrical Engineering, Katholieke Universiteit, Kasteelpark Arenberg 10 - box 2440, Zip Code: 3001, Leuven, Belgium; iMinds Future Health Department, Katholieke Universiteit, Oude Markt 13 - bus 5005, Zip Code: 3000, Leuven, Belgium
- Seferina Mavroudi: Department of Computer Engineering and Informatics, University of Patras, Building B University Campus Rio, Zip Code: 26500, Patras, Greece; Department of Social Work, School of Sciences of Health and Care, Technological Educational Institute of Patras, M. Alexandrou str. 1, Zip Code: 263 34, Patras, Greece
|
35
|
Chen B, Li M, Wang J, Wu FX. Disease gene identification by using graph kernels and Markov random fields. Sci China Life Sci 2014; 57:1054-1063. [PMID: 25326067 DOI: 10.1007/s11427-014-4745-8] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 05/21/2014] [Accepted: 07/14/2014] [Indexed: 01/05/2023]
Abstract
Genes associated with similar diseases are often functionally related. This principle is largely supported by many biological data sources, such as disease phenotype similarities, protein complexes, protein-protein interactions, pathways and gene expression profiles. Integrating multiple types of biological data is an effective method to identify disease genes for many genetic diseases. To capture the gene-disease associations based on biological networks, a kernel-based MRF method is proposed by combining graph kernels and the Markov random field (MRF) method. In the proposed method, three kinds of kernels are employed to describe the overall relationships of vertices in five biological networks, respectively, and a novel weighted MRF method is developed to integrate those data. In addition, an improved Gibbs sampling procedure and a novel parameter estimation method are proposed to generate predictions from the kernel-based MRF method. Numerical experiments are carried out by integrating known gene-disease associations, protein complexes, protein-protein interactions, pathways and gene expression profiles. The proposed kernel-based MRF method is evaluated by the leave-one-out cross validation paradigm, achieving an AUC score of 0.771 when integrating all those biological data in our experiments, which indicates that our proposed method is very promising compared with many existing methods.
Affiliation(s)
- BoLin Chen: Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9, Canada
|
36
|
Chen B, Wang J, Li M, Wu FX. Identifying disease genes by integrating multiple data sources. BMC Med Genomics 2014; 7 Suppl 2:S2. [PMID: 25350511 PMCID: PMC4243092 DOI: 10.1186/1755-8794-7-s2-s2] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Multiple types of data are now available for identifying disease genes. These data include gene-disease associations, disease phenotype similarities, protein-protein interactions, pathways, gene expression profiles, etc. It is believed that integrating different kinds of biological data is an effective method to identify disease genes. RESULTS In this paper, we propose a multiple data integration method based on the theory of Markov random field (MRF) and the method of Bayesian analysis for identifying human disease genes. The proposed method is not only flexible in easily incorporating different kinds of data, but also reliable in predicting candidate disease genes. CONCLUSIONS Numerical experiments are carried out by integrating known gene-disease associations, protein complexes, protein-protein interactions, pathways and gene expression profiles. Predictions are evaluated by the leave-one-out method. The proposed method achieves an AUC score of 0.743 when integrating all those biological data in our experiments.
|
37
|
Clancy T, Hovig E. From proteomes to complexomes in the era of systems biology. Proteomics 2014; 14:24-41. [PMID: 24243660 DOI: 10.1002/pmic.201300230] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Received: 06/12/2013] [Revised: 10/22/2013] [Accepted: 11/06/2013] [Indexed: 01/16/2023]
Abstract
Protein complexes carry out almost all of the signaling and functional processes in the cell. The protein complex complement of a cell, and its network of complex-complex interactions, is referred to here as the complexome. Computational methods to predict protein complexes from proteomics data, resulting in network representations of complexomes, have recently been developed. In addition, key advances have been made toward understanding the network and structural organization of complexomes. We review these bioinformatics advances, and their discovery potential, as well as the merits of integrating proteomics data with emerging methods in systems biology to study protein complex signaling. It is envisioned that improved integration of proteomics and systems biology, incorporating the dynamics of protein complexes in space and time, may lead to more predictive models of cell signaling networks for effective modulation.
Affiliation(s)
- Trevor Clancy: Department of Tumor Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
|
38
|
Chen Y, Jacquemin T, Zhang S, Jiang R. Prioritizing protein complexes implicated in human diseases by network optimization. BMC Syst Biol 2014; 8 Suppl 1:S2. [PMID: 24565064 PMCID: PMC4080363 DOI: 10.1186/1752-0509-8-s1-s2] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Indexed: 12/31/2022]
Abstract
Background The detection of associations between protein complexes and human inherited diseases is of great importance in understanding mechanisms of diseases. Dysfunctions of a protein complex are usually defined by disturbance of its members and consequently result in certain diseases. Although individual disease proteins have been widely predicted, computational methods are still absent for systematically investigating disease-related protein complexes. Results We propose a method, MAXCOM, for the prioritization of candidate protein complexes. MAXCOM performs a maximum information flow algorithm to optimize relationships between a query disease and candidate protein complexes through a heterogeneous network that is constructed by combining protein-protein interactions and disease phenotypic similarities. Cross-validation experiments on 539 protein complexes show that MAXCOM can rank 382 (70.87%) protein complexes at the top against protein complexes constructed at random. Permutation experiments further confirm that MAXCOM is robust to the network structure and parameters involved. We further analyze protein complexes ranked among the top ten for breast cancer and demonstrate that the SWI/SNF complex is potentially associated with breast cancer. Conclusions MAXCOM is an effective method for the discovery of disease-related protein complexes based on network optimization. The high performance and robustness of this approach can facilitate not only pathologic studies of diseases, but also the design of drugs targeting multiple proteins.
|
39
|
Berg EL. Systems biology in drug discovery and development. Drug Discov Today 2013; 19:113-25. [PMID: 24120892 DOI: 10.1016/j.drudis.2013.10.003] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 04/01/2013] [Revised: 09/14/2013] [Accepted: 10/03/2013] [Indexed: 11/25/2022]
Abstract
The complexity of human biology makes it challenging to develop safe and effective new medicines. Systems biology omics-based efforts have led to an explosion of high-throughput data and focus is now shifting to the integration of diverse data types to connect molecular and pathway information to predict disease outcomes. Better models of human disease biology, including more integrated network-based models that can accommodate multiple omics data types, as well as more relevant experimental systems, will help predict drug effects in patients, enabling personalized medicine, improvement of the success rate of new drugs in the clinic, and the finding of new uses for existing drugs.
Affiliation(s)
- Ellen L Berg: BioSeek, A Division of DiscoveRx, 310 Utah Avenue, Suite 100, South San Francisco, CA 94080, USA
|
40
|
Wang Y, Qian X. Functional module identification in protein interaction networks by interaction patterns. Bioinformatics 2013; 30:81-93. [PMID: 24085567 DOI: 10.1093/bioinformatics/btt569] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 11/12/2022]
Abstract
MOTIVATION Identifying functional modules in protein-protein interaction (PPI) networks may shed light on cellular functional organization and thereafter underlying cellular mechanisms. Many existing module identification algorithms aim to detect densely connected groups of proteins as potential modules. However, based on this simple topological criterion of 'higher than expected connectivity', those algorithms may miss biologically meaningful modules of functional significance, in which proteins have similar interaction patterns to other proteins in networks but may not be densely connected to each other. A few blockmodel module identification algorithms have been proposed to address the problem but the lack of global optimum guarantee and the prohibitive computational complexity have been the bottleneck of their applications in real-world large-scale PPI networks. RESULTS In this article, we propose a novel optimization formulation LCP(2) (low two-hop conductance sets) using the concept of Markov random walk on graphs, which enables simultaneous identification of both dense and sparse modules based on protein interaction patterns in given networks through searching for LCP(2) by random walk. A spectral approximate algorithm SLCP(2) is derived to identify non-overlapping functional modules. Based on a bottom-up greedy strategy, we further extend LCP(2) to a new algorithm (greedy algorithm for LCP(2)) GLCP(2) to identify overlapping functional modules. We compare SLCP(2) and GLCP(2) with a range of state-of-the-art algorithms on synthetic networks and real-world PPI networks. The performance evaluation based on several criteria with respect to protein complex prediction, high level Gene Ontology term prediction and especially sparse module detection, has demonstrated that our algorithms based on searching for LCP(2) outperform all other compared algorithms. 
AVAILABILITY AND IMPLEMENTATION All data and code are available at http://www.cse.usf.edu/~xqian/fmi/slcp2hop/.
Affiliation(s)
- Yijie Wang: Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33620, USA and Department of Electrical and Computer Engineering, Texas A&M University, College Station, TX 77843, USA
|
41
|
Takeda JI, Yamasaki C, Murakami K, Nagai Y, Sera M, Hara Y, Obi N, Habara T, Gojobori T, Imanishi T. H-InvDB in 2013: an omics study platform for human functional gene and transcript discovery. Nucleic Acids Res 2012. [PMID: 23197657 PMCID: PMC3531145 DOI: 10.1093/nar/gks1245] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 01/13/2023] Open
Abstract
H-InvDB (http://www.h-invitational.jp/) is a comprehensive human gene database started in 2004. In the latest version, H-InvDB 8.0, a total of 244 709 human complementary DNAs were mapped onto the hg19 reference genome and 43 829 gene loci, including nonprotein-coding ones, were identified. Of these loci, 35 631 were identified as potential protein-coding genes, and 22 898 of these were identical to known genes. In our analysis, 19 309 annotated genes were specific to H-InvDB and not found in RefSeq and Ensembl. In fact, 233 of the 19 309 genes turned out to have protein functions in this version of H-InvDB; they were annotated as having unknown protein functions in the previous version. Furthermore, 11 genes were identified as known Mendelian disorder genes. It is an advantage that many biologically functional genes are hidden among the H-InvDB unique genes. As large-scale proteomic projects have been conducted to elucidate the functions of all human proteins, we have enhanced the proteomic information with an advanced protein view and a new subdatabase of protein complexes (Protein Complex Database with quality index). We propose that H-InvDB is an important resource for finding novel candidate targets for medical care and drug development.
Affiliation(s)
- Jun-Ichi Takeda: Integrated Database and Systems Biology Team, Biomedicinal Information Research Center, National Institute of Advanced Industrial Science and Technology, Aomi 2-4-7, Koto-ku, Tokyo 135-0064, Japan
|