451
|
Network Analysis of Human Disease Comorbidity Patterns Based on Large-Scale Data Mining. Bioinformatics Research and Applications 2014. [DOI: 10.1007/978-3-319-08171-7_22] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/21/2022]
|
452
|
Wu L, Shen Y, Li M, Wu FX. Drug Target Identification Based on Structural Output Controllability of Complex Networks. Bioinformatics Research and Applications 2014. [DOI: 10.1007/978-3-319-08171-7_17] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 01/02/2023]
|
453
|
Funk CS, Hunter LE, Cohen KB. Combining heterogenous data for prediction of disease related and pharmacogenes. Pacific Symposium on Biocomputing 2014:328-39. [PMID: 24297559 PMCID: PMC3910248] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/02/2023]
Abstract
Identifying genetic variants that affect drug response or play a role in disease is an important task for clinicians and researchers. Before individual variants can be explored efficiently for effect on drug response or disease relationships, specific candidate genes must be identified. While many methods rank candidate genes through the use of sequence features and network topology, only a few exploit the information contained in the biomedical literature. In this work, we train and test a classifier on known pharmacogenes from PharmGKB and present a classifier that predicts pharmacogenes on a genome-wide scale using only Gene Ontology annotations and simple features mined from the biomedical literature. Performance of F=0.86, AUC=0.860 is achieved. The top 10 predicted genes are analyzed. Additionally, a set of enriched pharmacogenic Gene Ontology concepts is produced.
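The two figures quoted above (F-measure and AUC) are standard classifier metrics. The sketch below shows how each is usually computed; the labels, scores, and 0.5 threshold are made-up toy values, not data from the paper, and the function names are illustrative.

```python
def f1_score(labels, scores, threshold=0.5):
    """F-measure: harmonic mean of precision and recall at a fixed threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = known pharmacogene, 0 = background gene.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
toy_f1 = f1_score(toy_labels, toy_scores)
toy_auc = auc(toy_labels, toy_scores)
```

The F-measure depends on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds, which is why abstracts often report both.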
Affiliation(s)
- CHRISTOPHER S. FUNK
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO 80045, USA
- LAWRENCE E. HUNTER
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO 80045, USA
- K. BRETONNEL COHEN
- Computational Bioscience Program, University of Colorado School of Medicine, Aurora, CO 80045, USA
|
454
|
Yu D, Kim M, Xiao G, Hwang TH. Review of biological network data and its applications. Genomics Inform 2013; 11:200-10. [PMID: 24465231 PMCID: PMC3897847 DOI: 10.5808/gi.2013.11.4.200] [Citation(s) in RCA: 65] [Impact Index Per Article: 5.9] [Received: 10/15/2013] [Revised: 11/20/2013] [Accepted: 11/21/2013] [Indexed: 12/16/2022]
Abstract
Studying biological networks, such as protein-protein interactions, is key to understanding complex biological activities. Various types of large-scale biological datasets have been collected and analyzed with high-throughput technologies, including DNA microarray, next-generation sequencing, and the two-hybrid screening system, for this purpose. In this review, we focus on network-based approaches that help in understanding biological systems and identifying biological functions. Accordingly, this paper covers two major topics in network biology: reconstruction of gene regulatory networks and network-based applications, including protein function prediction, disease gene prioritization, and network-based genome-wide association study.
Affiliation(s)
- Donghyeon Yu
- Department of Clinical Sciences, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Minsoo Kim
- Department of Clinical Sciences, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Guanghua Xiao
- Department of Clinical Sciences, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Tae Hyun Hwang
- Department of Clinical Sciences, Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
|
455
|
Hou L, Chen M, Zhang CK, Cho J, Zhao H. Guilt by rewiring: gene prioritization through network rewiring in genome wide association studies. Hum Mol Genet 2013; 23:2780-90. [PMID: 24381306 DOI: 10.1093/hmg/ddt668] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Indexed: 12/22/2022]
Abstract
Although Genome Wide Association Studies (GWAS) have identified many susceptibility loci for common diseases, they explain only a small portion of heritability. It is challenging to identify the remaining disease loci because their association signals are likely weak and difficult to identify among millions of candidates. One potentially useful direction to increase statistical power is to incorporate functional genomics information, especially gene expression networks, to prioritize GWAS signals. Most current methods utilizing network information to prioritize disease genes are based on the 'guilt by association' principle, in which networks are treated as static, and disease-associated genes are assumed to lie closer to each other than random pairs in the network. In contrast, we propose a novel 'guilt by rewiring' principle. Studying the dynamics of gene networks between controls and patients, this principle assumes that disease genes are more likely to undergo rewiring in patients, whereas most of the network remains unaffected in the disease condition. To demonstrate this principle, we consider the changes of co-expression networks in Crohn's disease patients and controls, and how network dynamics reveal information on disease associations. Our results demonstrate that network rewiring is abundant in the immune system, and disease-associated genes are more likely to be rewired in patients. To integrate this network rewiring feature with GWAS signals, we propose using a Markov random field framework to prioritize genes. Applications in Crohn's disease and Parkinson's disease show that this framework leads to more replicable results, and implicates potentially disease-associated pathways.
Affiliation(s)
- Lin Hou
- Department of Biostatistics, Yale School of Public Health, New Haven, CT 06510, USA
|
456
|
Walking on a tissue-specific disease-protein-complex heterogeneous network for the discovery of disease-related protein complexes. BioMed Research International 2013; 2013:732650. [PMID: 24455720 PMCID: PMC3888695 DOI: 10.1155/2013/732650] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 09/11/2013] [Accepted: 10/07/2013] [Indexed: 11/29/2022]
Abstract
Besides the pinpointing of individual disease-related genes, associating protein complexes to human inherited diseases is also of great importance, because a biological function usually arises from the cooperative behaviour of multiple proteins in a protein complex. Moreover, knowledge about disease-related protein complexes could also enhance the inference of disease genes and pathogenic genetic variants. Here, we have designed a computational systems biology approach to systematically analyse potential relationships between diseases and protein complexes. First, we construct a heterogeneous network which is composed of a disease-disease similarity layer, a tissue-specific protein-protein interaction layer, and a protein complex membership layer. Then, we propose a random walk model on this disease-protein-complex network for identifying protein complexes that are related to a query disease. With a series of leave-one-out cross-validation experiments, we show that our method not only possesses high performance but also demonstrates robustness regarding the parameters and the network structure. We further predict a landscape of associations between human diseases and protein complexes. This landscape can be used to facilitate the inference of disease genes, thereby benefiting studies on pathology of diseases.
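As a rough illustration of the random walk idea described in this abstract, the sketch below runs a generic random walk with restart on a tiny made-up network that mixes disease, protein, and complex nodes. It is not the authors' model: the edge list, the single shared edge type, the unit weights, and the restart probability of 0.5 are all assumptions chosen for readability.

```python
def build_adjacency(edges):
    """Turn (u, v, weight) triples into a symmetric adjacency mapping."""
    adjacency = {}
    for u, v, w in edges:
        adjacency.setdefault(u, {})[v] = w
        adjacency.setdefault(v, {})[u] = w
    return adjacency


def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-12, max_iter=10000):
    """Return steady-state visiting probabilities for every node.

    seeds must be nodes of the network; edge weights must be positive.
    At each step the walker follows an edge (chosen in proportion to its
    weight) with probability 1 - restart, or jumps back to a seed.
    """
    nodes = sorted(adjacency)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    degree = {n: sum(adjacency[n].values()) for n in nodes}
    p = dict(start)
    for _ in range(max_iter):
        nxt = {n: restart * start[n] for n in nodes}
        for u in nodes:
            for v, w in adjacency[u].items():
                nxt[v] += (1.0 - restart) * p[u] * w / degree[u]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy network: D* = diseases, P* = proteins, C* = complexes.
TOY_EDGES = [
    ("D1", "D2", 1.0),                     # disease-disease similarity
    ("D1", "P1", 1.0), ("D2", "P2", 1.0),  # known disease-protein links
    ("P1", "P2", 1.0), ("P2", "P3", 1.0),  # protein-protein interactions
    ("P1", "C1", 1.0), ("P2", "C1", 1.0),  # complex membership
    ("P3", "C2", 1.0),
]
scores = random_walk_with_restart(build_adjacency(TOY_EDGES), seeds={"D1"})
complex_ranking = sorted(["C1", "C2"], key=lambda c: -scores[c])
```

Ranking only the complex nodes by their steady-state probability gives a candidate list for the query disease; here the complex whose members sit closer to D1 comes first.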
|
457
|
Chen Y, Wu X, Jiang R. Integrating human omics data to prioritize candidate genes. BMC Med Genomics 2013; 6:57. [PMID: 24344781 PMCID: PMC3878333 DOI: 10.1186/1755-8794-6-57] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 09/02/2013] [Accepted: 12/12/2013] [Indexed: 01/07/2023]
Abstract
Background The identification of genes involved in human complex diseases remains a great challenge in computational systems biology. Although methods have been developed to use disease phenotypic similarities with a protein-protein interaction network for the prioritization of candidate genes, other valuable omics data sources have been largely overlooked in these methods. Methods With this understanding, we proposed a method called BRIDGE to prioritize candidate genes by integrating disease phenotypic similarities with such omics data as protein-protein interactions, gene sequence similarities, gene expression patterns, gene ontology annotations, and gene pathway memberships. BRIDGE utilizes a multiple regression model with lasso penalty to automatically weight different data sources and is capable of discovering genes associated with diseases whose genetic bases are completely unknown. Results We conducted large-scale cross-validation experiments and demonstrated that more than 60% of known disease genes can be ranked first by BRIDGE in simulated linkage intervals, suggesting the superior performance of this method. We further performed two comprehensive case studies by applying BRIDGE to predict novel genes and transcriptional networks involved in obesity and type II diabetes. Conclusion The proposed method provides an effective and scalable way of integrating multi-omics data to infer disease genes. Further applications of BRIDGE should help identify novel disease genes and the mechanisms underlying human diseases.
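To make the phrase "multiple regression model with lasso penalty" concrete, here is a minimal coordinate-descent sketch of generic lasso regression. It is not the BRIDGE implementation: the objective scaling, the penalty value, and the two toy "data source" columns are assumptions made for illustration.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1 / (2n)) * ||y - Xw||^2 + lam * ||w||_1 by cyclic updates."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(n_iter):
        for j in range(d):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # mean squared value of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(d) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, lam) / z if z > 0 else 0.0
    return w


# Hypothetical toy data: column 0 is an informative data source,
# column 1 is a nearly irrelevant one. The target depends only on column 0.
X_toy = [[1.0, 0.1], [2.0, -0.1], [3.0, 0.05], [4.0, -0.05]]
y_toy = [2.0, 4.0, 6.0, 8.0]
weights = lasso_coordinate_descent(X_toy, y_toy, lam=0.1)
```

The L1 penalty drives the weight of the uninformative column to exactly zero while only slightly shrinking the informative one, which is the sense in which a lasso model can weight data sources automatically.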
Affiliation(s)
- Rui Jiang
- Department of Automation, MOE Key Laboratory of Bioinformatics; Bioinformatics Division and Center for Synthetic & Systems Biology, TNLIST, Tsinghua University, Beijing 100084, China.
|
458
|
Leiserson MDM, Eldridge JV, Ramachandran S, Raphael BJ. Network analysis of GWAS data. Curr Opin Genet Dev 2013; 23:602-10. [PMID: 24287332 PMCID: PMC3867794 DOI: 10.1016/j.gde.2013.09.003] [Citation(s) in RCA: 75] [Impact Index Per Article: 6.8] [Received: 07/19/2013] [Revised: 09/19/2013] [Accepted: 09/23/2013] [Indexed: 02/07/2023]
Abstract
Genome-wide association studies (GWAS) identify genetic variants that distinguish a control population from a population with a specific trait. Two challenges in GWAS are: (1) identification of the causal variant within a longer haplotype that is associated with the trait; (2) identification of causal variants for polygenic traits that are caused by variants in multiple genes within a pathway. We review recent methods that use information in protein-protein and protein-DNA interaction networks to address these two challenges.
Affiliation(s)
- Mark D M Leiserson
- Department of Computer Science, Brown University, Providence, RI 02912, United States; Center for Computational Molecular Biology, Brown University, Providence, RI 02912, United States
|
459
|
Poirel CL, Rodrigues RR, Chen KC, Tyson JJ, Murali TM. Top-down network analysis to drive bottom-up modeling of physiological processes. J Comput Biol 2013; 20:409-18. [PMID: 23641868 DOI: 10.1089/cmb.2012.0274] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Indexed: 11/12/2022]
Abstract
Top-down analyses in systems biology can automatically find correlations among genes and proteins in large-scale datasets. However, it is often difficult to design experiments from these results. In contrast, bottom-up approaches painstakingly craft detailed models that can be simulated computationally to suggest wet lab experiments. However, developing the models is a manual process that can take many years. These approaches have largely been developed independently. We present LINKER, an efficient and automated data-driven method that can analyze molecular interactomes to propose extensions to models that can be simulated. LINKER combines teleporting random walks and k-shortest path computations to discover connections from a source protein to a set of proteins collectively involved in a particular cellular process. We evaluate the efficacy of LINKER by applying it to a well-known dynamic model of the cell division cycle in Saccharomyces cerevisiae. Compared to other state-of-the-art methods, subnetworks computed by LINKER are heavily enriched in Gene Ontology (GO) terms relevant to the cell cycle. Finally, we highlight how networks computed by LINKER elucidate the role of a protein kinase (Cdc5) in the mitotic exit network of a dynamic model of the cell cycle.
|
460
|
Atias N, Istrail S, Sharan R. Pathway-based analysis of genomic variation data. Curr Opin Genet Dev 2013; 23:622-6. [PMID: 24209906 DOI: 10.1016/j.gde.2013.09.002] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 05/20/2013] [Revised: 09/18/2013] [Accepted: 09/18/2013] [Indexed: 02/02/2023]
Abstract
A holy grail of genetics is to decipher the mapping from genotype to phenotype. Recent advances in sequencing technologies allow the efficient genotyping of thousands of individuals carrying a particular phenotype in an effort to reveal its genetic determinants. However, the interpretation of these data entails tackling significant statistical and computational problems that stem from the complexity of human phenotypes and the huge genotypic search space. Recently, an alternative pathway-level analysis has been employed to combat these problems. In this review we discuss these developments, describe the challenges involved and outline possible solutions and future directions for improvement.
Affiliation(s)
- Nir Atias
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv 69978, Israel
|
461
|
Nguyen PV, Srihari S, Leong HW. Identifying conserved protein complexes between species by constructing interolog networks. BMC Bioinformatics 2013; 14 Suppl 16:S8. [PMID: 24564762 PMCID: PMC4098725 DOI: 10.1186/1471-2105-14-s16-s8] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Indexed: 11/23/2022]
Abstract
BACKGROUND Protein complexes conserved across species indicate processes that are core to cellular machinery (e.g. cell-cycle or DNA damage-repair complexes conserved across human and yeast). While numerous computational methods have been devised to identify complexes from the protein interaction (PPI) networks of individual species, these are severely limited by noise and errors (false positives) in currently available datasets. Our analysis using human and yeast PPI networks revealed that these methods missed several important complexes including those conserved between the two species (e.g. the MLH1-MSH2-PMS2-PCNA mismatch-repair complex). Here, we note that much of the functionalities of yeast complexes have been conserved in human complexes not only through sequence conservation of proteins but also of critical functional domains. Therefore, integrating information of domain conservation might throw further light on conservation patterns between yeast and human complexes. RESULTS We identify conserved complexes by constructing an interolog network (IN) leveraging on the functional conservation of proteins between species through domain conservation (from Ensembl) in addition to sequence similarity. We employ 'state-of-the-art' methods to cluster the interolog network, and map these clusters back to the original PPI networks to identify complexes conserved between the species. Evaluation of our IN-based approach (called COCIN) on human and yeast interaction data identifies several additional complexes (76% recall) compared to direct complex detection from the original PINs (54% recall). Our analysis revealed that the IN-construction removes several non-conserved interactions many of which are false positives, thereby improving complex prediction. In fact removing non-conserved interactions from the original PINs also resulted in higher number of conserved complexes, thereby validating our IN-based approach. 
These complexes included the mismatch repair complex, MLH1-MSH2-PMS2-PCNA, and other important ones namely, RNA polymerase-II, EIF3 and MCM complexes, all of which constitute core cellular processes known to be conserved across the two species. CONCLUSIONS Our method based on integrating domain conservation and sequence similarity to construct interolog networks helps to identify considerably more conserved complexes between the PPI networks from two species compared to direct complex prediction from the PPI networks. We observe from our experiments that protein complexes are not conserved from yeast to human in a straightforward way, that is, it is not the case that a yeast complex is a (proper) sub-set of a human complex with a few additional proteins present in the human complex. Instead complexes have evolved multifold with considerable re-organization of proteins and re-distribution of their functions across complexes. This finding can have significant implications on attempts to extrapolate other kinds of relationships such as synthetic lethality from yeast to human, for example in the identification of novel cancer targets. AVAILABILITY http://www.comp.nus.edu.sg/~leonghw/COCIN/.
Affiliation(s)
- Phi-Vu Nguyen
- Department of Computer Science, National University of Singapore, Singapore 117590
- Sriganesh Srihari
- Institute for Molecular Bioscience, The University of Queensland, St. Lucia, QLD 4072, Australia
- Hon Wai Leong
- Department of Computer Science, National University of Singapore, Singapore 117590
|
462
|
Advanced systems biology methods in drug discovery and translational biomedicine. BioMed Research International 2013; 2013:742835. [PMID: 24171171 PMCID: PMC3792523 DOI: 10.1155/2013/742835] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.0] [Received: 06/17/2013] [Accepted: 08/26/2013] [Indexed: 02/08/2023]
Abstract
Systems biology has developed rapidly in recent years and has been widely used in biomedicine to better understand the molecular basis of human disease and the mechanism of drug action. Here, we discuss the fundamental concept of systems biology and its two commonly used computational methods, namely network analysis and dynamical modeling. The applications of systems biology in elucidating human disease are highlighted, including human disease networks, treatment response prediction, investigation of disease mechanisms, and disease-associated gene prediction. In addition, important advances in drug discovery, to which systems biology makes significant contributions, are discussed, including drug-target networks, prediction of drug-target interactions, investigation of drug adverse effects, drug repositioning, and drug combination prediction. The systems biology methods and applications covered in this review provide a framework for addressing disease mechanism and approaching drug discovery, which will facilitate the translation of research findings into clinical benefits such as novel biomarkers and promising therapies.
|
463
|
Abstract
This paper reports a strategy for combining somatic mutation profiles of human tumors with gene networks to stratify tumors into biologically and clinically relevant subtypes. The method is applied to ovarian, uterine and lung cancers. Many forms of cancer have multiple subtypes with different causes and clinical outcomes. Somatic tumor genome sequences provide a rich new source of data for uncovering these subtypes but have proven difficult to compare, as two tumors rarely share the same mutations. Here we introduce network-based stratification (NBS), a method to integrate somatic tumor genomes with gene networks. This approach allows for stratification of cancer into informative subtypes by clustering together patients with mutations in similar network regions. We demonstrate NBS in ovarian, uterine and lung cancer cohorts from The Cancer Genome Atlas. For each tissue, NBS identifies subtypes that are predictive of clinical outcomes such as patient survival, response to therapy or tumor histology. We identify network regions characteristic of each subtype and show how mutation-derived subtypes can be used to train an mRNA expression signature, which provides similar information in the absence of DNA sequence.
|
464
|
Chen X, Yan GY. Novel human lncRNA-disease association inference based on lncRNA expression profiles. Bioinformatics 2013; 29:2617-24. [PMID: 24002109 DOI: 10.1093/bioinformatics/btt426] [Citation(s) in RCA: 429] [Impact Index Per Article: 39.0] [Indexed: 01/12/2023]
Abstract
MOTIVATION A growing body of evidence indicates that long non-coding RNAs (lncRNAs) play critical roles in many important biological processes. Therefore, mutations and dysregulation of these lncRNAs would contribute to the development of various complex diseases. Developing powerful computational models for identifying potential disease-related lncRNAs would benefit biomarker identification and drug discovery for human disease diagnosis, treatment, prognosis and prevention. RESULTS In this article, we proposed the assumption that similar diseases tend to be associated with functionally similar lncRNAs. Then, we further developed the method of Laplacian Regularized Least Squares for LncRNA-Disease Association (LRLSLDA) in the semisupervised learning framework. Although known disease-lncRNA associations in the database are rare, LRLSLDA still obtained an AUC of 0.7760 in the leave-one-out cross validation, significantly improving the performance of previous methods. We also showed that the performance of LRLSLDA is robust to parameter selection and that it performs reliably in all the test classes. Many potential disease-lncRNA associations were publicly released and some of them have been confirmed by recent results in biological experiments. It is anticipated that LRLSLDA could be an effective and important biological tool for biomedical research. AVAILABILITY The code of LRLSLDA is freely available at http://asdcd.amss.ac.cn/Software/Details/2.
Affiliation(s)
- Xing Chen
- National Center for Mathematics and Interdisciplinary Sciences and Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P.R. China
|
465
|
Cho DY, Przytycka TM. Dissecting cancer heterogeneity with a probabilistic genotype-phenotype model. Nucleic Acids Res 2013; 41:8011-20. [PMID: 23821670 PMCID: PMC3783162 DOI: 10.1093/nar/gkt577] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Received: 05/02/2013] [Revised: 06/05/2013] [Accepted: 06/07/2013] [Indexed: 12/13/2022]
Abstract
One of the obstacles hindering a better understanding of cancer is its heterogeneity. However, computational approaches to model cancer heterogeneity have lagged behind. To bridge this gap, we have developed a new probabilistic approach that models individual cancer cases as mixtures of subtypes. Our approach can be seen as a meta-model that summarizes the results of a large number of alternative models. It does not assume predefined subtypes nor does it assume that such subtypes have to be sharply defined. Instead given a measure of phenotypic similarity between patients and a list of potential explanatory features, such as mutations, copy number variation, microRNA levels, etc., it explains phenotypic similarities with the help of these features. We applied our approach to Glioblastoma Multiforme (GBM). The resulting model Prob_GBM, not only correctly inferred known relationships but also identified new properties underlining phenotypic similarities. The proposed probabilistic framework can be applied to model relations between similarity of gene expression and a broad spectrum of potential genetic causes.
Affiliation(s)
- Teresa M. Przytycka
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
|
466
|
Li W, Chen L, He W, Li W, Qu X, Liang B, Gao Q, Feng C, Jia X, Lv Y, Zhang S, Li X. Prioritizing disease candidate proteins in cardiomyopathy-specific protein-protein interaction networks based on "guilt by association" analysis. PLoS One 2013; 8:e71191. [PMID: 23940716 PMCID: PMC3733802 DOI: 10.1371/journal.pone.0071191] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 01/28/2013] [Accepted: 06/28/2013] [Indexed: 01/12/2023]
Abstract
The cardiomyopathies are a group of heart muscle diseases that can be inherited (familial). Identifying potential disease-related proteins is important for understanding the mechanisms of cardiomyopathies. Experimental identification of cardiomyopathy-related proteins is costly and labour-intensive. In contrast, bioinformatics approaches have a competitive advantage over experimental methods. Based on “guilt by association” analysis, we prioritized candidate proteins involved in human cardiomyopathies. We first built weighted human cardiomyopathy-specific protein-protein interaction networks for three subtypes of cardiomyopathies using the known disease proteins from Online Mendelian Inheritance in Man as seeds. We then developed a method to rank candidate proteins in the network based on “guilt by association” analysis. It was found that most candidate proteins with high scores shared disease-related pathways with disease seed proteins. These top-ranked candidate proteins were related to the corresponding disease subtypes, and were potential disease-related proteins. Cross-validation and comparison with other methods indicated that our approach could be used for the identification of potentially novel disease proteins, which may provide insights into cardiomyopathy-related mechanisms in a more comprehensive and integrated way.
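One simple way to read "guilt by association" ranking in a weighted network is to score each candidate by the total weight of its edges to known seed proteins. The sketch below does exactly that; it is an assumed, simplified scoring rule rather than the scoring scheme of the paper, and the protein names and weights are hypothetical.

```python
def guilt_by_association_ranking(weighted_edges, seeds):
    """Rank non-seed proteins by the summed weight of their edges to seeds.

    weighted_edges: iterable of (protein_a, protein_b, weight) triples.
    seeds:          set of known disease proteins.
    Returns a list of (candidate, score) pairs, best first; candidates with
    no direct link to a seed are not scored.
    """
    scores = {}
    for u, v, w in weighted_edges:
        for candidate, neighbour in ((u, v), (v, u)):
            if neighbour in seeds and candidate not in seeds:
                scores[candidate] = scores.get(candidate, 0.0) + w
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical disease-specific interaction network.
toy_edges = [
    ("SEED_A", "CAND_1", 0.9),
    ("SEED_B", "CAND_1", 0.8),
    ("SEED_A", "CAND_2", 0.4),
    ("CAND_2", "CAND_3", 0.7),  # no seed involved, so it adds no score
    ("SEED_A", "SEED_B", 1.0),  # seed-seed edge, ignored for ranking
]
ranking = guilt_by_association_ranking(toy_edges, seeds={"SEED_A", "SEED_B"})
```

A direct-neighbour score like this ignores indirect evidence; propagation-style methods extend the same principle to proteins several steps away from the seeds.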
Affiliation(s)
- Wan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- * E-mail: (LC); (XL)
- Weiming He
- Institute of Opto-electronics, Harbin Institute of Technology, Harbin, Heilongjiang Province, China
- Weiguo Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xiaoli Qu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Binhua Liang
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
- Qianping Gao
- Department of Cardiology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Chenchen Feng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xu Jia
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Yana Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Siya Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- * E-mail: (LC); (XL)
|
467
|
Tuncbag N, Braunstein A, Pagnani A, Huang SSC, Chayes J, Borgs C, Zecchina R, Fraenkel E. Simultaneous reconstruction of multiple signaling pathways via the prize-collecting steiner forest problem. J Comput Biol 2013; 20:124-36. [PMID: 23383998 DOI: 10.1089/cmb.2012.0092] [Citation(s) in RCA: 90] [Impact Index Per Article: 8.2] [Indexed: 12/25/2022]
Abstract
Signaling and regulatory networks are essential for cells to control processes such as growth, differentiation, and response to stimuli. Although many "omic" data sources are available to probe signaling pathways, these data are typically sparse and noisy. Thus, it has been difficult to use these data to discover the cause of the diseases and to propose new therapeutic strategies. We overcome these problems and use "omic" data to reconstruct simultaneously multiple pathways that are altered in a particular condition by solving the prize-collecting Steiner forest problem. To evaluate this approach, we use the well-characterized yeast pheromone response. We then apply the method to human glioblastoma data, searching for a forest of trees, each of which is rooted in a different cell-surface receptor. This approach discovers both overlapping and independent signaling pathways that are enriched in functionally and clinically relevant proteins, which could provide the basis for new therapeutic strategies. Although the algorithm was not provided with any information about the phosphorylation status of receptors, it identifies a small set of clinically relevant receptors among hundreds present in the interactome.
Affiliation(s)
- Nurcan Tuncbag
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, USA
|
468
|
Yates CM, Sternberg MJE. The effects of non-synonymous single nucleotide polymorphisms (nsSNPs) on protein-protein interactions. J Mol Biol 2013; 425:3949-63. [PMID: 23867278 DOI: 10.1016/j.jmb.2013.07.012] [Citation(s) in RCA: 123] [Impact Index Per Article: 11.2] [Received: 04/28/2013] [Revised: 07/02/2013] [Accepted: 07/09/2013] [Indexed: 12/23/2022]
Abstract
Non-synonymous single nucleotide polymorphisms (nsSNPs) are single base changes leading to a change to the amino acid sequence of the encoded protein. Many of these variants are associated with disease, so nsSNPs have been well studied, with studies looking at the effects of nsSNPs on individual proteins, for example, on stability and enzyme active sites. In recent years, the impact of nsSNPs upon protein-protein interactions has also been investigated, giving a greater insight into the mechanisms by which nsSNPs can lead to disease. In this review, we summarize these studies, looking at the various mechanisms by which nsSNPs can affect protein-protein interactions. We focus on structural changes that can impair interaction, changes to disorder, gain of interaction, and post-translational modifications before looking at some examples of nsSNPs at human-pathogen protein-protein interfaces and the analysis of nsSNPs from a network perspective.
Affiliation(s)
- Christopher M Yates
- Centre for Integrative Systems Biology and Bioinformatics, Department of Life Sciences, Sir Ernst Chain Building, Imperial College London, South Kensington, SW7 2AZ, UK.
|
469
|
Xu R, Li L, Wang Q. Towards building a disease-phenotype knowledge base: extracting disease-manifestation relationship from literature. Bioinformatics 2013; 29:2186-94. [PMID: 23828786 DOI: 10.1093/bioinformatics/btt359] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Indexed: 01/28/2023] Open
Abstract
MOTIVATION Systems approaches to studying phenotypic relationships among diseases are emerging as an active area of research for both novel disease gene discovery and drug repurposing. Currently, systematic study of disease phenotypic relationships on a phenome-wide scale is limited because large-scale machine-understandable disease-phenotype relationship knowledge bases are often unavailable. Here, we present an automatic approach to extract disease-manifestation (D-M) pairs (one specific type of disease-phenotype relationship) from the wide body of published biomedical literature. DATA AND METHODS Our method leverages external knowledge and limits the amount of human effort required. For the text corpus, we used 119 085 682 MEDLINE sentences (21 354 075 citations). First, we used D-M pairs from existing biomedical ontologies as prior knowledge to automatically discover D-M-specific syntactic patterns. We then extracted additional pairs from MEDLINE using the learned patterns. Finally, we analysed correlations between disease manifestations and disease-associated genes and drugs to demonstrate the potential of this newly created knowledge base in disease gene discovery and drug repurposing. RESULTS In total, we extracted 121 359 unique D-M pairs with a high precision of 0.924. Among the extracted pairs, 120 419 (99.2%) have not been captured in existing structured knowledge sources. We have shown that disease manifestations correlate positively with both disease-associated genes and drug treatments. CONCLUSIONS The main contribution of our study is the creation of a large-scale and accurate D-M phenotype relationship knowledge base. This unique knowledge base, when combined with existing phenotypic, genetic and proteomic datasets, can have profound implications in our deeper understanding of disease etiology and in rapid drug repurposing. AVAILABILITY http://nlp.case.edu/public/data/DMPatternUMLS/
Affiliation(s)
- Rong Xu
- Medical Informatics Program, Center for Clinical Investigation, Case Western Reserve University, Cleveland, OH 44106, USA.
|
470
|
Hwang TH, Atluri G, Kuang R, Kumar V, Starr T, Silverstein KAT, Haverty PM, Zhang Z, Liu J. Large-scale integrative network-based analysis identifies common pathways disrupted by copy number alterations across cancers. BMC Genomics 2013; 14:440. [PMID: 23822816 PMCID: PMC3703268 DOI: 10.1186/1471-2164-14-440] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.7] [Received: 07/04/2012] [Accepted: 06/26/2013] [Indexed: 12/21/2022] Open
Abstract
BACKGROUND Many large-scale studies analyzed high-throughput genomic data to identify altered pathways essential to the development and progression of specific types of cancer. However, no previous study has been extended to provide a comprehensive analysis of pathways disrupted by copy number alterations across different human cancers. Towards this goal, we propose a network-based method to integrate copy number alteration data with human protein-protein interaction networks and pathway databases to identify pathways that are commonly disrupted in many different types of cancer. RESULTS We applied our approach to a data set of 2,172 cancer patients across 16 different types of cancers, and discovered a set of commonly disrupted pathways, which are likely essential for tumor formation in the majority of the cancers. We also identified pathways that are only disrupted in specific cancer types, providing molecular markers for different human cancers. Analysis with independent microarray gene expression datasets confirms that the commonly disrupted pathways can be used to identify patient subgroups with significantly different survival outcomes. We also provide a network view of disrupted pathways to explain how copy number alterations affect pathways that regulate cell growth, cycle, and differentiation for tumorigenesis. CONCLUSIONS In this work, we demonstrated that the network-based integrative analysis can help to identify pathways disrupted by copy number alterations across 16 types of human cancers, which are not readily identifiable by conventional overrepresentation-based and other pathway-based methods. All the results and source code are available at http://compbio.cs.umn.edu/NetPathID/.
Affiliation(s)
- Tae Hyun Hwang
- Masonic Cancer Center, University of Minnesota – Twin Cities, Minneapolis, MN, USA
- Quantitative Biomedical Research Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Department of Clinical Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Simmons Cancer Center, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Gowtham Atluri
- Department of Computer Science and Engineering, University of Minnesota – Twin Cities, Minneapolis, MN, USA
- Rui Kuang
- Department of Computer Science and Engineering, University of Minnesota – Twin Cities, Minneapolis, MN, USA
- Vipin Kumar
- Department of Computer Science and Engineering, University of Minnesota – Twin Cities, Minneapolis, MN, USA
- Timothy Starr
- Masonic Cancer Center, University of Minnesota – Twin Cities, Minneapolis, MN, USA
- Department of Obstetrics, Gynecology & Women’s Health, University of Minnesota, Minneapolis, MN, USA
- Kevin AT Silverstein
- Masonic Cancer Center, University of Minnesota – Twin Cities, Minneapolis, MN, USA
- Peter M Haverty
- Department of Bioinformatics and Computational Biology, Genentech Inc, South San Francisco, CA, USA
- Zemin Zhang
- Department of Bioinformatics and Computational Biology, Genentech Inc, South San Francisco, CA, USA
- Jinfeng Liu
- Department of Bioinformatics and Computational Biology, Genentech Inc, South San Francisco, CA, USA
|
471
|
Nie Y, Yu J. Mining breast cancer genes with a network based noise-tolerant approach. BMC Systems Biology 2013; 7:49. [PMID: 23799982 PMCID: PMC3702465 DOI: 10.1186/1752-0509-7-49] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 11/28/2012] [Accepted: 06/21/2013] [Indexed: 12/22/2022]
Abstract
BACKGROUND Mining novel breast cancer genes is an important task in breast cancer research. Many approaches prioritize candidate genes based on their similarity to known cancer genes, usually by integrating multiple data sources. However, different types of data often contain varying degrees of noise. For effective data integration, it is important to design methods that work robustly with respect to noise. RESULTS Gene Ontology (GO) annotations are often used in cancer gene mining. However, the vast majority of GO annotations are computationally derived and thus not completely accurate. A set of genes annotated with breast cancer enriched GO terms was adopted here as a set of source data with realistic noise. A novel noise-tolerant approach was proposed to rank candidate breast cancer genes using noisy source data within the framework of a comprehensive human Protein-Protein Interaction (PPI) network. Performance of the proposed method was quantitatively evaluated by comparing it with the more established random walk approach. Results showed that the proposed method exhibited better performance in ranking known breast cancer genes and higher robustness against data noise than the random walk approach. When noise started to increase, the proposed method was able to maintain relatively stable performance, while the random walk approach showed a drastic performance decline; when noise increased to a large extent, the proposed method was still able to achieve better performance than random walk did. CONCLUSIONS A novel noise-tolerant method was proposed to mine breast cancer genes. Compared to the well-established random walk approach, it showed better performance in correctly ranking cancer genes and worked robustly with respect to noise within source data. To the best of our knowledge, it is the first such effort to quantitatively analyze noise tolerance between different breast cancer gene mining methods. The sorted gene list can be valuable for breast cancer research. The proposed quantitative noise analysis method may also prove useful for other data integration efforts. It is hoped that the current work can lead to more discussions about the influence of data noise on different computational methods for mining disease genes.
Affiliation(s)
- Yaling Nie
- National Key Laboratory of Biochemical Engineering, Institute of Process Engineering, Chinese Academy of Sciences, Beijing 100190, China
|
472
|
Chen G, Chen J, Shi C, Shi L, Tong W, Shi T. Dissecting the Characteristics and Dynamics of Human Protein Complexes at Transcriptome Cascade Using RNA-Seq Data. PLoS One 2013; 8:e66521. [PMID: 23824284 PMCID: PMC3688907 DOI: 10.1371/journal.pone.0066521] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/25/2013] [Accepted: 05/06/2013] [Indexed: 11/19/2022] Open
Abstract
Human protein complexes play crucial roles in various biological processes as functional modules. However, the expression features of human protein complexes at the transcriptome cascade are poorly understood. Here, we used RNA-Seq data from 16 disparate tissues and four types of human cancers to explore the characteristics and dynamics of human protein complexes. We observed that many individual components of human protein complexes can be generated by multiple distinct transcripts. As in yeast, the human protein complex constituents are inclined to co-express in diverse tissues. The dominant isoform of the genes involved in protein complexes tends to encode the complex constituents in each tissue. Our results indicate that the protein complex dynamics not only correlate with the presence or absence of complexes, but may also be related to the major isoform switching for complex subunits. Between any two cancers of breast, colon, lung and prostate, we found that only a few of the differentially expressed transcripts associated with complexes were identical, but 5-10 times more protein complexes involved in differentially expressed transcripts were common. Collectively, our study reveals novel properties and dynamics of human protein complexes at the transcriptome cascade in diverse normal tissues and different cancers.
Affiliation(s)
- Geng Chen
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Jiwei Chen
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Caiping Shi
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
- Leming Shi
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, United States of America
- Weida Tong
- National Center for Toxicological Research, US Food and Drug Administration, Jefferson, Arkansas, United States of America
- Tieliu Shi
- The Center for Bioinformatics and Computational Biology, Shanghai Key Laboratory of Regulatory Biology, the Institute of Biomedical Sciences and School of Life Sciences, East China Normal University, Shanghai, China
|
473
|
Abstract
High-throughput experimental technologies are generating increasingly massive and complex genomic data sets. The sheer enormity and heterogeneity of these data threaten to make the arising problems computationally infeasible. Fortunately, powerful algorithmic techniques lead to software that can answer important biomedical questions in practice. In this Review, we sample the algorithmic landscape, focusing on state-of-the-art techniques, the understanding of which will aid the bench biologist in analysing omics data. We spotlight specific examples that have facilitated and enriched analyses of sequence, transcriptomic and network data sets.
Affiliation(s)
- Bonnie Berger
- Department of Mathematics and Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA.
|
474
|
Csermely P, Korcsmáros T, Kiss HJM, London G, Nussinov R. Structure and dynamics of molecular networks: a novel paradigm of drug discovery: a comprehensive review. Pharmacol Ther 2013; 138:333-408. [PMID: 23384594 PMCID: PMC3647006 DOI: 10.1016/j.pharmthera.2013.01.016] [Citation(s) in RCA: 506] [Impact Index Per Article: 46.0] [Received: 01/22/2013] [Accepted: 01/22/2013] [Indexed: 02/02/2023]
Abstract
Despite considerable progress in genome- and proteome-based high-throughput screening methods and in rational drug design, the increase in approved drugs in the past decade did not match the increase of drug development costs. Network description and analysis not only give a systems-level understanding of drug action and disease complexity, but can also help to improve the efficiency of drug design. We give a comprehensive assessment of the analytical tools of network topology and dynamics. The state-of-the-art use of chemical similarity, protein structure, protein-protein interaction, signaling, genetic interaction and metabolic networks in the discovery of drug targets is summarized. We propose that network targeting follows two basic strategies. The "central hit strategy" selectively targets central nodes/edges of the flexible networks of infectious agents or cancer cells to kill them. The "network influence strategy" works against other diseases, where an efficient reconfiguration of rigid networks needs to be achieved by targeting the neighbors of central nodes/edges. It is shown how network techniques can help in the identification of single-target, edgetic, multi-target and allo-network drug target candidates. We review the recent boom in network methods helping hit identification, lead selection optimizing drug efficacy, as well as minimizing side-effects and drug toxicity. Successful network-based drug development strategies are shown through the examples of infections, cancer, metabolic diseases, neurodegenerative diseases and aging. Summarizing >1200 references we suggest an optimized protocol of network-aided drug development, and provide a list of systems-level hallmarks of drug quality. Finally, we highlight network-related drug development trends helping to achieve these hallmarks by a cohesive, global approach.
Affiliation(s)
- Peter Csermely
- Department of Medical Chemistry, Semmelweis University, P.O. Box 260, H-1444 Budapest 8, Hungary.
|
475
|
Kim YA, Przytycka TM. Bridging the Gap between Genotype and Phenotype via Network Approaches. Front Genet 2013; 3:227. [PMID: 23755063 PMCID: PMC3668153 DOI: 10.3389/fgene.2012.00227] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.9] [Received: 06/19/2012] [Accepted: 10/10/2012] [Indexed: 11/15/2022] Open
Abstract
In the last few years we have witnessed tremendous progress in detecting associations between genetic variations and complex traits. While genome-wide association studies have been able to discover genomic regions that may influence many common human diseases, these discoveries created an urgent need for methods that extend the knowledge of genotype-phenotype relationships to the level of the molecular mechanisms behind them. To address this emerging need, computational approaches increasingly utilize a pathway-centric perspective. These new methods often utilize known or predicted interactions between genes and/or gene products. In this review, we survey recently developed network based methods that attempt to bridge the genotype-phenotype gap. We note that although these methods help narrow the gap between genotype and phenotype relationships, these approaches alone cannot provide the precise details of underlying mechanisms and current research is still far from closing the gap.
Affiliation(s)
- Yoo-Ah Kim
- National Center for Biotechnology Information, National Institutes of Health, National Library of Medicine Bethesda, MD, USA
|
476
|
Ou-Yang L, Dai DQ, Zhang XF. Protein complex detection via weighted ensemble clustering based on Bayesian nonnegative matrix factorization. PLoS One 2013; 8:e62158. [PMID: 23658709 PMCID: PMC3642239 DOI: 10.1371/journal.pone.0062158] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.2] [Received: 12/24/2012] [Accepted: 03/18/2013] [Indexed: 12/05/2022] Open
Abstract
Detecting protein complexes from protein-protein interaction (PPI) networks is a challenging task in computational biology. A vast number of computational methods have been proposed to undertake this task. However, each computational method is developed to capture one aspect of the network. The performance of different methods on the same network can differ substantially, and even the same method may perform differently on networks with different topological characteristics. The clustering result of each computational method can be regarded as a feature that describes the PPI network from one aspect. It is therefore desirable to utilize these features to produce a more accurate and reliable clustering. In this paper, a novel Bayesian Nonnegative Matrix Factorization (NMF)-based weighted Ensemble Clustering algorithm (EC-BNMF) is proposed to detect protein complexes from PPI networks. We first apply different computational algorithms on a PPI network to generate some base clustering results. Then we integrate these base clustering results into an ensemble PPI network, in the form of a weighted combination. Finally, we identify overlapping protein complexes from this network by employing a Bayesian NMF model. When generating an ensemble PPI network, EC-BNMF can automatically optimize the values of the weights such that the ensemble algorithm can deliver better results. Experimental results on four PPI networks of Saccharomyces cerevisiae verify the effectiveness of EC-BNMF in detecting protein complexes. EC-BNMF provides an effective way to integrate different clustering results for more accurate and reliable complex detection. Furthermore, EC-BNMF has a high degree of flexibility in the choice of base clustering results. It can be coupled with existing clustering methods to identify protein complexes.
Affiliation(s)
- Le Ou-Yang
- Center for Computer Vision and Department of Mathematics, Sun Yat-Sen University, Guangzhou, China
- Dao-Qing Dai
- Center for Computer Vision and Department of Mathematics, Sun Yat-Sen University, Guangzhou, China
- Xiao-Fei Zhang
- Center for Computer Vision and Department of Mathematics, Sun Yat-Sen University, Guangzhou, China
|
477
|
Singh-Blom UM, Natarajan N, Tewari A, Woods JO, Dhillon IS, Marcotte EM. Prediction and validation of gene-disease associations using methods inspired by social network analyses. PLoS One 2013; 8:e58977. [PMID: 23650495 PMCID: PMC3641094 DOI: 10.1371/journal.pone.0058977] [Citation(s) in RCA: 78] [Impact Index Per Article: 7.1] [Received: 09/24/2012] [Accepted: 02/12/2013] [Indexed: 11/30/2022] Open
Abstract
Correctly identifying associations of genes with diseases has long been a goal in biology. With the emergence of large-scale gene-phenotype association datasets in biology, we can leverage statistical and machine learning methods to help us achieve this goal. In this paper, we present two methods for predicting gene-disease associations based on functional gene associations and gene-phenotype associations in model organisms. The first method, the Katz measure, is motivated from its success in social network link prediction, and is very closely related to some of the recent methods proposed for gene-disease association inference. The second method, called Catapult (Combining dATa Across species using Positive-Unlabeled Learning Techniques), is a supervised machine learning method that uses a biased support vector machine where the features are derived from walks in a heterogeneous gene-trait network. We study the performance of the proposed methods and related state-of-the-art methods using two different evaluation strategies, on two distinct data sets, namely OMIM phenotypes and drug-target interactions. Finally, by measuring the performance of the methods using two different evaluation strategies, we show that even though both methods perform very well, the Katz measure is better at identifying associations between traits and poorly studied genes, whereas Catapult is better suited to correctly identifying gene-trait associations overall. The authors want to thank Jon Laurent and Kris McGary for some of the data used, and Li and Patra for making their code available. Most of Ambuj Tewari's contribution to this work happened while he was a postdoctoral fellow at the University of Texas at Austin.
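The Katz measure mentioned in this abstract scores a pair of nodes by summing the walks between them, with longer walks damped by a factor `beta` per step. The following is a minimal sketch of a truncated Katz score on a small undirected graph, not the authors' implementation; the function name `katz_scores`, the toy graph, and the choices `beta=0.1` and `max_length=3` are assumptions made for illustration.

```python
def katz_scores(adjacency, beta=0.1, max_length=3):
    """Truncated Katz scores: sum over l = 1..max_length of beta**l * (A**l)[i][j].

    `adjacency` is a square list-of-lists matrix A; (A**l)[i][j] counts the
    walks of length l from node i to node j. All names here are illustrative.
    """
    n = len(adjacency)
    power = [list(row) for row in adjacency]           # holds A**l, starting at l = 1
    scores = [[beta * v for v in row] for row in adjacency]
    weight = beta                                      # holds beta**l
    for _ in range(2, max_length + 1):
        # Advance from A**(l-1) to A**l by one matrix multiplication.
        power = [
            [sum(power[i][k] * adjacency[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)
        ]
        weight *= beta
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
    return scores


# Toy path graph 0 - 1 - 2 (hypothetical data).
path = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = katz_scores(path, beta=0.1, max_length=3)
# Directly linked nodes (0, 1) score higher than nodes two steps apart (0, 2).
print(scores[0][1] > scores[0][2])
```

In a gene-disease setting the same computation would run on a heterogeneous network of genes and phenotypes, and the score between a gene node and a disease node would be used to rank candidates; that wiring is outside this sketch.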
Affiliation(s)
- U. Martin Singh-Blom
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, Texas, United States of America
- Department of Medicine, Karolinska Institutet, Solna, Stockholm, Sweden
- Nagarajan Natarajan
- Department of Computer Science, University of Texas, Austin, Texas, United States of America
- Ambuj Tewari
- Department of Statistics, University of Michigan, Ann Arbor, Michigan, United States of America
- John O. Woods
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, Texas, United States of America
- Inderjit S. Dhillon
- Department of Computer Science, University of Texas, Austin, Texas, United States of America
- Edward M. Marcotte
- Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, Texas, United States of America
- Department of Chemistry and Biochemistry, University of Texas, Austin, Texas, United States of America
|
478
|
Abstract
Disease-causing aberrations in the normal function of a gene define that gene as a disease gene. Proving a causal link between a gene and a disease experimentally is expensive and time-consuming. Comprehensive prioritization of candidate genes prior to experimental testing drastically reduces the associated costs. Computational gene prioritization is based on various pieces of correlative evidence that associate each gene with the given disease and suggest possible causal links. A fair amount of this evidence comes from high-throughput experimentation. Thus, well-developed methods are necessary to reliably deal with the quantity of information at hand. Existing gene prioritization techniques already significantly improve the outcomes of targeted experimental studies. Faster and more reliable techniques that account for novel data types are necessary for the development of new diagnostics, treatments, and cure for many diseases.
Affiliation(s)
- Yana Bromberg
- Department of Biochemistry and Microbiology, School of Environmental and Biological Sciences, Rutgers University, New Brunswick, New Jersey, USA.
|
479
|
Zhu J, Qin Y, Liu T, Wang J, Zheng X. Prioritization of candidate disease genes by topological similarity between disease and protein diffusion profiles. BMC Bioinformatics 2013; 14 Suppl 5:S5. [PMID: 23734762 PMCID: PMC3622672 DOI: 10.1186/1471-2105-14-s5-s5] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Identification of gene-phenotype relationships is a fundamental challenge in human health clinic. Based on the observation that genes causing the same or similar phenotypes tend to correlate with each other in the protein-protein interaction network, many network-based approaches have been proposed based on different underlying models. A recent comparative study showed that diffusion-based methods achieve state-of-the-art predictive performance. RESULTS In this paper, a new diffusion-based method was proposed to prioritize candidate disease genes. The diffusion profile of a disease was defined as the stationary distribution of candidate genes given a random walk with restart where similarities between phenotypes are incorporated. Then, candidate disease genes are prioritized by comparing their diffusion profiles with that of the disease. Finally, the effectiveness of our method was demonstrated through leave-one-out cross-validation against control genes from artificial linkage intervals and randomly chosen genes. A comparative study showed that our method achieves improved performance compared to some classical diffusion-based methods. To further illustrate our method, we used our algorithm to predict new causal genes of 16 multifactorial diseases including prostate cancer and Alzheimer's disease, and the top predictions were in good agreement with literature reports. CONCLUSIONS Our study indicates that integration of multiple information sources, especially the phenotype similarity profile data, and introduction of a global similarity measure between disease and gene diffusion profiles are helpful for prioritizing candidate disease genes. AVAILABILITY Programs and data are available upon request.
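The stationary distribution of a random walk with restart, which this abstract uses as a diffusion profile, can be approximated by iterating p ← (1 − r)·W·p + r·p0, where W is the column-normalized adjacency matrix, p0 the seed distribution, and r the restart probability. The sketch below shows only this generic iteration on a toy graph; it does not incorporate phenotype similarities or the profile comparison described in the paper, and the function name, `restart=0.3`, and the example network are assumptions.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change is below tol.

    W is the column-normalized adjacency matrix, p0 is uniform over `seeds`.
    All names and defaults are illustrative assumptions.
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = list(p0)
    for _ in range(max_iter):
        nxt = [
            (1 - restart)
            * sum(
                adjacency[i][j] / col_sums[j] * p[j]
                for j in range(n)
                if col_sums[j] > 0          # skip isolated nodes
            )
            + restart * p0[i]
            for i in range(n)
        ]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 - 3 with node 0 as the only seed (hypothetical data).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
profile = random_walk_with_restart(chain, seeds=[0])
# Visiting probability decays with distance from the seed.
print([round(v, 4) for v in profile])
```

Each seed set yields one profile vector; comparing the profile of a disease with the profile of each candidate gene (for example by a vector similarity) would be a separate step that this sketch leaves out.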
Affiliation(s)
- Jie Zhu
- Department of Mathematics, Shanghai Normal University, Shanghai, China
|
480
|
Emig D, Ivliev A, Pustovalova O, Lancashire L, Bureeva S, Nikolsky Y, Bessarabova M. Drug target prediction and repositioning using an integrated network-based approach. PLoS One 2013; 8:e60618. [PMID: 23593264 PMCID: PMC3617101 DOI: 10.1371/journal.pone.0060618] [Citation(s) in RCA: 143] [Impact Index Per Article: 13.0] [Received: 01/11/2013] [Accepted: 02/28/2013] [Indexed: 11/18/2022] Open
Abstract
The discovery of novel drug targets is a significant challenge in drug development. Although the human genome comprises approximately 30,000 genes, proteins encoded by fewer than 400 are used as drug targets in the treatment of diseases. Therefore, novel drug targets are extremely valuable as the source for first in class drugs. On the other hand, many of the currently known drug targets are functionally pleiotropic and involved in multiple pathologies. Several of them are exploited for treating multiple diseases, which highlights the need for methods to reliably reposition drug targets to new indications. Network-based methods have been successfully applied to prioritize novel disease-associated genes. In recent years, several such algorithms have been developed, some focusing on local network properties only, and others taking the complete network topology into account. Common to all approaches is the understanding that novel disease-associated candidates are in close overall proximity to known disease genes. However, the relevance of these methods to the prediction of novel drug targets has not yet been assessed. Here, we present a network-based approach for the prediction of drug targets for a given disease. The method allows both repositioning drug targets known for other diseases to the given disease and the prediction of unexploited drug targets which are not used for treatment of any disease. Our approach takes as input a disease gene expression signature and a high-quality interaction network and outputs a prioritized list of drug targets. We demonstrate the high performance of our method and highlight the usefulness of the predictions in three case studies. We present novel drug targets for scleroderma and different types of cancer with their underlying biological processes. Furthermore, we demonstrate the ability of our method to identify non-suspected repositioning candidates using diabetes type 1 as an example.
Affiliation(s)
- Dorothea Emig
- IP & Science, Thomson Reuters, Carlsbad, California, United States of America
- Alexander Ivliev
- IP & Science, Thomson Reuters, Carlsbad, California, United States of America
- Olga Pustovalova
- IP & Science, Thomson Reuters, Carlsbad, California, United States of America
- Lee Lancashire
- IP & Science, Thomson Reuters, Carlsbad, California, United States of America
- Svetlana Bureeva
- IP & Science, Thomson Reuters, Carlsbad, California, United States of America
- Yuri Nikolsky
- IP & Science, Thomson Reuters, Carlsbad, California, United States of America
- Marina Bessarabova
- IP & Science, Thomson Reuters, Carlsbad, California, United States of America
|
481
|
Abstract
Protein complexes are a cornerstone of many biological processes and, together, they form various types of molecular machinery that perform a vast array of biological functions. Different complexes perform different functions, and the same complex can perform very different functions that depend on a variety of factors. Thus disruption of protein complexes can be lethal to an organism. It is interesting to identify a minimal set of proteins whose removal would lead to a massive disruption of protein complexes and to understand the biological properties of these proteins. A method is presented for identifying a minimum number of proteins from a given set of complexes so that a maximum number of these complexes are disrupted when these proteins are removed. The method is based on spectral bipartitioning. This method is applied to yeast protein complexes. The identified proteins participate in a large number of biological processes and functional modules. A large proportion of them are essential proteins. Moreover, removing these identified proteins causes a large number of the yeast protein complexes to break into two fragments of nearly equal size, which minimizes the chance of either fragment being functional. The method is also superior in these aspects to alternative methods based on proteins with high connection degree, proteins whose neighbors have high average degree, and proteins that connect to many proteins of high connection degree. Our spectral bipartitioning method is able to efficiently identify a biologically meaningful minimal set of proteins whose removal causes a massive disruption of protein complexes in an organism.
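Spectral bipartitioning, the technique this abstract names, usually splits a graph by the sign of the Fiedler vector, the eigenvector belonging to the second-smallest eigenvalue of the graph Laplacian L = D − A. The sketch below illustrates that generic idea only and is not the method of the paper: it assumes a connected, unweighted graph, finds the Fiedler vector by power iteration on the shifted matrix (2·d_max)·I − L with the constant vector projected out, and uses hypothetical names and a toy graph throughout.

```python
def fiedler_vector(adjacency, iterations=500):
    """Approximate the Fiedler vector of a connected graph by power iteration.

    Iterates x <- (shift*I - L) x with shift = 2 * max degree, removing the
    mean each step so the constant eigenvector (eigenvalue 0 of L) drops out.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    shift = 2 * max(degrees)                   # upper bound on the largest eigenvalue of L
    x = [i - (n - 1) / 2 for i in range(n)]    # deterministic, mean-zero start
    for _ in range(iterations):
        # (shift*I - L) x = shift*x - D x + A x
        y = [
            (shift - degrees[i]) * x[i]
            + sum(adjacency[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        mean = sum(y) / n
        y = [v - mean for v in y]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return x


def spectral_bipartition(adjacency):
    """Split nodes into two sets by the sign of the Fiedler vector."""
    vec = fiedler_vector(adjacency)
    left = {i for i, v in enumerate(vec) if v < 0}
    right = set(range(len(vec))) - left
    return left, right


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2 - 3 (hypothetical data).
barbell = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(spectral_bipartition(barbell))
```

The split falls on the bridge between the two triangles, so the nodes at the cut (2 and 3 here) are the ones whose removal separates the graph into two halves of similar size. A production version would more likely call a sparse eigensolver than hand-written power iteration.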
Affiliation(s)
- Golnaz Taheri
- School of Mathematics and Computer Sciences, College of Science, University of Tehran, Tehran, Iran
|
482
|
Broderick G, Craddock TJA. Systems biology of complex symptom profiles: capturing interactivity across behavior, brain and immune regulation. Brain Behav Immun 2013; 29:1-8. [PMID: 23022717 PMCID: PMC3554865 DOI: 10.1016/j.bbi.2012.09.008] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 07/21/2012] [Revised: 09/13/2012] [Accepted: 09/14/2012] [Indexed: 12/15/2022] Open
Abstract
As our thinking about the basic principles of biology and medicine continues to evolve, the importance of context and regulatory interaction is becoming increasingly obvious. Biochemical and physiological components do not exist in isolation but instead are part of a tightly integrated network of interacting elements that ensure robustness and support the emergence of complex behavior. This integration permeates all levels of biology from gene regulation, to immune cell signaling, to coordinated patterns of neuronal activity and the resulting psychosocial interaction. Systems biology is an emerging branch of science that sits as a translational catalyst at the interface of the life and computational sciences. While there is no universally accepted definition of systems biology, we attempt to provide an overview of some of the basic unifying concepts and current efforts in the field as they apply to illnesses where brain and subsequent behavior are a chief component, for example autism, schizophrenia, depression, and others. Methods in this field currently constitute a broad mosaic that stretches across multiple scales of biology and physiological compartments. While this work by no means constitutes an exhaustive list of all these methods, it highlights the principal sub-disciplines presently driving the field as well as future directions of progress.
Affiliation(s)
- Gordon Broderick
- Department of Medicine, University of Alberta, Edmonton, Canada.
|
483
|
Poirel CL, Rahman A, Rodrigues RR, Krishnan A, Addesa JR, Murali TM. Reconciling differential gene expression data with molecular interaction networks. Bioinformatics 2013; 29:622-9. [PMID: 23314326 DOI: 10.1093/bioinformatics/btt007] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
MOTIVATION Many techniques have been developed to compute the response network of a cell. A recent trend in this area is to compute response networks of small size, with the rationale that only part of a pathway is often changed by disease and that interpreting small subnetworks is easier than interpreting larger ones. However, these methods may not uncover the spectrum of pathways perturbed in a particular experiment or disease. RESULTS To avoid these difficulties, we propose to use algorithms that reconcile case-control DNA microarray data with a molecular interaction network by modifying per-gene differential expression P-values such that two genes connected by an interaction show similar changes in their gene expression values. We provide a novel evaluation of four methods from this class of algorithms. We enumerate three desirable properties that this class of algorithms should address. These properties seek to maintain that the returned gene rankings are specific to the condition being studied. Moreover, to ease interpretation, highly ranked genes should participate in coherent network structures and should be functionally enriched with relevant biological pathways. We comprehensively evaluate the extent to which each algorithm addresses these properties on a compendium of gene expression data for 54 diverse human diseases. We show that the reconciled gene rankings can identify novel disease-related functions that are missed by analyzing expression data alone. AVAILABILITY C++ software implementing our algorithms is available in the NetworkReconciliation package as part of the Biorithm software suite under the GNU General Public License: http://bioinformatics.cs.vt.edu/~murali/software/biorithm-docs.
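To make the reconciliation idea concrete, the sketch below smooths per-gene scores over an interaction network so that connected genes end up with more similar values. It is a generic neighbour-averaging scheme and is not claimed to be any of the four methods evaluated in the paper. The function name, the `alpha` weight, and the use of -log10 P-values as input scores are assumptions for illustration.

```python
def smooth_scores(scores, edges, alpha=0.5, iterations=50):
    """Blend each gene's own score with the mean score of its neighbours.

    scores: dict gene -> initial score (for example, -log10 of a P-value)
    edges:  iterable of (gene_a, gene_b) undirected interactions;
            both endpoints must be keys of `scores`
    alpha:  weight on the network term (0 keeps the original scores)
    """
    neighbours = {g: set() for g in scores}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    current = dict(scores)
    for _ in range(iterations):
        updated = {}
        for g in scores:
            if neighbours[g]:
                mean_nb = sum(current[nb] for nb in neighbours[g]) / len(neighbours[g])
                updated[g] = (1 - alpha) * scores[g] + alpha * mean_nb
            else:
                updated[g] = scores[g]   # isolated genes keep their own score
        current = updated
    return current

# Hypothetical toy input: gene A is strongly differential, its neighbour B is not.
toy_scores = {"A": 4.0, "B": 0.0, "C": 0.0}
smoothed = smooth_scores(toy_scores, [("A", "B")])
```

In the toy input, B borrows signal from A while the isolated gene C is unchanged. Larger `alpha` values trust the network more and the expression data less.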
|
484
|
Abstract
Large amounts of protein-protein interaction (PPI) data are available. The human PPI network currently contains over 56 000 interactions between 11 100 proteins. It has been demonstrated that the structure of this network is not random and that the same wiring patterns in it underlie the same biological processes and diseases. In this paper, we ask if there exists a subnetwork of the human PPI network such that its topology is the key to disease formation and hence should be the primary object of therapeutic intervention. We demonstrate that such a subnetwork exists and can be obtained purely computationally. In particular, by successively pruning the entire human PPI network, we are left with a "core" subnetwork that is not only topologically and functionally homogeneous, but is also enriched in disease genes and drug targets, and contains genes that are known to drive disease formation. We call this subnetwork the Core Diseasome. Furthermore, we show that the topology of the Core Diseasome is unique in the human PPI network, suggesting that it may be the wiring of this network that governs the mutagenesis that leads to disease. Explaining the mechanisms behind this phenomenon and exploiting them remains a challenge.
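The abstract does not say which pruning rule is applied. One standard way to prune a network successively down to a dense core is k-core decomposition, sketched below under that assumption. The function names and the toy graph are hypothetical.

```python
def k_core(edges, k):
    """Return the node set of the k-core: repeatedly drop nodes with degree < k."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    changed = True
    while changed:
        changed = False
        # Snapshot the low-degree nodes first, then remove them one by one.
        for node in [n for n, nb in adj.items() if len(nb) < k]:
            for other in adj.pop(node):
                adj[other].discard(node)
            changed = True
    return set(adj)

def innermost_core(edges):
    """Return (k, nodes) for the largest k whose k-core is non-empty."""
    k, core = 0, k_core(edges, 0)
    while True:
        nxt = k_core(edges, k + 1)
        if not nxt:
            return k, core
        k, core = k + 1, nxt

# Hypothetical toy graph: a 4-clique {a, b, c, d} with a tail d-e-f attached.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
             ("c", "d"), ("d", "e"), ("e", "f")]
k, core = innermost_core(toy_edges)
```

Here the tail nodes are pruned first and the clique survives as the 3-core. On a real PPI network one would then test the surviving core for enrichment in disease genes and drug targets.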
Affiliation(s)
- Vuk Janjić
- Department of Computing, Imperial College London, London, SW7 2AZ, UK.
|
485
|
Li ZC, Lai YH, Chen LL, Chen C, Xie Y, Dai Z, Zou XY. Identifying subcellular localizations of mammalian protein complexes based on graph theory with a random forest algorithm. MOLECULAR BIOSYSTEMS 2013; 9:658-67. [DOI: 10.1039/c3mb25451h] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
|
486
|
Wang W, Yang S, Li J. Drug target predictions based on heterogeneous graph inference. Pacific Symposium on Biocomputing 2013:53-64. [PMID: 23424111 PMCID: PMC3605000] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Subscribe] [Scholar Register] [Indexed: 06/01/2023]
Abstract
A key issue in drug development is to understand the hidden relationships among drugs and targets. Computational methods for novel drug target predictions can greatly reduce time and costs compared with experimental methods. In this paper, we propose a network based computational approach for novel drug and target association predictions. More specifically, a heterogeneous drug-target graph, which incorporates known drug-target interactions as well as drug-drug and target-target similarities, is first constructed. Based on this graph, a novel graph-based inference method is introduced. Compared with two state-of-the-art methods, large-scale cross-validation results indicate that the proposed method can greatly improve novel target predictions.
Affiliation(s)
- Jing Li
- Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, Ohio, 44106, USA
|
487
|
Kim KJ, Hwang D, Kim WU. Systems Approach to Rheumatoid Arthritis. JOURNAL OF RHEUMATIC DISEASES 2013. [DOI: 10.4078/jrd.2013.20.6.348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]
Affiliation(s)
- Ki-Jo Kim
- Division of Rheumatology, Department of Internal Medicine, St. Vincent's Hospital, The Catholic University of Korea, Suwon, Korea
- Daehee Hwang
- Center for Systems Biology of Plant Senescence and Life History, Daegu Gyeongbuk Institute of Science & Technology, Daegu, Korea
- Wan-Uk Kim
- Division of Rheumatology, Department of Internal Medicine, St. Vincent's Hospital, The Catholic University of Korea, Suwon, Korea
|
488
|
'Omics' approaches to understanding interstitial cystitis/painful bladder syndrome/bladder pain syndrome. Int Neurourol J 2012; 16:159-68. [PMID: 23346481 PMCID: PMC3547176 DOI: 10.5213/inj.2012.16.4.159] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/17/2012] [Accepted: 12/18/2012] [Indexed: 11/08/2022] Open
Abstract
Recent efforts in the generation of large genomics, transcriptomics, proteomics, metabolomics and other types of 'omics' data sets have provided an unprecedentedly detailed view of certain diseases; however, to date most of this literature has focused on malignancy and other lethal pathological conditions. Very little intensive work on global profiles has been performed to understand the molecular mechanism of interstitial cystitis/painful bladder syndrome/bladder pain syndrome (IC/PBS/BPS), a chronic lower urinary tract disorder characterized by pelvic pain, urinary urgency and frequency, which can lead to long-lasting adverse effects on quality of life. This lack of understanding of the molecular mechanism has been a challenge for diagnosis and treatment, and has also delayed basic and translational research focused on biomarker and drug discovery, clinical therapy, and preventive strategies against IC/PBS/BPS. This review describes the current state of 'omics' studies and available data sets relevant to IC/PBS/BPS, and presents opportunities for new research directed at understanding the pathogenesis of this complex condition.
|
489
|
Abstract
Complex diseases are caused by a combination of genetic and environmental factors. Uncovering the molecular pathways through which genetic factors affect a phenotype is always difficult, but in the case of complex diseases this is further complicated since genetic factors in affected individuals might be different. In recent years, systems biology approaches and, more specifically, network based approaches emerged as powerful tools for studying complex diseases. These approaches are often built on the knowledge of physical or functional interactions between molecules which are usually represented as an interaction network. An interaction network not only reports the binary relationships between individual nodes but also encodes hidden higher level organization of cellular communication. Computational biologists were challenged with the task of uncovering this organization and utilizing it for the understanding of disease complexity, which prompted rich and diverse algorithmic approaches to be proposed. We start this chapter with a description of the general characteristics of complex diseases followed by a brief introduction to physical and functional networks. Next we will show how these networks are used to leverage genotype, gene expression, and other types of data to identify dysregulated pathways, infer the relationships between genotype and phenotype, and explain disease heterogeneity. We group the methods by common underlying principles and first provide a high level description of the principles followed by more specific examples. We hope that this chapter will give readers an appreciation for the wealth of algorithmic techniques that have been developed for the purpose of studying complex diseases as well as insight into their strengths and limitations.
Affiliation(s)
- Dong-Yeon Cho
- National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America
- Yoo-Ah Kim
- National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America
- Teresa M. Przytycka
- National Center for Biotechnology Information, NLM, NIH, Bethesda, Maryland, United States of America
|
490
|
Zhu C, Kushwaha A, Berman K, Jegga AG. A vertex similarity-based framework to discover and rank orphan disease-related genes. BMC SYSTEMS BIOLOGY 2012; 6 Suppl 3:S8. [PMID: 23281592 PMCID: PMC3524320 DOI: 10.1186/1752-0509-6-s3-s8] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 12/28/2022]
Abstract
Background A rare or orphan disease (OD) is any disease that affects a small percentage of the population. While opportunities now exist to accelerate progress toward understanding the basis for many more ODs, the prioritization of candidate genes is still a critical step for disease-gene identification. Several network-based frameworks have been developed to address this problem with varied results. Result We have developed a novel vertex similarity (VS) based parameter-free prioritizing framework to identify and rank orphan disease candidate genes. We validate our approach by using 1598 known orphan disease-causing genes (ODGs) representing 172 orphan diseases (ODs). We compare our approach with a state-of-the-art parameter-based approach (PageRank with Priors or PRP) and with another parameter-free method (Interconnectedness or ICN). Our results show that the VS-based approach outperforms ICN and is comparable to PRP. We further apply VS-based ranking to identify and rank potential novel candidate genes for several ODs. Conclusion We demonstrate that the VS-based parameter-free ranking approach can be successfully used for disease candidate gene prioritization and can complement other network-based methods for candidate disease gene ranking. Importantly, our VS-ranked top candidate genes for the ODs match the known literature, suggesting several novel causal relationships for further investigation.
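The abstract does not state which vertex similarity measure is used. As a minimal sketch of parameter-free, similarity-based ranking, the example below scores each candidate gene by the mean Jaccard similarity between its network neighbourhood and those of known disease (seed) genes. The choice of Jaccard, the function names, and the toy network are all assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def rank_candidates(edges, seeds, candidates):
    """Rank candidates by mean neighbourhood similarity to the seed genes.

    Returns (ranking, scores); ties are broken alphabetically.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    scores = {
        c: sum(jaccard(adj.get(c, set()), adj.get(s, set())) for s in seeds) / len(seeds)
        for c in candidates
    }
    ranking = sorted(candidates, key=lambda c: (-scores[c], c))
    return ranking, scores

# Hypothetical toy network: C1 shares both of the seed's neighbours, C2 shares none.
toy_edges = [("S1", "X"), ("S1", "Y"), ("C1", "X"), ("C1", "Y"), ("C2", "Z")]
ranking, scores = rank_candidates(toy_edges, ["S1"], ["C2", "C1"])
```

Because the score depends only on shared neighbours, there is no damping factor or restart probability to tune, which is the sense in which such a ranking is parameter-free.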
Affiliation(s)
- Cheng Zhu
- Department of Computer Science, University of Cincinnati, Cincinnati, Ohio 45229, USA
|
491
|
Gonçalves JP, Francisco AP, Moreau Y, Madeira SC. Interactogeneous: disease gene prioritization using heterogeneous networks and full topology scores. PLoS One 2012. [PMID: 23185389 PMCID: PMC3501465 DOI: 10.1371/journal.pone.0049634] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022] Open
Abstract
Disease gene prioritization aims to suggest potential implications of genes in disease susceptibility. Often accomplished in a guilt-by-association scheme, promising candidates are sorted according to their relatedness to known disease genes. Network-based methods have been successfully exploiting this concept by capturing the interaction of genes or proteins into a score. Nonetheless, most current approaches suffer from at least some of the following limitations: (1) networks comprise only curated physical interactions leading to poor genome coverage and density, and bias toward a particular source; (2) scores focus on adjacencies (direct links) or the most direct paths (shortest paths) within a constrained neighborhood around the disease genes, ignoring potentially informative indirect paths; (3) global clustering is widely applied to partition the network in an unsupervised manner, attributing little importance to prior knowledge; (4) confidence weights and their contribution to edge differentiation and ranking reliability are often disregarded. We hypothesize that network-based prioritization related to local clustering on graphs and considering full topology of weighted gene association networks integrating heterogeneous sources should overcome the above challenges. We term such a strategy Interactogeneous. We conducted cross-validation tests to assess the impact of network sources, alternative path inclusion and confidence weights on the prioritization of putative genes for 29 diseases. Heat diffusion ranking proved the best prioritization method overall, increasing the gap to neighborhood and shortest paths scores mostly on single source networks. Heterogeneous associations consistently delivered superior performance over single source data across the majority of methods. Results on the contribution of confidence weights were inconclusive.
Finally, the best Interactogeneous strategy, heat diffusion ranking and associations from the STRING database, was used to prioritize genes for Parkinson’s disease. This method effectively recovered known genes and uncovered interesting candidates which could be linked to pathogenic mechanisms of the disease.
Affiliation(s)
- Joana P. Gonçalves
- Knowledge Discovery and Bioinformatics Group, INESC-ID, Lisbon, Portugal
- Computer Science and Engineering Department, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal
- Alexandre P. Francisco
- Knowledge Discovery and Bioinformatics Group, INESC-ID, Lisbon, Portugal
- Computer Science and Engineering Department, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal
- Yves Moreau
- Electrical Engineering Department, Katholieke Universiteit Leuven, Leuven, Belgium
- Sara C. Madeira
- Knowledge Discovery and Bioinformatics Group, INESC-ID, Lisbon, Portugal
- Computer Science and Engineering Department, Instituto Superior Técnico, Technical University of Lisbon, Lisbon, Portugal
|
492
|
Guo X, Gao L, Liao Q, Xiao H, Ma X, Yang X, Luo H, Zhao G, Bu D, Jiao F, Shao Q, Chen R, Zhao Y. Long non-coding RNAs function annotation: a global prediction method based on bi-colored networks. Nucleic Acids Res 2012; 41:e35. [PMID: 23132350 PMCID: PMC3554231 DOI: 10.1093/nar/gks967] [Citation(s) in RCA: 142] [Impact Index Per Article: 11.8] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/01/2023] Open
Abstract
A growing body of evidence demonstrates that long non-coding RNAs (lncRNAs) play many key roles in diverse biological processes. There is a critical need to annotate the functions of the increasing number of available lncRNAs. In this article, we apply a global network-based strategy to tackle this issue for the first time. We develop a bi-colored network-based global function predictor, the long non-coding RNA global function predictor (‘lnc-GFP’), to predict probable functions for lncRNAs at large scale by integrating gene expression data and protein interaction data. The performance of lnc-GFP is evaluated on protein-coding and lncRNA genes. Cross-validation tests on protein-coding genes with known function annotations indicate that our method can achieve a precision of up to 95% with a suitable parameter setting. Among the 1713 lncRNAs in the bi-colored network, the 1625 (94.9%) lncRNAs in the maximum connected component are all functionally characterized. For the lncRNAs expressed in mouse embryonic stem cells and neuronal cells, the putative functions inferred by our method closely match those in the known literature.
Affiliation(s)
- Xingli Guo
- School of computer science and technology, Xidian University, 2 South Taibai Road, Xi'an Shaanxi, 710071, PR China
|
493
|
Abstract
Molecular network data are becoming increasingly available, necessitating the development of well-performing computational tools for their analysis. Such tools have enabled conceptually different approaches to exploring human diseases, in particular those that study the relationships among a multitude of biomolecules within a cell. Hence, a new field of network biology has emerged as part of systems biology, aiming to untangle the complexity of cellular network organization. We survey current network analysis methods that aim to give insight into human disease.
Affiliation(s)
- Vuk Janjić
- Department of Computing, Imperial College London, 180 Queen's Gate, SW7 2AZ London, UK
|
494
|
Nusinow DP, Kiezun A, O'Connell DJ, Chick JM, Yue Y, Maas RL, Gygi SP, Sunyaev SR. Network-based inference from complex proteomic mixtures using SNIPE. Bioinformatics 2012; 28:3115-22. [PMID: 23060611 DOI: 10.1093/bioinformatics/bts594] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023] Open
Abstract
MOTIVATION Proteomics presents the opportunity to provide novel insights about the global biochemical state of a tissue. However, a significant problem with current methods is that shotgun proteomics has limited success at detecting many low abundance proteins, such as transcription factors from complex mixtures of cells and tissues. The ability to assay for these proteins in the context of the entire proteome would be useful in many areas of experimental biology. RESULTS We used network-based inference in an approach named SNIPE (Software for Network Inference of Proteomics Experiments) that selectively highlights proteins that are more likely to be active but are otherwise undetectable in a shotgun proteomic sample. SNIPE integrates spectral counts from paired case-control samples over a network neighbourhood and assesses the statistical likelihood of enrichment by a permutation test. As an initial application, SNIPE was able to select several proteins required for early murine tooth development. Multiple lines of additional experimental evidence confirm that SNIPE can uncover previously unreported transcription factors in this system. We conclude that SNIPE can enhance the utility of shotgun proteomics data to facilitate the study of poorly detected proteins in complex mixtures. AVAILABILITY AND IMPLEMENTATION An implementation for the R statistical computing environment named snipeR has been made freely available at http://genetics.bwh.harvard.edu/snipe/. CONTACT ssunyaev@rics.bwh.harvard.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- David P Nusinow
- Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA
|
495
|
Aluru M, Zola J, Nettleton D, Aluru S. Reverse engineering and analysis of large genome-scale gene networks. Nucleic Acids Res 2012; 41:e24. [PMID: 23042249 PMCID: PMC3592423 DOI: 10.1093/nar/gks904] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/12/2022] Open
Abstract
Reverse engineering the whole-genome networks of complex multicellular organisms remains a challenge. While simpler models easily scale to large numbers of genes and gene expression datasets, more accurate models are compute-intensive, limiting their scale of applicability. To enable fast and accurate reconstruction of large networks, we developed Tool for Inferring Network of Genes (TINGe), a parallel mutual information (MI)-based program. The novel features of our approach include: (i) B-spline-based formulation for linear-time computation of MI, (ii) a novel algorithm for direct permutation testing and (iii) development of parallel algorithms to reduce run-time and facilitate construction of large networks. We assess the quality of our method by comparison with ARACNe (Algorithm for the Reconstruction of Accurate Cellular Networks) and GeneNet and demonstrate its unique capability by reverse engineering the whole-genome network of Arabidopsis thaliana from 3137 Affymetrix ATH1 GeneChips in just 9 min on a 1024-core cluster. We further report on the development of a new software tool, Gene Network Analyzer (GeNA), for extracting context-specific subnetworks from a given set of seed genes. Using TINGe and GeNA, we performed analysis of 241 Arabidopsis AraCyc 8.0 pathways, and the results are made available through the web.
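For readers unfamiliar with MI-based network inference, the sketch below computes mutual information between two expression profiles after simple equal-width binning. This is a plain histogram estimator, deliberately simpler than the B-spline formulation described in the abstract, and the function names and bin count are assumptions for illustration.

```python
import math
from collections import Counter

def discretize(values, bins=3):
    """Map real values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0   # fall back to 1.0 when all values are equal
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(xs, ys):
    """Mutual information (in nats) between two equally long label sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum of p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

# Hypothetical expression profiles for two genes across six samples.
gene_a = discretize([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
gene_b = discretize([5.0, 4.0, 3.0, 2.0, 1.0, 0.0])
mi_ab = mutual_information(gene_a, gene_b)
```

In a network-inference setting, an edge would typically be kept only if its MI exceeds what is seen after permuting one of the profiles. That thresholding step is omitted here.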
Affiliation(s)
- Maneesha Aluru
- Department of Genetics, Iowa State University, Ames, IA 50011, USA.
|
496
|
Abstract
The molecular pathways that govern human disease consist of molecular circuits that coalesce into complex, overlapping networks. These network pathways are presumably regulated in a coordinated fashion, but such regulation has been difficult to decipher using only reductionistic principles. The emerging paradigm of "network medicine" proposes to utilize insights garnered from network topology (eg, the static position of molecules in relation to their neighbors) as well as network dynamics (eg, the unique flux of information through the network) to understand better the pathogenic behavior of complex molecular interconnections that traditional methods fail to recognize. As methodologies evolve, network medicine has the potential to capture the molecular complexity of human disease while offering computational methods to discern how such complexity controls disease manifestations, prognosis, and therapy. This review introduces the fundamental concepts of network medicine and explores the feasibility and potential impact of network-based methods for predicting individual manifestations of human disease and designing rational therapies. Wherever possible, we emphasize the application of these principles to cardiovascular disease.
Affiliation(s)
- Stephen Y Chan
- Division of Cardiovascular Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
|
497
|
Magger O, Waldman YY, Ruppin E, Sharan R. Enhancing the prioritization of disease-causing genes through tissue specific protein interaction networks. PLoS Comput Biol 2012; 8:e1002690. [PMID: 23028288 PMCID: PMC3459874 DOI: 10.1371/journal.pcbi.1002690] [Citation(s) in RCA: 110] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/15/2011] [Accepted: 07/28/2012] [Indexed: 01/07/2023] Open
Abstract
The prioritization of candidate disease-causing genes is a fundamental challenge in the post-genomic era. Current state of the art methods exploit a protein-protein interaction (PPI) network for this task. They are based on the observation that genes causing phenotypically-similar diseases tend to lie close to one another in a PPI network. However, to date, these methods have used a static picture of human PPIs, while diseases impact specific tissues in which the PPI networks may be dramatically different. Here, for the first time, we perform a large-scale assessment of the contribution of tissue-specific information to gene prioritization. By integrating tissue-specific gene expression data with PPI information, we construct tissue-specific PPI networks for 60 tissues and investigate their prioritization power. We find that tissue-specific PPI networks considerably improve the prioritization results compared to those obtained using a generic PPI network. Furthermore, they allow predicting novel disease-tissue associations, pointing to sub-clinical tissue effects that may escape early detection.
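The abstract does not spell out how expression data and PPI information are integrated. One simple possibility, shown here purely as an assumption, is to keep an interaction in a tissue's network only when both partner genes are expressed in that tissue above some threshold. The function name, threshold, and toy data are hypothetical.

```python
def tissue_specific_network(edges, expression, tissue, threshold=1.0):
    """Keep interactions whose two endpoint genes are both expressed in `tissue`.

    edges:      iterable of (gene_a, gene_b) interactions from a generic PPI network
    expression: dict gene -> dict tissue -> expression level
    threshold:  minimum level counted as "expressed" (hypothetical cutoff)
    """
    expressed = {g for g, levels in expression.items()
                 if levels.get(tissue, 0.0) >= threshold}
    return [(a, b) for a, b in edges if a in expressed and b in expressed]

# Hypothetical toy data: G3 is silent in liver, so its interactions drop out there.
toy_edges = [("G1", "G2"), ("G2", "G3"), ("G1", "G3")]
toy_expression = {
    "G1": {"liver": 5.0, "brain": 2.0},
    "G2": {"liver": 3.0, "brain": 0.1},
    "G3": {"liver": 0.0, "brain": 4.0},
}
liver_net = tissue_specific_network(toy_edges, toy_expression, "liver")
brain_net = tissue_specific_network(toy_edges, toy_expression, "brain")
```

A softer alternative would down-weight such edges rather than delete them. Either way, the resulting per-tissue networks can be passed to the same prioritization algorithm that is run on the generic network.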
Affiliation(s)
- Oded Magger
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel.
|
498
|
Guney E, Oliva B. Exploiting protein-protein interaction networks for genome-wide disease-gene prioritization. PLoS One 2012; 7:e43557. [PMID: 23028459 PMCID: PMC3448640 DOI: 10.1371/journal.pone.0043557] [Citation(s) in RCA: 78] [Impact Index Per Article: 6.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/19/2012] [Accepted: 07/23/2012] [Indexed: 11/23/2022] Open
Abstract
Complex genetic disorders often involve products of multiple genes acting cooperatively. Hence, the pathophenotype is the outcome of the perturbations in the underlying pathways, where gene products cooperate through various mechanisms such as protein-protein interactions. Pinpointing the decisive elements of such disease pathways is still challenging. In recent years, computational approaches exploiting interaction network topology have been successfully applied to prioritize individual genes involved in diseases. Although linkage intervals provide a list of disease-gene candidates, recent genome-wide studies demonstrate that genes not associated with any known linkage interval may also contribute to the disease phenotype. Network-based prioritization methods help to highlight such associations. Still, there is a need for robust methods that capture the interplay among disease-associated genes mediated by the topology of the network. Here, we propose a genome-wide network-based prioritization framework named GUILD. This framework implements four network-based disease-gene prioritization algorithms. We analyze the performance of these algorithms in dozens of disease phenotypes. The algorithms in GUILD are compared to state-of-the-art network topology based algorithms for prioritization of genes. As a proof of principle, we investigate top-ranking genes in Alzheimer's disease (AD), diabetes and AIDS using disease-gene associations from various sources. We show that GUILD is able to significantly highlight disease-gene associations that are not used a priori. Our findings suggest that GUILD helps to identify genes implicated in the pathology of human disorders independent of the loci associated with the disorders.
Affiliation(s)
- Emre Guney
- Structural Bioinformatics Group (GRIB), Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Barcelona, Catalonia, Spain
- Baldo Oliva
- Structural Bioinformatics Group (GRIB), Universitat Pompeu Fabra, Barcelona Research Park of Biomedicine (PRBB), Barcelona, Catalonia, Spain
|
499
|
Solava RW, Michaels RP, Milenkovic T. Graphlet-based edge clustering reveals pathogen-interacting proteins. Bioinformatics 2012; 28:i480-i486. [PMID: 22962470 PMCID: PMC3436803 DOI: 10.1093/bioinformatics/bts376] [Citation(s) in RCA: 50] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/06/2023] Open
Abstract
MOTIVATION Prediction of protein function from protein interaction networks has received attention in the post-genomic era. A popular strategy has been to cluster the network into functionally coherent groups of proteins and assign the entire cluster with a function based on functions of its annotated members. Traditionally, network research has focused on clustering of nodes. However, clustering of edges may be preferred: nodes belong to multiple functional groups, but clustering of nodes typically cannot capture the group overlap, while clustering of edges can. Clustering of adjacent edges that share many neighbors was proposed recently, outperforming different node clustering methods. However, since some biological processes can have characteristic 'signatures' throughout the network, not just locally, it may be of interest to consider edges that are not necessarily adjacent. RESULTS We design a sensitive measure of the 'topological similarity' of edges that can deal with edges that are not necessarily adjacent. We cluster edges that are similar according to our measure in different baker's yeast protein interaction networks, outperforming existing node and edge clustering approaches. We apply our approach to the human network to predict new pathogen-interacting proteins. This is important, since these proteins represent drug target candidates. AVAILABILITY Software executables are freely available upon request. CONTACT tmilenko@nd.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- R W Solava
- Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA
|
500
|
Abstract
PURPOSE OF REVIEW This review introduces the fundamental concepts of network medicine and explores the feasibility and potential impact of network-based methods on predicting and ameliorating individual manifestations of human cardiovascular disease. RECENT FINDINGS Complex cardiovascular diseases rarely result from an abnormality in a single molecular effector, but, rather, nearly always are the net result of multiple pathobiological pathways that interact through an interconnected network. In the postgenomic era, a framework has emerged of the potential complexity of the interacting pathways that govern molecular actions in the human cell. As a result, network approaches have been developed to understand more comprehensively those interconnections that influence human disease. 'Network medicine' has already led to tangible discoveries of novel disease genes and pathways as well as improved mechanisms for rational drug development. SUMMARY As methodologies evolve, network medicine may better capture the complexity of human pathogenesis and, thus, re-define personalized disease classification and therapies.
|