401
|
Ganegoda GU, Li M, Wang W, Feng Q. Heterogeneous Network Model to Infer Human Disease-Long Intergenic Non-Coding RNA Associations. IEEE Trans Nanobioscience 2015; 14:175-83. [DOI: 10.1109/tnb.2015.2391133] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.0] [Indexed: 11/08/2022]
|
402
|
Lhota J, Hauptman R, Hart T, Ng C, Xie L. A new method to improve network topological similarity search: applied to fold recognition. Bioinformatics 2015; 31:2106-14. [PMID: 25717198 DOI: 10.1093/bioinformatics/btv125] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 10/08/2014] [Accepted: 02/21/2015] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Similarity search is the foundation of bioinformatics. It plays a key role in establishing structural, functional and evolutionary relationships between biological sequences. Although the power of the similarity search has increased steadily in recent years, a high percentage of sequences remain uncharacterized in the protein universe. Thus, new similarity search strategies are needed to efficiently and reliably infer the structure and function of new sequences. The existing paradigm for studying protein sequence, structure, function and evolution has been established based on the assumption that the protein universe is discrete and hierarchical. Cumulative evidence suggests that the protein universe is continuous. As a result, conventional sequence homology search methods may not be able to detect novel structural, functional and evolutionary relationships between proteins from weak and noisy sequence signals. To overcome the limitations in existing similarity search methods, we propose a new algorithmic framework, Enrichment of Network Topological Similarity (ENTS), to improve the performance of large-scale similarity searches in bioinformatics. RESULTS: We apply ENTS to a challenging unsolved problem: protein fold recognition. Our rigorous benchmark studies demonstrate that ENTS considerably outperforms state-of-the-art methods. As the concept of ENTS can be applied to any similarity metric, it may provide a general framework for similarity search on any set of biological entities, given their representation as a network. AVAILABILITY AND IMPLEMENTATION: Source code is freely available upon request. CONTACT: lxie@iscb.org.
Affiliation(s)
- John Lhota
- Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A
- Ruth Hauptman
- Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A
- Thomas Hart
- Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A
- Clara Ng
- Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A
- Lei Xie
- Hunter College High School, New York, NY 10128, U.S.A., Department of Computer Science, Hunter College, The City University of New York, New York, NY 10065, U.S.A., Department of Biological Sciences, Hunter College, The City University of New York New York, NY 10065, U.S.A. and The Graduate Center, The City University of New York, New York, NY 10016, U.S.A
|
403
|
Wu S, Shao F, Ji J, Sun R, Dong R, Zhou Y, Xu S, Sui Y, Hu J. Network propagation with dual flow for gene prioritization. PLoS One 2015; 10:e0116505. [PMID: 25689268 PMCID: PMC4331530 DOI: 10.1371/journal.pone.0116505] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 05/16/2014] [Accepted: 11/24/2014] [Indexed: 12/31/2022]
Abstract
Based on the hypothesis that the neighbors of disease genes tend to cause similar diseases, network-based methods for disease prediction have received increasing attention. Because they take full advantage of network structure, global distance measurements generally perform better than local distance measurements. However, global distance measurements have problems of their own. For example, they may mistake non-disease hub proteins that have dense interactions with known disease proteins for potential disease proteins. To find a method that avoids this problem, we analyzed the differences between disease proteins and other proteins by using essential proteins (proteins encoded by essential genes) as references. We found that disease proteins are not well connected with essential proteins in the protein interaction networks. Based on this finding, we proposed a novel strategy for gene prioritization based on protein interaction networks. We allocated positive flow to disease genes and negative flow to essential genes, and adopted network propagation for gene prioritization. Experimental results on 110 diseases verified the effectiveness and potential of the proposed method.
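The flow-allocation idea in this abstract can be illustrated with a minimal sketch. It assumes a generic propagation rule, F <- alpha * W' * F + (1 - alpha) * Y, where W' is the adjacency matrix normalized by sqrt(deg_i * deg_j); the abstract does not state the authors' exact update, so this is not their implementation. The toy graph, the node names, and the function `propagate_dual_flow` are hypothetical.

```python
def propagate_dual_flow(adj, seeds, alpha=0.5, iters=100):
    """Propagate signed seed flow over an undirected graph.

    adj   : dict mapping node -> set of neighbours (assumed symmetric)
    seeds : dict mapping node -> initial flow
            (+1 for known disease genes, -1 for essential genes)
    Update rule (an assumption, not taken from the paper):
        f[n] = alpha * sum(f[m] / sqrt(deg[n] * deg[m])) + (1 - alpha) * y[n]
    """
    nodes = sorted(adj)
    deg = {n: len(adj[n]) for n in nodes}
    y = {n: float(seeds.get(n, 0.0)) for n in nodes}
    f = dict(y)
    for _ in range(iters):
        f = {
            n: alpha * sum(f[m] / (deg[n] * deg[m]) ** 0.5 for m in adj[n])
            + (1 - alpha) * y[n]
            for n in nodes
        }
    return f


# Hypothetical toy network: disease gene D, essential genes E1 and E2,
# a hub H linked to all three, and a candidate C linked only to D.
toy_adj = {
    "D": {"C", "H"},
    "C": {"D"},
    "H": {"D", "E1", "E2"},
    "E1": {"H"},
    "E2": {"H"},
}
toy_seeds = {"D": 1.0, "E1": -1.0, "E2": -1.0}
scores = propagate_dual_flow(toy_adj, toy_seeds)
ranking = sorted((n for n in scores if n not in toy_seeds),
                 key=scores.get, reverse=True)
```

In this toy case the negative flow from the two essential genes pulls the hub H below the candidate C, which is the behavior the abstract describes as the motivation for dual flow.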
Affiliation(s)
- Shunyao Wu
- College of Automation Engineering, Qingdao University, Qingdao, China
- College of Information Engineering, Qingdao University, Qingdao, China
- Fengjing Shao
- College of Automation Engineering, Qingdao University, Qingdao, China
- College of Information Engineering, Qingdao University, Qingdao, China
- Jun Ji
- College of Information Engineering, Qingdao University, Qingdao, China
- Rencheng Sun
- College of Information Engineering, Qingdao University, Qingdao, China
- Rizhuang Dong
- School of Computer Engineering, Qingdao Technological University, Qingdao, China
- Yuanke Zhou
- College of Information Engineering, Qingdao University, Qingdao, China
- Shaojie Xu
- College of Information Engineering, Qingdao University, Qingdao, China
- Yi Sui
- College of Information Engineering, Qingdao University, Qingdao, China
- Jianlong Hu
- College of Information Engineering, Qingdao University, Qingdao, China
|
404
|
Kuperstein I, Grieco L, Cohen DPA, Thieffry D, Zinovyev A, Barillot E. The shortest path is not the one you know: application of biological network resources in precision oncology research. Mutagenesis 2015; 30:191-204. [DOI: 10.1093/mutage/geu078] [Citation(s) in RCA: 31] [Impact Index Per Article: 3.4] [Indexed: 12/27/2022]
|
405
|
Jiang R. Walking on multiple disease-gene networks to prioritize candidate genes. J Mol Cell Biol 2015; 7:214-30. [DOI: 10.1093/jmcb/mjv008] [Citation(s) in RCA: 55] [Impact Index Per Article: 6.1] [Received: 09/26/2014] [Accepted: 01/11/2015] [Indexed: 12/11/2022]
|
406
|
Ahmadi Adl A, Qian X. Tumor stratification by a novel graph-regularized bi-clique finding algorithm. Comput Biol Chem 2015; 57:3-11. [PMID: 25791318 DOI: 10.1016/j.compbiolchem.2015.02.010] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/30/2014] [Accepted: 02/03/2015] [Indexed: 12/15/2022]
Abstract
Due to the disease mechanisms involved, many complex diseases, such as cancer, demonstrate significant heterogeneity with varying behaviors, including different survival times, treatment responses, and recurrence rates. The aim of tumor stratification is to identify disease subtypes, which is an important first step towards precision medicine. Recent advances in profiling a large number of molecular variables, such as in The Cancer Genome Atlas (TCGA), have enabled researchers to implement computational methods, including traditional clustering and bi-clustering algorithms, to systematically analyze high-throughput molecular measurements and identify tumor subtypes as well as their associated biomarkers. In this study we discuss critical issues and challenges in existing computational approaches for tumor stratification. We show that the problem can be formulated as finding densely connected sub-graphs (bi-cliques) in a bipartite graph representation of genomic data. We propose a novel algorithm that takes advantage of prior biological knowledge through a gene-gene interaction network to find such sub-graphs, which helps simultaneously identify both tumor subtypes and their corresponding genetic markers. Our experimental results show that our proposed method outperforms current state-of-the-art methods for tumor stratification.
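The bipartite formulation mentioned in this abstract can be made concrete with a small sketch. It only scores how close a chosen set of samples and genes is to a bi-clique; it does not reproduce the graph-regularized search the authors propose. The edge set, the sample and gene labels, and the function `biclique_density` are hypothetical.

```python
def biclique_density(edges, samples, genes):
    """Fraction of (sample, gene) pairs that are edges of the bipartite graph.

    edges   : set of (sample, gene) tuples, e.g. "sample carries an
              alteration in gene" (an assumed encoding of the genomic data)
    samples : iterable of sample identifiers
    genes   : iterable of gene identifiers
    A density of 1.0 means the selection is a full bi-clique.
    """
    pairs = [(s, g) for s in samples for g in genes]
    if not pairs:
        return 0.0
    return sum(pair in edges for pair in pairs) / len(pairs)


# Hypothetical toy data: three tumor samples, two genes.
toy_edges = {
    ("t1", "g1"), ("t1", "g2"),
    ("t2", "g1"), ("t2", "g2"),
    ("t3", "g1"),
}
full = biclique_density(toy_edges, ["t1", "t2"], ["g1", "g2"])
partial = biclique_density(toy_edges, ["t1", "t2", "t3"], ["g1", "g2"])
```

Here samples t1 and t2 with genes g1 and g2 form a complete bi-clique, while adding t3 lowers the density because t3 lacks an edge to g2.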
Affiliation(s)
- Amin Ahmadi Adl
- Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33613, USA.
- Xiaoning Qian
- Department of Computer Science and Engineering, University of South Florida, Tampa, FL 33613, USA; Department of Electrical & Computer Engineering, Texas A&M University, College Station, TX 77843, USA; Department of Pediatrics, University of South Florida, Tampa, FL 33620, USA
|
407
|
Zhao ZQ, Han GS, Yu ZG, Li J. Laplacian normalization and random walk on heterogeneous networks for disease-gene prioritization. Comput Biol Chem 2015; 57:21-8. [PMID: 25736609 DOI: 10.1016/j.compbiolchem.2015.02.008] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 12/31/2014] [Accepted: 02/03/2015] [Indexed: 12/11/2022]
Abstract
Random walk on heterogeneous networks is a recently emerging approach to effective disease gene prioritization. Laplacian normalization is a technique capable of normalizing the weight of edges in a network. We use this technique to normalize the gene matrix and the phenotype matrix before the construction of the heterogeneous network, and also use this idea to define the transition matrices of the heterogeneous network. Our method has remarkably better performance than the existing methods for recovering known gene-phenotype relationships. The Shannon information entropy of the distribution of the transition probabilities in our networks is found to be smaller than that of the networks constructed by the existing methods, implying that a higher number of top-ranked genes can be verified as disease genes. In fact, the most probable gene-phenotype relationships ranked within the top 3 or top 5 in our gene lists can be confirmed by the OMIM database in many cases. Our algorithms have shown remarkably superior performance over the state-of-the-art algorithms for recovering gene-phenotype relationships. All Matlab code is available upon email request.
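The two quantities named in this abstract can be sketched as follows. The normalization shown is the common symmetric form S[i][j] = W[i][j] / sqrt(d_i * d_j), with d_i the row sum of W, and the entropy is the standard Shannon entropy in bits; the abstract does not give the authors' exact definitions, so both are assumptions. The function names and the toy matrix are hypothetical.

```python
import math


def laplacian_normalize(w):
    """Return S with S[i][j] = w[i][j] / sqrt(d_i * d_j), d_i = row sum of w.

    Entries touching an isolated node (row sum 0) are set to 0.0.
    """
    d = [sum(row) for row in w]
    n = len(w)
    return [
        [
            w[i][j] / math.sqrt(d[i] * d[j]) if d[i] > 0 and d[j] > 0 else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def shannon_entropy(probs):
    """Shannon entropy in bits of a discrete distribution (zeros are skipped)."""
    return sum(-p * math.log2(p) for p in probs if p > 0)


# Hypothetical toy similarity matrix: node 0 is linked to nodes 1 and 2.
toy_w = [
    [0.0, 1.0, 1.0],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
toy_s = laplacian_normalize(toy_w)
```

A more peaked distribution of transition probabilities gives a lower entropy, which is the sense in which the abstract compares networks: `shannon_entropy([0.5, 0.5])` is one bit, while a distribution concentrated on a single outcome has zero entropy.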
Affiliation(s)
- Zhi-Qin Zhao
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan 411105, China
- Guo-Sheng Han
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan 411105, China
- Zu-Guo Yu
- Hunan Key Laboratory for Computation and Simulation in Science and Engineering and Key Laboratory of Intelligent Computing and Information Processing of Ministry of Education, Xiangtan University, Xiangtan, Hunan 411105, China; School of Mathematical Sciences, Queensland University of Technology, GPO Box 2434, Brisbane Q4001, Australia.
- Jinyan Li
- Advanced Analytics Institute & Centre for Health Technologies, University of Technology Sydney, Broadway, NSW 2007, Australia.
|
408
|
Characterization of protein complexes and subcomplexes in protein-protein interaction databases. Biochem Res Int 2015; 2015:245075. [PMID: 25722891 PMCID: PMC4334629 DOI: 10.1155/2015/245075] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/30/2014] [Revised: 01/05/2015] [Accepted: 01/06/2015] [Indexed: 12/24/2022]
Abstract
The identification and characterization of protein complexes implicated in protein-protein interaction data are crucial to the understanding of the molecular events under normal and abnormal physiological conditions. This paper provides a novel characterization of subcomplexes in protein interaction databases, stressing definition and representation issues, quantification, biological validation, network metrics, motifs, modularity, and gene ontology (GO) terms. The paper introduces the concept of "nested group" as a way to represent subcomplexes and estimates that around 15% of those nested groups with the higher Jaccard index may be a result of data artifacts in protein interaction databases, while a number of them can be found in biologically important modular structures or dynamic structures. We also found that network centralities, enrichment in essential proteins, GO terms related to regulation, imperfect 5-clique motifs, and higher GO homogeneity can be used to identify proteins in nested complexes.
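The Jaccard index used in this abstract is the standard set-overlap measure, |A ∩ B| / |A ∪ B|. The sketch below computes it for two protein complexes and, as an assumption made only for illustration, treats a pair as nested when one complex's members are a subset of the other's; the abstract does not spell out the paper's definition of a nested group. The protein labels and function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two collections (0.0 if both empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_nested(a, b):
    """True if one complex is contained in the other (illustrative assumption)."""
    a, b = set(a), set(b)
    return a <= b or b <= a


# Hypothetical complexes given as sets of protein identifiers.
complex_small = {"P1", "P2", "P3"}
complex_large = {"P1", "P2", "P3", "P4"}
overlap = jaccard(complex_small, complex_large)
nested = is_nested(complex_small, complex_large)
```

For a nested pair, the Jaccard index reduces to the size ratio of the smaller complex to the larger one, so a value near 1 flags two entries that differ by very few proteins.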
|
409
|
Wu L, Shen Y, Li M, Wu FX. Network output controllability-based method for drug target identification. IEEE Trans Nanobioscience 2015; 14:184-91. [PMID: 25643411 DOI: 10.1109/tnb.2015.2391175] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.3] [Indexed: 11/09/2022]
Abstract
Biomolecules do not perform their functions alone, but interactively with one another to form so-called biomolecular networks. It is well known that a complex disease stems from the malfunctions of corresponding biomolecular networks. Therefore, one important task is to identify drug targets from biomolecular networks. In this study, drug target identification is formulated as a problem of finding steering nodes in biomolecular networks, and the concept of network output controllability is applied to this problem. By applying control signals to these steering nodes, the biomolecular networks are expected to transition from one state to another. A graph-theoretic algorithm is proposed to find a minimum set of steering nodes in biomolecular networks, which can be a potential set of drug targets. Application of the method to real biomolecular networks shows that the identified potential drug targets are in agreement with existing research results. This indicates that the method can generate testable predictions and provide insights into the experimental design of drug discovery.
|
410
|
Abstract
Background Pinpointing genes involved in inherited human diseases remains a great challenge in the post-genomics era. Although approaches have been proposed either based on the guilt-by-association principle or making use of disease phenotype similarities, the low coverage of both diseases and genes in existing methods has been preventing the scan of causative genes for a significant proportion of diseases at the whole-genome level. Results To overcome this limitation, we proposed a rigorous statistical method called pgFusion to prioritize candidate genes by integrating one type of disease phenotype similarity derived from the Unified Medical Language System (UMLS) and seven types of gene functional similarities calculated from gene expression, gene ontology, pathway membership, protein sequence, protein domain, protein-protein interaction and regulation pattern, respectively. Our method covered a total of 7,719 diseases and 20,327 genes, achieving the highest coverage thus far for both diseases and genes. We performed leave-one-out cross-validation experiments to demonstrate the superior performance of our method and applied it to a real exome sequencing dataset of epileptic encephalopathies, showing the capability of this approach in finding causative genes for complex diseases. We further provided the standalone software and online services of pgFusion at http://bioinfo.au.tsinghua.edu.cn/jianglab/pgfusion. Conclusions pgFusion not only provided an effective way for prioritizing candidate genes, but also demonstrated feasible solutions to two fundamental questions in the analysis of big genomic data: the comparability of heterogeneous data and the integration of multiple types of data. Applications of this method in exome or whole genome sequencing studies would accelerate the finding of causative genes for human diseases. Other research fields in genomics could also benefit from the incorporation of our data fusion methodology.
|
411
|
Sharma A, Menche J, Huang CC, Ort T, Zhou X, Kitsak M, Sahni N, Thibault D, Voung L, Guo F, Ghiassian SD, Gulbahce N, Baribaud F, Tocker J, Dobrin R, Barnathan E, Liu H, Panettieri RA, Tantisira KG, Qiu W, Raby BA, Silverman EK, Vidal M, Weiss ST, Barabási AL. A disease module in the interactome explains disease heterogeneity, drug response and captures novel pathways and genes in asthma. Hum Mol Genet 2015; 24:3005-20. [PMID: 25586491 DOI: 10.1093/hmg/ddv001] [Citation(s) in RCA: 120] [Impact Index Per Article: 13.3] [Received: 09/01/2014] [Accepted: 01/05/2015] [Indexed: 01/24/2023]
Abstract
Recent advances in genetics have spurred rapid progress towards the systematic identification of genes involved in complex diseases. Still, the detailed understanding of the molecular and physiological mechanisms through which these genes affect disease phenotypes remains a major challenge. Here, we identify the asthma disease module, i.e. the local neighborhood of the interactome whose perturbation is associated with asthma, and validate it for functional and pathophysiological relevance, using both computational and experimental approaches. We find that the asthma disease module is enriched with modest GWAS P-values against the background of random variation, and with differentially expressed genes from normal and asthmatic fibroblast cells treated with an asthma-specific drug. The asthma module also contains immune response mechanisms that are shared with other immune-related disease modules. Further, using diverse omics (genomics, gene-expression, drug response) data, we identify the GAB1 signaling pathway as an important novel modulator in asthma. The wiring diagram of the uncovered asthma module suggests a relatively close link between GAB1 and glucocorticoids (GCs), which we experimentally validate, observing an increase in the level of GAB1 after GC treatment in BEAS-2B bronchial epithelial cells. The siRNA knockdown of GAB1 in the BEAS-2B cell line resulted in a decrease in the NFkB level, suggesting a novel regulatory path of the pro-inflammatory factor NFkB by GAB1 in asthma.
Affiliation(s)
- Amitabh Sharma
- Center for Complex Networks Research, Department of Physics, Northeastern University, Boston, MA 02115, USA Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Jörg Menche
- Center for Complex Networks Research, Department of Physics, Northeastern University, Boston, MA 02115, USA Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA Department of Theoretical Physics, Budapest University of Technology and Economics, H1111, Budapest, Hungary Center for Network Science, Central European University, Nador u. 9, 1051 Budapest, Hungary
- C Chris Huang
- Janssen Research & Development, Inc., 1400 McKean Road, Spring House, PA 19477, USA
- Tatiana Ort
- Janssen Research & Development, Inc., 1400 McKean Road, Spring House, PA 19477, USA
- Xiaobo Zhou
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Maksim Kitsak
- Center for Complex Networks Research, Department of Physics, Northeastern University, Boston, MA 02115, USA Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Nidhi Sahni
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Derek Thibault
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Linh Voung
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Feng Guo
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Susan Dina Ghiassian
- Center for Complex Networks Research, Department of Physics, Northeastern University, Boston, MA 02115, USA Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
- Natali Gulbahce
- Department of Cellular and Molecular Pharmacology, University of California 1700, 4th Street, Byers Hall 308D, San Francisco, CA 94158, USA
- Frédéric Baribaud
- Janssen Research & Development, Inc., 1400 McKean Road, Spring House, PA 19477, USA
- Joel Tocker
- Janssen Research & Development, Inc., 1400 McKean Road, Spring House, PA 19477, USA
- Radu Dobrin
- Janssen Research & Development, Inc., 1400 McKean Road, Spring House, PA 19477, USA
- Elliot Barnathan
- Janssen Research & Development, Inc., 1400 McKean Road, Spring House, PA 19477, USA
- Hao Liu
- Janssen Research & Development, Inc., 1400 McKean Road, Spring House, PA 19477, USA
- Reynold A Panettieri
- Pulmonary Allergy and Critical Care Division, Department of Medicine, University of Pennsylvania, 125 South 31st Street, TRL Suite 1200, Philadelphia, PA 19104, USA
- Kelan G Tantisira
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Weiliang Qiu
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Benjamin A Raby
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Edwin K Silverman
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Marc Vidal
- Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA Department of Genetics, Harvard Medical School, Boston, MA 02115, USA
- Scott T Weiss
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA
- Albert-László Barabási
- Center for Complex Networks Research, Department of Physics, Northeastern University, Boston, MA 02115, USA Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, MA 02215, USA Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA 02115, USA Department of Theoretical Physics, Budapest University of Technology and Economics, H1111, Budapest, Hungary Center for Network Science, Central European University, Nador u. 9, 1051 Budapest, Hungary
|
412
|
Chen H, Zhu Z, Zhu Y, Wang J, Mei Y, Cheng Y. Pathway mapping and development of disease-specific biomarkers: protein-based network biomarkers. J Cell Mol Med 2015; 19:297-314. [PMID: 25560835 PMCID: PMC4407592 DOI: 10.1111/jcmm.12447] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 06/12/2014] [Accepted: 08/22/2014] [Indexed: 01/06/2023]
Abstract
It is known that a disease is rarely a consequence of an abnormality of a single gene, but reflects the interactions of various processes in a complex network. Annotated molecular networks offer new opportunities to understand diseases within a systems biology framework and provide an excellent substrate for network-based identification of biomarkers. Network biomarkers and dynamic network biomarkers (DNBs) represent new types of biomarkers with protein-protein or gene-gene interactions that can be monitored and evaluated at different stages and time-points during the development of disease. Clinical bioinformatics, as a new way to combine clinical measurements and signs with human tissue-generated bioinformatics, is crucial to translate biomarkers into clinical application, validate disease specificity, and understand the role of biomarkers in clinical settings. In this article, recent advances and developments in network biomarkers and DNBs are comprehensively reviewed. We also discuss how network biomarkers support a better understanding of the molecular mechanisms of diseases, the advantages and constraints of network biomarkers for clinical application, clinical bioinformatics as a bridge to the development of disease-specific, stage-specific, severity-specific and therapy-predictive biomarkers, and the potential of network biomarkers.
Affiliation(s)
- Hao Chen
- Department of Cardiothoracic Surgery, Tongji Hospital, Tongji University, Shanghai, China
|
413
|
Engin HB, Hofree M, Carter H. Identifying mutation specific cancer pathways using a structurally resolved protein interaction network. Pac Symp Biocomput 2015; 20:84-95. [PMID: 25592571 PMCID: PMC4299875] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/04/2023]
Abstract
Here we present a method for extracting candidate cancer pathways from tumor 'omics data while explicitly accounting for diverse consequences of mutations for protein interactions. Disease-causing mutations are frequently observed at either core or interface residues mediating protein interactions. Mutations at core residues frequently destabilize protein structure while mutations at interface residues can specifically affect the binding energies of protein-protein interactions. As a result, mutations in a protein may result in distinct interaction profiles and thus have different phenotypic consequences. We describe a protein structure-guided pipeline for extracting interacting protein sets specific to a particular mutation. Of 59 cancer genes with 3D co-complexed structures in the Protein Data Bank, 43 showed evidence of mutations with different functional consequences. Literature survey reciprocated functional predictions specific to distinct mutations on APC, ATRX, BRCA1, CBL and HRAS. Our analysis suggests that accounting for mutation-specific perturbations to cancer pathways will be essential for personalized cancer therapy.
Affiliation(s)
- H. Billur Engin
- School of Medicine, University of California San Diego, 9500 Gilman Dr. San Diego, CA 92093, USA
- Matan Hofree
- Department of Computer Science and Engineering, University of California San Diego, 9500 Gilman Dr. San Diego, CA 92093, USA
|
414
|
Taşan M, Musso G, Hao T, Vidal M, MacRae CA, Roth FP. Selecting causal genes from genome-wide association studies via functionally coherent subnetworks. Nat Methods 2014; 12:154-9. [PMID: 25532137 DOI: 10.1038/nmeth.3215] [Citation(s) in RCA: 75] [Impact Index Per Article: 7.5] [Received: 05/01/2014] [Accepted: 11/24/2014] [Indexed: 12/27/2022]
Abstract
Genome-wide association (GWA) studies have linked thousands of loci to human diseases, but the causal genes and variants at these loci generally remain unknown. Although investigators typically focus on genes closest to the associated polymorphisms, the causal gene is often more distal. Reliance on published work to prioritize candidates is biased toward well-characterized genes. We describe a 'prix fixe' strategy and software that uses genome-scale shared-function networks to identify sets of mutually functionally related genes spanning multiple GWA loci. Using associations from ∼100 GWA studies covering ten cancer types, our approach outperformed the common alternative strategy in ranking known cancer genes. As more GWA loci are discovered, the strategy will have increased power to elucidate the causes of human disease.
Affiliation(s)
- Murat Taşan
- [1] Donnelly Centre, University of Toronto, Toronto, Ontario, Canada. [2] Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada. [3] Department of Computer Science, University of Toronto, Toronto, Ontario, Canada. [4] Center for Cancer Systems Biology (CCSB), Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA. [5] Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada
- Gabriel Musso
- [1] Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA. [2] Cardiovascular Division, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Tong Hao
- [1] Center for Cancer Systems Biology (CCSB), Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA. [2] Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Marc Vidal
- [1] Center for Cancer Systems Biology (CCSB), Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA. [2] Department of Genetics, Harvard Medical School, Boston, Massachusetts, USA
- Calum A MacRae
- [1] Department of Medicine, Harvard Medical School, Boston, Massachusetts, USA. [2] Cardiovascular Division, Brigham and Women's Hospital, Boston, Massachusetts, USA
- Frederick P Roth
- [1] Donnelly Centre, University of Toronto, Toronto, Ontario, Canada. [2] Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada. [3] Department of Computer Science, University of Toronto, Toronto, Ontario, Canada. [4] Center for Cancer Systems Biology (CCSB), Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA. [5] Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto, Ontario, Canada. [6] Canadian Institute for Advanced Research, Toronto, Ontario, Canada
|
415
|
Srihari S, Madhamshettiwar PB, Song S, Liu C, Simpson PT, Khanna KK, Ragan MA. Complex-based analysis of dysregulated cellular processes in cancer. BMC SYSTEMS BIOLOGY 2014; 8 Suppl 4:S1. [PMID: 25521701 PMCID: PMC4290683 DOI: 10.1186/1752-0509-8-s4-s1] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 01/17/2023]
Abstract
Background Differential expression analysis of (individual) genes is often used to study their roles in diseases. However, diseases such as cancer are a result of the combined effect of multiple genes. Gene products such as proteins seldom act in isolation, but instead constitute stable multi-protein complexes performing dedicated functions. Therefore, complexes aggregate the effect of individual genes (proteins) and can be used to gain a better understanding of cancer mechanisms. Here, we observe that complexes show considerable changes in their expression, in turn directed by the concerted action of transcription factors (TFs), across cancer conditions. We seek to gain novel insights into cancer mechanisms through a systematic analysis of complexes and their transcriptional regulation. Results We integrated large-scale protein-interaction (PPI) and gene-expression datasets to identify complexes that exhibit significant changes in their expression across different conditions in cancer. We devised a log-linear model to relate these changes to the differential regulation of complexes by TFs. The application of our model on two case studies involving pancreatic and familial breast tumour conditions revealed: (i) complexes in core cellular processes, especially those responsible for maintaining genome stability and cell proliferation (e.g. DNA damage repair and cell cycle) show considerable changes in expression; (ii) these changes include decrease and countering increase for different sets of complexes indicative of compensatory mechanisms coming into play in tumours; and (iii) TFs work in cooperative and counteractive ways to regulate these mechanisms. Such aberrant complexes and their regulating TFs play vital roles in the initiation and progression of cancer. Conclusions Complexes in core cellular processes display considerable decreases and countering increases in expression, strongly reflective of compensatory mechanisms in cancer. 
These changes are directed by the concerted action of cooperative and counteractive TFs. Our study highlights the roles of these complexes and TFs and presents several case studies of compensatory processes, thus providing novel insights into cancer mechanisms.
|
416
|
Mosca E, Alfieri R, Milanesi L. Diffusion of information throughout the host interactome reveals gene expression variations in network proximity to target proteins of hepatitis C virus. PLoS One 2014; 9:e113660. [PMID: 25461596 PMCID: PMC4251971 DOI: 10.1371/journal.pone.0113660] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 08/01/2014] [Accepted: 10/27/2014] [Indexed: 12/22/2022] Open
Abstract
Hepatitis C virus (HCV) infection is one of the most common chronic infections in the world, and hepatitis associated with HCV infection is a major risk factor for the development of cirrhosis and hepatocellular carcinoma (HCC). The rapidly growing number of known viral-host and host protein-protein interactions is enabling increasingly reliable network-based analyses of viral infection supported by omics data. The study of molecular interaction networks helps to elucidate the mechanistic pathways linking HCV molecular activities and the host response that modulates the stepwise hepatocarcinogenic process from preneoplastic lesions (cirrhosis and dysplasia) to HCC. By simulating the impact of HCV-host molecular interactions throughout the host protein-protein interaction (PPI) network, we ranked the host proteins by their network proximity to viral targets. We observed that the set of proteins in the neighborhood of HCV targets in the host interactome is enriched in key players of the host response to HCV infection. In contrast to the HCV targets themselves, subnetworks of proteins in network proximity to HCV targets are significantly enriched in proteins reported as differentially expressed in preneoplastic and neoplastic liver samples by two independent studies. Using multi-objective optimization, we extracted subnetworks that are simultaneously in “guilt-by-association” with HCV proteins and enriched in differentially expressed proteins. These subnetworks contain established, recently proposed and novel candidate proteins for the regulation of the liver cell response to chronic HCV infection.
Affiliation(s)
- Ettore Mosca
  - Institute of Biomedical Technologies, National Research Council, Segrate, Milan, Italy
- Roberta Alfieri
  - Institute of Biomedical Technologies, National Research Council, Segrate, Milan, Italy
- Luciano Milanesi
  - Institute of Biomedical Technologies, National Research Council, Segrate, Milan, Italy
|
417
|
Qin G, Zhao XM. A survey on computational approaches to identifying disease biomarkers based on molecular networks. J Theor Biol 2014; 362:9-16. [DOI: 10.1016/j.jtbi.2014.06.007] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 01/28/2014] [Revised: 06/03/2014] [Accepted: 06/04/2014] [Indexed: 11/29/2022]
|
418
|
Das J, Gayvert KM, Yu H. Predicting cancer prognosis using functional genomics data sets. Cancer Inform 2014; 13:85-8. [PMID: 25392695 PMCID: PMC4218897 DOI: 10.4137/cin.s14064] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 06/29/2014] [Revised: 09/17/2014] [Accepted: 09/19/2014] [Indexed: 11/06/2022] Open
Abstract
Elucidating the molecular basis of human cancers is an extremely complex and challenging task. A wide variety of computational tools and experimental techniques have been used to address different aspects of this characterization. One major hurdle faced by both clinicians and researchers has been to pinpoint the mechanistic basis underlying a wide range of prognostic outcomes for the same type of cancer. Here, we provide an overview of various computational methods that have leveraged different functional genomics data sets to identify molecular signatures that can be used to predict prognostic outcome for various human cancers. Furthermore, we outline challenges that remain and future directions that may be explored to address them.
Affiliation(s)
- Jishnu Das
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
- Kaitlyn M Gayvert
  - Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, USA
- Haiyuan Yu
  - Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY, USA; Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA
|
419
|
Emmert-Streib F, Tripathi S, Simoes RDM, Hawwa AF, Dehmer M. The human disease network. Systems Biomedicine 2014. [DOI: 10.4161/sysb.22816] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Indexed: 11/19/2022]
|
420
|
Ganegoda GU, Wang J, Wu FX, Li M. Prediction of disease genes using tissue-specified gene-gene network. BMC SYSTEMS BIOLOGY 2014; 8 Suppl 3:S3. [PMID: 25350876 PMCID: PMC4243117 DOI: 10.1186/1752-0509-8-s3-s3] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.1] [Indexed: 12/16/2022]
Abstract
BACKGROUND Tissue specificity is an important aspect of many genetic disorders, as a given disorder often affects only a few tissues. Tissue specificity is therefore important in identifying disease-gene associations. This paper discusses the impact of using tissue specificity in predicting new disease-gene associations and how to use tissue specificity along with phenotype information for a particular disease. METHODS To assess the impact of tissue specificity on predicting new disease-gene associations, this study proposes a novel method, called tissue-specified genes, to construct tissue-specific gene-gene networks for different tissue samples. These networks are then used with phenotype details to predict disease genes using the Katz method. The proposed method was compared with three other tissue-specific network construction methods to check its effectiveness. Furthermore, to check whether a tissue-specific gene-gene network can be used in place of a generic protein-protein network, the results are compared with three other methods. RESULTS Leave-one-out cross validation, mean enrichment and ROC curves indicate that the proposed approach outperforms existing network construction methods. Furthermore, tissue-specific gene-gene networks have a more positive impact on predicting disease-gene associations than generic protein-protein interaction networks. CONCLUSIONS Integrating tissue-specific data enabled more effective prediction of known and unknown disease-gene associations for a particular disease. Hence it is better to use a tissue-specific gene-gene network whenever possible. In addition, the proposed method is a better way of constructing tissue-specific gene-gene networks.
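The Katz method named in the METHODS section scores a pair of nodes by summing the walks of every length between them, with longer walks damped by a factor beta. The abstract gives no parameters, so the following is only a minimal sketch on a hypothetical four-node toy network (one disease node linked to a known gene, followed by a chain of gene-gene edges). The function names, `beta=0.1` and the truncation at walks of length 3 are assumptions for illustration, not the authors' settings.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def katz_scores(adj, beta=0.1, max_len=3):
    """Truncated Katz measure: sum over l = 1..max_len of beta**l * (A**l)[i][j]."""
    n = len(adj)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adj]  # holds A**l, starting with l = 1
    weight = beta                    # holds beta**l
    for _ in range(max_len):
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = mat_mul(power, adj)
        weight *= beta
    return scores


# Hypothetical toy network. Node 0 is a disease, nodes 1..3 are genes.
# Edges: disease-g1 (known association), g1-g2 and g2-g3 (gene-gene links).
TOY_ADJ = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

if __name__ == "__main__":
    scores = katz_scores(TOY_ADJ)
    # g2 is two steps from the disease and g3 is three steps away,
    # so g2 receives the larger score and would be ranked first.
    for gene in (2, 3):
        print(f"gene g{gene}: Katz score to disease = {scores[0][gene]:.4f}")
```

In this toy case the only walk from the disease to g2 has length 2 and the only walk to g3 has length 3, so the candidate ranking simply follows network distance; on a real network many walks of different lengths contribute to each score.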
Affiliation(s)
- JianXin Wang
  - School of Information Science and Engineering, Central South University, Changsha, China
- Fang-Xiang Wu
  - School of Information Science and Engineering, Central South University, Changsha, China
  - College of Engineering, University of Saskatchewan, 57 Campus Dr., Saskatoon, SK Canada
- Min Li
  - School of Information Science and Engineering, Central South University, Changsha, China
|
421
|
Chen B, Wang J, Li M, Wu FX. Identifying disease genes by integrating multiple data sources. BMC Med Genomics 2014; 7 Suppl 2:S2. [PMID: 25350511 PMCID: PMC4243092 DOI: 10.1186/1755-8794-7-s2-s2] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Indexed: 12/20/2022] Open
Abstract
BACKGROUND Multiple types of data are now available for identifying disease genes. These data include gene-disease associations, disease phenotype similarities, protein-protein interactions, pathways, gene expression profiles, etc. It is believed that integrating different kinds of biological data is an effective way to identify disease genes. RESULTS In this paper, we propose a multiple data integration method based on the theory of Markov random field (MRF) and the method of Bayesian analysis for identifying human disease genes. The proposed method is not only flexible in easily incorporating different kinds of data, but also reliable in predicting candidate disease genes. CONCLUSIONS Numerical experiments are carried out by integrating known gene-disease associations, protein complexes, protein-protein interactions, pathways and gene expression profiles. Predictions are evaluated by the leave-one-out method. The proposed method achieves an AUC score of 0.743 when integrating all those biological data in our experiments.
|
422
|
Prioritization of orphan disease-causing genes using topological feature and GO similarity between proteins in interaction networks. SCIENCE CHINA-LIFE SCIENCES 2014; 57:1064-71. [PMID: 25326068 DOI: 10.1007/s11427-014-4747-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Received: 05/23/2014] [Accepted: 07/15/2014] [Indexed: 12/22/2022]
Abstract
Identification of disease-causing genes among a large number of candidates is a fundamental challenge in human disease studies. However, it is still time-consuming and laborious to determine the real disease-causing genes by biological experiments. With the advances of the high-throughput techniques, a large number of protein-protein interactions have been produced. Therefore, to address this issue, several methods based on protein interaction network have been proposed. In this paper, we propose a shortest path-based algorithm, named SPranker, to prioritize disease-causing genes in protein interaction networks. Considering the fact that diseases with similar phenotypes are generally caused by functionally related genes, we further propose an improved algorithm SPGOranker by integrating the semantic similarity of GO annotations. SPGOranker not only considers the topological similarity between protein pairs in a protein interaction network but also takes their functional similarity into account. The proposed algorithms SPranker and SPGOranker were applied to 1598 known orphan disease-causing genes from 172 orphan diseases and compared with three state-of-the-art approaches, ICN, VS and RWR. The experimental results show that SPranker and SPGOranker outperform ICN, VS, and RWR for the prioritization of orphan disease-causing genes. Importantly, for the case study of severe combined immunodeficiency, SPranker and SPGOranker predict several novel causal genes.
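The abstract does not state SPranker's scoring function, so the following is not the authors' algorithm. It is a minimal sketch of the general idea of shortest path-based prioritization: candidates are ranked by their mean breadth-first-search distance to known disease genes in a hypothetical toy interaction network. All names (`bfs_distances`, `rank_by_mean_distance`, the node labels) are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Unweighted shortest-path distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def rank_by_mean_distance(adj, seeds, candidates):
    """Rank candidates by mean shortest-path distance to the seed genes.

    Candidates that cannot reach a seed get an infinite distance and sink
    to the bottom of the ranking.
    """
    seed_dists = [bfs_distances(adj, s) for s in seeds]
    scored = []
    for c in candidates:
        ds = [d.get(c, float("inf")) for d in seed_dists]
        scored.append((sum(ds) / len(ds), c))
    return [c for _, c in sorted(scored)]


# Hypothetical toy network: S1 and S2 are known disease genes,
# X, Y and Z are candidates at increasing distance from them.
TOY_PPI = {
    "S1": ["X"],
    "S2": ["X"],
    "X": ["S1", "S2", "Y"],
    "Y": ["X", "Z"],
    "Z": ["Y"],
}

if __name__ == "__main__":
    print(rank_by_mean_distance(TOY_PPI, ["S1", "S2"], ["Z", "Y", "X"]))
```

A functional-similarity term such as the GO semantic similarity used by SPGOranker could be combined with this topological score, but how the two are weighted is not described in the abstract.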
|
423
|
Disease gene identification by using graph kernels and Markov random fields. SCIENCE CHINA-LIFE SCIENCES 2014; 57:1054-63. [DOI: 10.1007/s11427-014-4745-8] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 05/21/2014] [Accepted: 07/14/2014] [Indexed: 01/05/2023]
|
424
|
Luo Y, Riedlinger G, Szolovits P. Text mining in cancer gene and pathway prioritization. Cancer Inform 2014; 13:69-79. [PMID: 25392685 PMCID: PMC4216063 DOI: 10.4137/cin.s13874] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.9] [Received: 04/01/2014] [Revised: 05/18/2014] [Accepted: 05/18/2014] [Indexed: 12/18/2022] Open
Abstract
Prioritization of cancer implicated genes has received growing attention as an effective way to reduce wet lab cost by computational analysis that ranks candidate genes according to the likelihood that experimental verifications will succeed. A multitude of gene prioritization tools have been developed, each integrating different data sources covering gene sequences, differential expressions, function annotations, gene regulations, protein domains, protein interactions, and pathways. This review places existing gene prioritization tools against the backdrop of an integrative Omic hierarchy view toward cancer and focuses on the analysis of their text mining components. We explain the relatively slow progress of text mining in gene prioritization, identify several challenges to current text mining methods, and highlight a few directions where more effective text mining algorithms may improve the overall prioritization task and where prioritizing the pathways may be more desirable than prioritizing only genes.
Affiliation(s)
- Yuan Luo
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
- Gregory Riedlinger
  - Department of Pathology, Massachusetts General Hospital, Boston, MA, USA
- Peter Szolovits
  - Computer Science and Artificial Intelligence Lab, Massachusetts Institute of Technology, Cambridge, MA, USA
|
425
|
Chen Y, Xu R. Mining cancer-specific disease comorbidities from a large observational health database. Cancer Inform 2014; 13:37-44. [PMID: 25392682 PMCID: PMC4216041 DOI: 10.4137/cin.s13893] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Received: 03/19/2014] [Revised: 04/29/2014] [Accepted: 04/30/2014] [Indexed: 12/28/2022] Open
Abstract
Cancer comorbidities often reflect the complex pathogenesis of cancers and provide valuable clues to discover the underlying genetic mechanisms of cancers. In this study, we systematically mined and analyzed cancer-specific comorbidity from the FDA Adverse Event Reporting System. We stratified 3,354,043 patients based on age and gender, and developed a network-based approach to extract comorbidity patterns from each patient group. We compared the comorbidity patterns among different patient groups and investigated the effect of age and gender on cancer comorbidity patterns. The results demonstrated that the comorbidity relationships between cancers and non-cancer diseases largely depend on age and gender. A few exceptions are depression, anxiety, and metabolic syndrome, whose comorbidity relationships with cancers are relatively stable among all patients. Evidence from the literature demonstrates that these stable cancer comorbidities reflect the pathogenesis of cancers. We applied our comorbidity mining approach to colorectal cancer and detected its comorbid associations with metabolic syndrome components, diabetes, and osteoporosis. Our results not only confirmed known cancer comorbidities but also generated novel hypotheses, which can illuminate the common pathophysiology between cancers and their co-occurring diseases.
Affiliation(s)
- Yang Chen
  - Division of Medical Informatics, Case Western Reserve University, Cleveland, OH, USA
- Rong Xu
  - Division of Medical Informatics, Case Western Reserve University, Cleveland, OH, USA
|
426
|
Cao M, Pietras CM, Feng X, Doroschak KJ, Schaffner T, Park J, Zhang H, Cowen LJ, Hescott BJ. New directions for diffusion-based network prediction of protein function: incorporating pathways with confidence. Bioinformatics 2014; 30:i219-27. [PMID: 24931987 PMCID: PMC4058952 DOI: 10.1093/bioinformatics/btu263] [Citation(s) in RCA: 95] [Impact Index Per Article: 9.5] [Indexed: 12/31/2022]
Abstract
Motivation: It has long been hypothesized that incorporating models of network noise as well as edge directions and known pathway information into the representation of protein–protein interaction (PPI) networks might improve their utility for functional inference. However, a simple way to do this has not been obvious. We find that diffusion state distance (DSD), our recent diffusion-based metric for measuring dissimilarity in PPI networks, has natural extensions that incorporate confidence and directions, and can even express coherent pathways by calculating DSD on an augmented graph. Results: We define three incremental versions of DSD which we term cDSD, caDSD and capDSD, where the capDSD matrix incorporates confidence, known directed edges, and pathways into the measure of how similar each pair of nodes is according to the structure of the PPI network. We test four popular function prediction methods (majority vote, weighted majority vote, multi-way cut and functional flow) using these different matrices on the Baker’s yeast PPI network in cross-validation. The best performing method is weighted majority vote using capDSD. We then test the performance of our augmented DSD methods on an integrated heterogeneous set of protein association edges from the STRING database. The superior performance of capDSD in this context confirms that treating the pathways as probabilistic units is more powerful than simply incorporating pathway edges independently into the network. Availability: All source code for calculating the confidences, for extracting pathway information from KEGG XML files, and for calculating the cDSD, caDSD and capDSD matrices is available from http://dsd.cs.tufts.edu/capdsd Contact: lenore.cowen@tufts.edu or benjamin.hescott@tufts.edu Supplementary information: Supplementary data are available at Bioinformatics online.
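The abstract names diffusion state distance (DSD) without defining it. As usually described, each node is represented by the expected number of visits a short random walk starting there makes to every node, and DSD is the L1 distance between two such vectors. The sketch below assumes that definition, truncates the walk at a fixed number of steps, and ignores the confidence, direction and pathway extensions (cDSD, caDSD, capDSD). The function names, the `steps=5` default and the star-shaped toy graph are hypothetical.

```python
def expected_visits(adj, start, steps):
    """Expected number of visits to each node by a `steps`-step random walk
    from `start` (the starting position counts as one visit)."""
    prob = {n: 0.0 for n in adj}
    prob[start] = 1.0
    visits = dict(prob)
    for _ in range(steps):
        nxt = {n: 0.0 for n in adj}
        for u in adj:
            if prob[u] == 0.0 or not adj[u]:
                continue
            share = prob[u] / len(adj[u])  # move to a uniformly chosen neighbour
            for v in adj[u]:
                nxt[v] += share
        prob = nxt
        for n in adj:
            visits[n] += prob[n]
    return visits


def dsd(adj, u, v, steps=5):
    """L1 distance between the expected-visit vectors of u and v."""
    hu = expected_visits(adj, u, steps)
    hv = expected_visits(adj, v, steps)
    return sum(abs(hu[n] - hv[n]) for n in adj)


# Hypothetical toy graph: a hub protein H with three leaf partners.
STAR = {
    "H": ["a", "b", "c"],
    "a": ["H"],
    "b": ["H"],
    "c": ["H"],
}

if __name__ == "__main__":
    print("DSD(a, b) =", round(dsd(STAR, "a", "b"), 6))
    print("DSD(a, H) =", round(dsd(STAR, "a", "H"), 6))
```

Because the distance compares whole diffusion profiles rather than path lengths, two nodes that are the same number of hops apart can still receive different distances, which is the property that makes such a metric useful as input to voting-based function prediction.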
Affiliation(s)
- Mengfei Cao
  - Department of Computer Science, Tufts University, Medford, MA 02155, USA and Department of Computer Science, University of Minnesota, Minneapolis, MN 55455, USA
- Christopher M Pietras
  - Department of Computer Science, Tufts University, Medford, MA 02155, USA and Department of Computer Science, University of Minnesota, Minneapolis, MN 55455, USA
- Xian Feng
  - Department of Computer Science, Tufts University, Medford, MA 02155, USA and Department of Computer Science, University of Minnesota, Minneapolis, MN 55455, USA
- Kathryn J Doroschak
  - Department of Computer Science, Tufts University, Medford, MA 02155, USA and Department of Computer Science, University of Minnesota, Minneapolis, MN 55455, USA
- Thomas Schaffner
  - Department of Computer Science, Tufts University, Medford, MA 02155, USA and Department of Computer Science, University of Minnesota, Minneapolis, MN 55455, USA
- Jisoo Park
  - Department of Computer Science, Tufts University, Medford, MA 02155, USA and Department of Computer Science, University of Minnesota, Minneapolis, MN 55455, USA
- Hao Zhang
  - Department of Computer Science, Tufts University, Medford, MA 02155, USA and Department of Computer Science, University of Minnesota, Minneapolis, MN 55455, USA
- Lenore J Cowen
  - Department of Computer Science, Tufts University, Medford, MA 02155, USA and Department of Computer Science, University of Minnesota, Minneapolis, MN 55455, USA
- Benjamin J Hescott
  - Department of Computer Science, Tufts University, Medford, MA 02155, USA and Department of Computer Science, University of Minnesota, Minneapolis, MN 55455, USA
|
427
|
Abstract
MOTIVATION Most existing methods for predicting causal disease genes rely on a specific type of evidence, and are therefore limited in terms of applicability. More often than not, the type of evidence available for diseases varies: for example, we may know linked genes, keywords associated with the disease obtained by mining text, or co-occurrence of disease symptoms in patients. Similarly, the type of evidence available for genes varies: for example, specific microarray probes convey information only for certain sets of genes. In this article, we apply a novel matrix-completion method called Inductive Matrix Completion to the problem of predicting gene-disease associations; it combines multiple types of evidence (features) for diseases and genes to learn latent factors that explain the observed gene-disease associations. We construct features from different biological sources such as microarray expression data and disease-related textual data. A crucial advantage of the method is that it is inductive; it can be applied to diseases not seen at training time, unlike traditional matrix-completion approaches and network-based inference methods that are transductive. RESULTS Comparison with state-of-the-art methods on diseases from the Online Mendelian Inheritance in Man (OMIM) database shows that the proposed approach is substantially better: it has close to a one-in-four chance of recovering a true association in the top 100 predictions, compared to the recently proposed Catapult method (second best) that has <15% chance. We demonstrate that the inductive method is particularly effective for a query disease with no previously known gene associations, and for predicting novel genes, i.e. genes that are previously not linked to diseases. Thus the method is capable of predicting novel genes even for well-characterized diseases. 
We also validate the novelty of predictions by evaluating the method on recently reported OMIM associations and on associations recently reported in the literature. AVAILABILITY Source code and datasets can be downloaded from http://bigdata.ices.utexas.edu/project/gene-disease.
Affiliation(s)
- Nagarajan Natarajan
  - Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA
- Inderjit S Dhillon
  - Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA
|
428
|
Chen Y, Zhang X, Zhang GQ, Xu R. Comparative analysis of a novel disease phenotype network based on clinical manifestations. J Biomed Inform 2014; 53:113-20. [PMID: 25277758 DOI: 10.1016/j.jbi.2014.09.007] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 06/16/2014] [Revised: 08/18/2014] [Accepted: 09/21/2014] [Indexed: 12/21/2022]
Abstract
Systems approaches to analyzing disease phenotype networks in combination with protein functional interaction networks have great potential in illuminating disease pathophysiological mechanisms. While many genetic networks are readily available, disease phenotype networks remain largely incomplete. In this study, we built a large-scale Disease Manifestation Network (DMN) from 50,543 highly accurate disease-manifestation semantic relationships in the Unified Medical Language System (UMLS). Our new phenotype network contains 2305 nodes and 373,527 weighted edges to represent the disease phenotypic similarities. We first compared DMN with the networks representing genetic relationships among diseases, and demonstrated that the phenotype clustering in DMN reflects common disease genetics. Then we compared DMN with a widely-used disease phenotype network in previous gene discovery studies, called mimMiner, which was extracted from the textual descriptions in Online Mendelian Inheritance in Man (OMIM). We demonstrated that DMN contains different knowledge from the existing phenotype data source. Finally, a case study on Marfan syndrome further showed that DMN contains useful information and can provide leads to discover unknown disease causes. Integrating DMN in systems approaches with mimMiner and other data offers opportunities to predict novel disease genetics. We made DMN publicly available at nlp/case.edu/public/data/DMN.
Affiliation(s)
- Yang Chen
  - Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, United States; Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States
- Xiang Zhang
  - Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, United States
- Guo-Qiang Zhang
  - Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, United States; Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States
- Rong Xu
  - Division of Medical Informatics, School of Medicine, Case Western Reserve University, Cleveland, OH 44106, United States.
|
429
|
Jiang L, Edwards SM, Thomsen B, Workman CT, Guldbrandtsen B, Sørensen P. A random set scoring model for prioritization of disease candidate genes using protein complexes and data-mining of GeneRIF, OMIM and PubMed records. BMC Bioinformatics 2014; 15:315. [PMID: 25253562 PMCID: PMC4181406 DOI: 10.1186/1471-2105-15-315] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Received: 01/07/2013] [Accepted: 09/17/2014] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Prioritizing genetic variants is a challenge because disease susceptibility loci are often located in genes of unknown function or the relationship with the corresponding phenotype is unclear. A global data-mining exercise on the biomedical literature can establish the phenotypic profile of genes with respect to their connection to disease phenotypes. The importance of protein-protein interaction networks in the genetic heterogeneity of common diseases or complex traits is becoming increasingly recognized. Thus, the development of a network-based approach combined with phenotypic profiling would be useful for disease gene prioritization. RESULTS We developed a random-set scoring model and implemented it to quantify phenotype relevance in a network-based disease gene-prioritization approach. We validated our approach based on different gene phenotypic profiles, which were generated from PubMed abstracts, OMIM, and GeneRIF records. We also investigated the validity of several vocabulary filters and different likelihood thresholds for predicted protein-protein interactions in terms of their effect on the network-based gene-prioritization approach, which relies on text-mining of the phenotype data. Our method demonstrated good precision and sensitivity compared with those of two alternative complex-based prioritization approaches. We then conducted a global ranking of all human genes according to their relevance to a range of human diseases. The resulting accurate ranking of known causal genes supported the reliability of our approach. Moreover, these data suggest many promising novel candidate genes for human disorders that have a complex mode of inheritance. CONCLUSION We have implemented and validated a network-based approach to prioritize genes for human diseases based on their phenotypic profile. We have devised a powerful and transparent tool to identify and rank candidate genes. 
Our global gene prioritization provides a unique resource for the biological interpretation of data from genome-wide association studies, and will help in the understanding of how the associated genetic variants influence disease or quantitative phenotypes.
Affiliation(s)
- Li Jiang
  - Department of Molecular Biology and Genetics, Aarhus University, DK-8830 Tjele, Denmark.
|
430
|
Wu M, Kwoh CK, Li X, Zheng J. Finding trans-regulatory genes and protein complexes modulating meiotic recombination hotspots of human, mouse and yeast. BMC SYSTEMS BIOLOGY 2014; 8:107. [PMID: 25208583 PMCID: PMC4236725 DOI: 10.1186/s12918-014-0107-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/10/2013] [Accepted: 07/11/2014] [Indexed: 11/18/2022]
Abstract
Background The regulatory mechanism of recombination is one of the most fundamental problems in genomics, with wide applications in genome wide association studies (GWAS), birth-defect diseases, molecular evolution, cancer research, etc. Recombination events cluster into short genomic regions called “recombination hotspots”. Recently, a zinc finger protein PRDM9 was reported to regulate recombination hotspots in human and mouse genomes. In addition, a 13-mer motif contained in the binding sites of PRDM9 is found to be enriched in human hotspots. However, this 13-mer motif only covers a fraction of hotspots, indicating that PRDM9 is not the only regulator of recombination hotspots. Therefore, the challenge of discovering other regulators of recombination hotspots becomes significant. Furthermore, recombination is a complex process. Hence, multiple proteins acting as machinery, rather than individual proteins, are more likely to carry out this process in a precise and stable manner. Therefore, the extension of the prediction of individual trans-regulators to protein complexes is also highly desired. Results In this paper, we introduce a pipeline to identify genes and protein complexes associated with recombination hotspots. First, we prioritize proteins associated with hotspots based on their preference of binding to hotspots and coldspots. Second, using the above identified genes as seeds, we apply the Random Walk with Restart algorithm (RWR) to propagate their influences to other proteins in protein-protein interaction (PPI) networks. Hence, many proteins without DNA-binding information will also be assigned a score to implicate their roles in recombination hotspots. Third, we construct sub-PPI networks induced by top genes ranked by RWR for various species (e.g., yeast, human and mouse) and detect protein complexes in those sub-PPI networks. 
Conclusions The GO term analysis shows that our prioritizing methods and the RWR algorithm are capable of identifying novel genes associated with recombination hotspots. The trans-regulators predicted by our pipeline are enriched with epigenetic functions (e.g., histone modifications), demonstrating the epigenetic regulatory mechanisms of recombination hotspots. The identified protein complexes also provide us with candidates to further investigate the molecular machineries for recombination hotspots. Moreover, the experimental data and results are available on our web site http://www.ntu.edu.sg/home/zhengjie/data/RecombinationHotspot/NetPipe/.
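The score-propagation step described in the abstract above (Random Walk with Restart from seed genes over a PPI network) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy network, the node names, the restart probability of 0.7, and the handling of isolated nodes are all assumptions chosen for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-12, max_iter=1000):
    """Iterate p <- (1 - r) * W p + r * p0 until the scores stop changing.

    adjacency: dict mapping each node to a list of its neighbours (undirected).
    seeds:     set of seed nodes; the walker restarts uniformly among them.
    restart:   restart probability r (0.7 is an illustrative assumption).
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Assumption: an isolated node sends its walk mass back to the seeds.
                for n in nodes:
                    nxt[n] += (1 - restart) * p[u] * p0[n]
                continue
            # The walker moves from u to each neighbour with equal probability.
            share = (1 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy PPI network: a chain A - B - C - D - E, seeded at A.
toy_ppi = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C", "E"],
    "E": ["D"],
}
scores = random_walk_with_restart(toy_ppi, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy chain the scores decay with distance from the seed, so proteins with no direct evidence still receive a score that reflects their network proximity to the seed set.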
|
431
|
Li ZC, Lai YH, Chen LL, Xie Y, Dai Z, Zou XY. Identifying and prioritizing disease-related genes based on the network topological features. BIOCHIMICA ET BIOPHYSICA ACTA-PROTEINS AND PROTEOMICS 2014; 1844:2214-21. [PMID: 25183318 DOI: 10.1016/j.bbapap.2014.08.009] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 05/21/2014] [Revised: 07/22/2014] [Accepted: 08/14/2014] [Indexed: 11/26/2022]
Abstract
Identifying and prioritizing disease-related genes are the most important steps for understanding the pathogenesis and discovering the therapeutic targets. The experimental examination of these genes is very expensive and laborious, and usually has a higher false positive rate. Therefore, it is highly desirable to develop computational methods for the identification and prioritization of disease-related genes. In this study, we develop a powerful method to identify and prioritize candidate disease genes. Novel network topological features with local and global information are proposed and adopted to characterize genes. The performance of these novel features is verified based on the 10-fold cross-validation test and leave-one-out cross-validation test. The proposed features are compared with the published features, and a fusion strategy is investigated by combining the current features with the published features. These combined features are also used to identify and prioritize Parkinson's disease-related genes. The results indicate that the identified genes are highly related to certain molecular processes and biological functions, which provides new clues for researching the pathogenesis of Parkinson's disease. The Matlab source code is freely available on request from the authors.
Affiliation(s)
- Zhan-Chao Li
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, People's Republic of China
- Yan-Hua Lai
  - School of Chemistry and Chemical Engineering, Sun Yat-Sen University, Guangzhou 510275, People's Republic of China
- Li-Li Chen
  - School of Chemistry and Chemical Engineering, Sun Yat-Sen University, Guangzhou 510275, People's Republic of China
- Yun Xie
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, People's Republic of China
- Zong Dai
  - School of Chemistry and Chemical Engineering, Sun Yat-Sen University, Guangzhou 510275, People's Republic of China
- Xiao-Yong Zou
  - School of Chemistry and Chemical Engineering, Sun Yat-Sen University, Guangzhou 510275, People's Republic of China
|
432
|
Liu Z, Gao Y, Hao F, Lou X, Zhang X, Li Y, Wu D, Xiao T, Yang L, Li Q, Qiu X, Wang E. Secretomes are a potential source of molecular targets for cancer therapies and indicate that APOE is a candidate biomarker for lung adenocarcinoma metastasis. Mol Biol Rep 2014; 41:7507-23. [PMID: 25098600 DOI: 10.1007/s11033-014-3641-4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2013] [Accepted: 07/23/2014] [Indexed: 12/20/2022]
Abstract
Identifying patients at high risk of metastasis is a major challenge in lung adenocarcinoma (ADC) therapy; therefore, the discovery of noninvasive biomarkers and therapeutic targets is urgent. We found significant differences between the secretomes of differentially expressed proteins in lung ADC cell lines, clinical tissue samples and serum plasma samples with high and low metastatic potential. In particular, Apolipoprotein E (APOE) levels were three times greater in cells with lymph node metastases (LNM) than in those without. Our study indicates that APOE is a potential indicator of metastatic lung ADC and that secretomes may offer a valuable resource for biomarkers of lung ADC with LNM.
Affiliation(s)
- Zan Liu
  - Department of Pathology, The First Affiliated Hospital and College of Basic Medical Sciences of China Medical University, Shenyang, 110001, China
|
433
|
Wang W, Yang S, Zhang X, Li J. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics 2014; 30:2923-30. [PMID: 24974205 DOI: 10.1093/bioinformatics/btu403] [Citation(s) in RCA: 196] [Impact Index Per Article: 19.6] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/20/2022]
Abstract
MOTIVATION The emergence of network medicine not only offers more opportunities for better and more complete understanding of the molecular complexities of diseases, but also serves as a promising tool for identifying new drug targets and establishing new relationships among diseases that enable drug repositioning. Computational approaches for drug repositioning by integrating information from multiple sources and multiple levels have the potential to provide great insights to the complex relationships among drugs, targets, disease genes and diseases at a system level. RESULTS In this article, we have proposed a computational framework based on a heterogeneous network model and applied the approach on drug repositioning by using existing omics data about diseases, drugs and drug targets. The novelty of the framework lies in the fact that the strength between a disease-drug pair is calculated through an iterative algorithm on the heterogeneous graph that also incorporates drug-target information. Comprehensive experimental results show that the proposed approach significantly outperforms several recent approaches. Case studies further illustrate its practical usefulness. AVAILABILITY AND IMPLEMENTATION http://cbc.case.edu CONTACT jingli@cwru.edu SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Wenhui Wang
  - Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA and Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Sen Yang
  - Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA and Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Xiang Zhang
  - Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA and Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
- Jing Li
  - Department of Electrical Engineering and Computer Science, Case Western Reserve University, Cleveland, OH 44106, USA and Molecular and Computational Biology, University of Southern California, Los Angeles, CA 90089, USA
|
434
|
Human symptoms-disease network. Nat Commun 2014; 5:4212. [PMID: 24967666 DOI: 10.1038/ncomms5212] [Citation(s) in RCA: 316] [Impact Index Per Article: 31.6] [Reference Citation Analysis] [Abstract] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/07/2013] [Accepted: 05/27/2014] [Indexed: 12/19/2022] Open
Abstract
In the post-genomic era, the elucidation of the relationship between the molecular origins of diseases and their resulting phenotypes is a crucial task for medical research. Here, we use a large-scale biomedical literature database to construct a symptom-based human disease network and investigate the connection between clinical manifestations of diseases and their underlying molecular interactions. We find that the symptom-based similarity of two diseases correlates strongly with the number of shared genetic associations and the extent to which their associated proteins interact. Moreover, the diversity of the clinical manifestations of a disease can be related to the connectivity patterns of the underlying protein interaction network. The comprehensive, high-quality map of disease-symptom relations can further be used as a resource helping to address important questions in the field of systems medicine, for example, the identification of unexpected associations between diseases, disease etiology research or drug design.
|
435
|
Koyejo O, Lee C, Ghosh J. A constrained matrix-variate Gaussian process for transposable data. Mach Learn 2014. [DOI: 10.1007/s10994-014-5444-1] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022]
|
436
|
Li X, Zhou X, Peng Y, Liu B, Zhang R, Hu J, Yu J, Jia C, Sun C. Network based integrated analysis of phenotype-genotype data for prioritization of candidate symptom genes. BIOMED RESEARCH INTERNATIONAL 2014; 2014:435853. [PMID: 24991551 PMCID: PMC4060751 DOI: 10.1155/2014/435853] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 01/15/2014] [Accepted: 04/30/2014] [Indexed: 11/17/2022]
Abstract
BACKGROUND Symptoms and signs (symptoms in brief) are the essential clinical manifestations for individualized diagnosis and treatment in traditional Chinese medicine (TCM). To gain insights into the molecular mechanism of symptoms, we develop a computational approach to identify the candidate genes of symptoms. METHODS This paper presents a network-based approach for the integrated analysis of multiple phenotype-genotype data sources and the prioritization of genes for the associated symptoms. The method first calculates the similarities between symptoms and diseases based on the symptom-disease relationships retrieved from the PubMed bibliographic database. Then the disease-gene associations and protein-protein interactions are utilized to construct a phenotype-genotype network. The PRINCE algorithm is finally used to rank the potential genes for the associated symptoms. RESULTS The proposed method yields a reliable ranked gene list, with an AUC (area under the curve) of 0.616 in classification. Some novel genes such as CALCA, ESR1, and MTHFR were predicted to be associated with headache symptoms; these are not recorded in the benchmark data set but have been reported in recently published literature. CONCLUSIONS Our study demonstrated that integrating phenotype-genotype relationships into a complex network framework provides an effective approach to identify candidate genes of symptoms.
Affiliation(s)
- Xing Li
  - School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
- Xuezhong Zhou
  - School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
- Yonghong Peng
  - School of Engineering and Informatics, University of Bradford, West Yorkshire BD7 1DP, UK
- Baoyan Liu
  - China Academy of Chinese Medical Sciences, Beijing 100700, China
- Runshun Zhang
  - Guang'anmen Hospital, China Academy of Chinese Medical Sciences, Beijing 100053, China
- Jingqing Hu
  - Institute of Basic Theory of Traditional Chinese Medicine, China Academy of Chinese Medical Sciences, Beijing 100700, China
- Jian Yu
  - School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
- Caiyan Jia
  - School of Computer and Information Technology and Beijing Key Lab of Traffic Data Analysis and Mining, Beijing Jiaotong University, Beijing 100044, China
- Changkai Sun
  - Liaoning Provincial Key Laboratory of Cerebral Diseases, Institute for Brain Disorders, Dalian Medical University, Dalian 116044, China
|
437
|
Yang P, Li X, Chua HN, Kwoh CK, Ng SK. Ensemble positive unlabeled learning for disease gene identification. PLoS One 2014; 9:e97079. [PMID: 24816822 PMCID: PMC4016241 DOI: 10.1371/journal.pone.0097079] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/18/2013] [Accepted: 04/14/2014] [Indexed: 11/24/2022] Open
Abstract
An increasing number of genes have been experimentally confirmed in recent years as causative genes to various human diseases. The newly available knowledge can be exploited by machine learning methods to discover additional unknown genes that are likely to be associated with diseases. In particular, positive unlabeled learning (PU learning) methods, which require only a positive training set P (confirmed disease genes) and an unlabeled set U (the unknown candidate genes) instead of a negative training set N, have been shown to be effective in uncovering new disease genes in the current scenario. Using only a single source of data for prediction can be susceptible to bias due to incompleteness and noise in the genomic data, and a single machine learning predictor is prone to bias caused by inherent limitations of individual methods. In this paper, we propose an effective PU learning framework that integrates multiple biological data sources and an ensemble of powerful machine learning classifiers for disease gene identification. Our proposed method integrates data from multiple biological sources for training PU learning classifiers. A novel ensemble-based PU learning method EPU is then used to integrate multiple PU learning classifiers to achieve accurate and robust disease gene predictions. Our evaluation experiments across six disease groups showed that EPU achieved significantly better results compared with various state-of-the-art prediction methods as well as ensemble learning classifiers. Through integrating multiple biological data sources for training and the outputs of an ensemble of PU learning classifiers for prediction, we are able to minimize the potential bias and errors in individual data sources and machine learning algorithms to achieve more accurate and robust disease gene predictions. In the future, our EPU method provides an effective framework to integrate the additional biological and computational resources for better disease gene predictions.
Affiliation(s)
- Peng Yang
  - Data Analytics Department, Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - * E-mail: (PY); (XL)
- Xiaoli Li
  - Data Analytics Department, Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
  - * E-mail: (PY); (XL)
- Hon-Nian Chua
  - Data Analytics Department, Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
- Chee-Keong Kwoh
  - Bioinformatics Research Centre, School of Computer Engineering, Nanyang Technological University, Singapore, Singapore
- See-Kiong Ng
  - Data Analytics Department, Institute for Infocomm Research (I2R), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
|
438
|
Chasman D, Gancarz B, Hao L, Ferris M, Ahlquist P, Craven M. Inferring host gene subnetworks involved in viral replication. PLoS Comput Biol 2014; 10:e1003626. [PMID: 24874113 PMCID: PMC4038467 DOI: 10.1371/journal.pcbi.1003626] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/15/2013] [Accepted: 02/06/2014] [Indexed: 12/16/2022] Open
Abstract
Systematic, genome-wide loss-of-function experiments can be used to identify host factors that directly or indirectly facilitate or inhibit the replication of a virus in a host cell. We present an approach that combines an integer linear program and a diffusion kernel method to infer the pathways through which those host factors modulate viral replication. The inputs to the method are a set of viral phenotypes observed in single-host-gene mutants and a background network consisting of a variety of host intracellular interactions. The output is an ensemble of subnetworks that provides a consistent explanation for the measured phenotypes, predicts which unassayed host factors modulate the virus, and predicts which host factors are the most direct interfaces with the virus. We infer host-virus interaction subnetworks using data from experiments screening the yeast genome for genes modulating the replication of two RNA viruses. Because a gold-standard network is unavailable, we assess the predicted subnetworks using both computational and qualitative analyses. We conduct a cross-validation experiment in which we predict whether held-aside test genes have an effect on viral replication. Our approach is able to make high-confidence predictions more accurately than several baselines, and about as well as the best baseline, which does not infer mechanistic pathways. We also examine two kinds of predictions made by our method: which host factors are nearest to a direct interaction with a viral component, and which unassayed host genes are likely to be involved in viral replication. Multiple predictions are supported by recent independent experimental data, or are components or functional partners of confirmed relevant complexes or pathways. Integer program code, background network data, and inferred host-virus subnetworks are available at http://www.biostat.wisc.edu/~craven/chasman_host_virus/.
Affiliation(s)
- Deborah Chasman
  - Department of Computer Sciences, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Brandi Gancarz
  - Luminex Corporation, Madison, Wisconsin, United States of America
  - Institute for Molecular Virology, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Linhui Hao
  - Institute for Molecular Virology, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
  - Howard Hughes Medical Institute, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Michael Ferris
  - Department of Computer Sciences, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Paul Ahlquist
  - Institute for Molecular Virology, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
  - Howard Hughes Medical Institute, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
  - Morgridge Institute for Research, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
- Mark Craven
  - Department of Computer Sciences, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
  - Department of Biostatistics and Medical Informatics, University of Wisconsin–Madison, Madison, Wisconsin, United States of America
|
439
|
Ma X, Gao L, Tan K. Modeling disease progression using dynamics of pathway connectivity. Bioinformatics 2014; 30:2343-50. [PMID: 24771518 DOI: 10.1093/bioinformatics/btu298] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 02/02/2023] Open
Abstract
MOTIVATION Disease progression is driven by dynamic changes in both the activity and connectivity of molecular pathways. Understanding these dynamic events is critical for disease prognosis and effective treatment. Compared with activity dynamics, connectivity dynamics is poorly explored. RESULTS We describe the M-module algorithm to identify gene modules with common members but varied connectivity across multiple gene co-expression networks (aka M-modules). We introduce a novel metric to capture the connectivity dynamics of an entire M-module. We find that M-modules with dynamic connectivity have distinct topological and biochemical properties compared with static M-modules and hub genes. We demonstrate that incorporation of module connectivity dynamics significantly improves disease stage prediction. We identify different sets of M-modules that are important for specific disease stage transitions and offer new insights into the molecular events underlying disease progression. Besides modeling disease progression, the algorithm and metric introduced here are broadly applicable to modeling dynamics of molecular pathways. AVAILABILITY AND IMPLEMENTATION M-module is implemented in R. The source code is freely available at http://www.healthcare.uiowa.edu/labs/tan/M-module.zip.
Affiliation(s)
- Xiaoke Ma
  - Department of Internal Medicine and Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
- Long Gao
  - Department of Internal Medicine and Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
- Kai Tan
  - Department of Internal Medicine and Department of Biomedical Engineering, University of Iowa, Iowa City, IA 52242, USA
|
440
|
Joshi S, Singh AR, Zulcic M, Bao L, Messer K, Ideker T, Dutkowski J, Durden DL. Rac2 controls tumor growth, metastasis and M1-M2 macrophage differentiation in vivo. PLoS One 2014; 9:e95893. [PMID: 24770346 PMCID: PMC4000195 DOI: 10.1371/journal.pone.0095893] [Citation(s) in RCA: 92] [Impact Index Per Article: 9.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2013] [Accepted: 03/31/2014] [Indexed: 12/16/2022] Open
Abstract
Although it is well-established that the macrophage M1 to M2 transition plays a role in tumor progression, the molecular basis for this process remains incompletely understood. Herein, we demonstrate that the small GTPase, Rac2 controls macrophage M1 to M2 differentiation and the metastatic phenotype in vivo. Using a genetic approach, combined with syngeneic and orthotopic tumor models we demonstrate that Rac2-/- mice display a marked defect in tumor growth, angiogenesis and metastasis. Microarray, RT-PCR and metabolomic analysis on bone marrow derived macrophages isolated from the Rac2-/- mice identify an important role for Rac2 in M2 macrophage differentiation. Furthermore, we define a novel molecular mechanism by which signals transmitted from the extracellular matrix via the α4β1 integrin and MCSF receptor lead to the activation of Rac2 and potentially regulate macrophage M2 differentiation. Collectively, our findings demonstrate a macrophage autonomous process by which the Rac2 GTPase is activated downstream of the α4β1 integrin and the MCSF receptor to control tumor growth, metastasis and macrophage differentiation into the M2 phenotype. Finally, using gene expression and metabolomic data from our Rac2-/- model, and information related to M1-M2 macrophage differentiation curated from the literature we executed a systems biologic analysis of hierarchical protein-protein interaction networks in an effort to develop an iterative interactome map which will predict additional mechanisms by which Rac2 may coordinately control macrophage M1 to M2 differentiation and metastasis.
Affiliation(s)
- Shweta Joshi
  - UCSD Department of Pediatrics, Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
- Alok R. Singh
  - UCSD Department of Pediatrics, Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
- Muamera Zulcic
  - UCSD Department of Pediatrics, Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
- Lei Bao
  - UCSD Department of Biostatistics, Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
- Karen Messer
  - UCSD Department of Biostatistics, Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
- Trey Ideker
  - Department of Medicine, University of California San Diego, La Jolla, California, United States of America
- Janusz Dutkowski
  - Department of Medicine, University of California San Diego, La Jolla, California, United States of America
- Donald L. Durden
  - UCSD Department of Pediatrics, Moores Cancer Center, University of California San Diego, La Jolla, California, United States of America
  - Department of Pediatrics and Rady Children's Hospital, San Diego, La Jolla, California, United States of America
|
441
|
A three step network based approach (TSNBA) to finding disease molecular signature and key regulators: a case study of IL-1 and TNF-alpha stimulated inflammation. PLoS One 2014; 9:e94360. [PMID: 24747419 PMCID: PMC3991618 DOI: 10.1371/journal.pone.0094360] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/15/2014] [Accepted: 03/13/2014] [Indexed: 12/11/2022] Open
Abstract
A disease molecular signature is a set of biomolecular features that are prognostic of clinical phenotypes and indicative of underlying pathology. It is of great importance to develop computational approaches for finding more relevant molecular signatures. Based upon the hypothesis that various components in a molecular signature are more likely to share similar patterns, we introduced a novel three step network based approach (TSNBA) to identify the molecular signature and key pathological regulators. A protein-protein interaction (PPI) network and a ranking algorithm were integrated in the first step to find pathology related proteins with high accuracy. This was followed by the second step, which further screens with co-expression patterns for better pathology enrichment. The context likelihood of relatedness (CLR) algorithm was used in the third step to infer gene regulatory networks and identify key transcription regulators. We applied this approach to study IL-1 (interleukin-1) and TNF-alpha (tumor necrosis factor-alpha) stimulated inflammation. TSNBA identified the inflammatory signature with high accuracy and outperformed 5 competing methods, namely fold change, degree, interconnectivity, neighborhood score and network propagation based approaches. The best molecular signature, with 80% (40/50) confirmed inflammatory genes, was used to predict inflammation related genes. As a result, 8 out of 10 predicted inflammation genes that were not included in the benchmark Entrez Gene database were validated by literature evidence. Furthermore, 23 of the 32 predicted inflammation regulators were validated by literature evidence. The remaining 9 were also validated with TF (transcription factor) binding site analysis. In conclusion, we developed an efficient strategy for disease molecular signature finding and key pathological regulator identification.
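The third step named in the abstract above relies on the context likelihood of relatedness (CLR) idea: each pairwise relatedness value is judged against the background distribution of values for both genes involved. The sketch below shows that scoring rule under stated assumptions and is not the TSNBA code: the mutual-information matrix is made-up toy data, the diagonal is excluded from each gene's background, and the population standard deviation is used.

```python
import math
from statistics import mean, pstdev


def clr_scores(mi):
    """Turn a symmetric mutual-information matrix into CLR-style scores.

    For a pair (i, j), the MI value is converted to a z-score against gene i's
    background and against gene j's background; negative z-scores are clipped
    to zero and the two are combined as sqrt(z_i**2 + z_j**2).
    """
    n = len(mi)
    background = []
    for i in range(n):
        # Assumption: a gene's background is its MI with every *other* gene.
        values = [mi[i][j] for j in range(n) if j != i]
        background.append((mean(values), pstdev(values)))

    def z(i, j):
        mu, sigma = background[i]
        if sigma == 0:
            return 0.0  # flat background: no value stands out
        return max(0.0, (mi[i][j] - mu) / sigma)

    scores = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                scores[i][j] = math.sqrt(z(i, j) ** 2 + z(j, i) ** 2)
    return scores


# Hypothetical toy mutual-information matrix for four genes (symmetric, zero diagonal).
toy_mi = [
    [0.0, 0.9, 0.1, 0.1],
    [0.9, 0.0, 0.1, 0.2],
    [0.1, 0.1, 0.0, 0.1],
    [0.1, 0.2, 0.1, 0.0],
]
toy_clr = clr_scores(toy_mi)
```

Scoring against each gene's own background means a moderate value can still stand out for a gene whose other values are uniformly low, which is the motivation usually given for CLR over a single global threshold.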
|
442
|
Zhu C, Wu C, Aronow BJ, Jegga AG. Computational approaches for human disease gene prediction and ranking. ADVANCES IN EXPERIMENTAL MEDICINE AND BIOLOGY 2014; 799:69-84. [PMID: 24292962 DOI: 10.1007/978-1-4614-8778-4_4] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]
Abstract
While candidate gene association studies continue to be the most practical and frequently employed approach in disease gene investigation for complex disorders, selecting suitable genes to test is a challenge. There are several computational approaches available for selecting and prioritizing disease candidate genes. A majority of these tools are based on guilt-by-association principle where novel disease candidate genes are identified and prioritized based on either functional or topological similarity to known disease genes. In this chapter we review the prioritization criteria and the algorithms along with some use cases that demonstrate how these tools can be used for identifying and ranking human disease candidate genes.
Affiliation(s)
- Cheng Zhu
  - Department of Computer Science, College of Engineering and Applied Science, University of Cincinnati, Cincinnati, OH, USA
|
443
|
Grennan KS, Chen C, Gershon ES, Liu C. Molecular network analysis enhances understanding of the biology of mental disorders. Bioessays 2014; 36:606-16. [PMID: 24733456 DOI: 10.1002/bies.201300147] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/12/2022]
Abstract
We provide an introduction to network theory, evidence to support a connection between molecular network structure and neuropsychiatric disease, and examples of how network approaches can expand our knowledge of the molecular bases of these diseases. Without systematic methods to derive their biological meanings and inter-relatedness, the many molecular changes associated with neuropsychiatric disease, including genetic variants, gene expression changes, and protein differences, present an impenetrably complex set of findings. Network approaches can potentially help integrate and reconcile these findings, as well as provide new insights into the molecular architecture of neuropsychiatric diseases. Network approaches to neuropsychiatric disease are still in their infancy, and we discuss what might be done to improve their prospects.
Affiliation(s)
- Kay S Grennan
  - Department of Psychiatry, University of Illinois at Chicago, Chicago, IL, USA
|
444
|
Xu R, Li L, Wang Q. dRiskKB: a large-scale disease-disease risk relationship knowledge base constructed from biomedical text. BMC Bioinformatics 2014; 15:105. [PMID: 24725842 PMCID: PMC3998061 DOI: 10.1186/1471-2105-15-105] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/22/2013] [Accepted: 04/07/2014] [Indexed: 11/14/2022] Open
Abstract
BACKGROUND Discerning the genetic contributions to complex human diseases is a challenging mandate that demands new types of data and calls for new avenues for advancing the state-of-the-art in computational approaches to uncovering disease etiology. Systems approaches to studying observable phenotypic relationships among diseases are emerging as an active area of research for both novel disease gene discovery and drug repositioning. Currently, systematic study of disease relationships on a phenome-wide scale is limited due to the lack of large-scale machine understandable disease phenotype relationship knowledge bases. Our study innovates a semi-supervised iterative pattern learning approach that is used to build a precise, large-scale disease-disease risk relationship (D1 → D2) knowledge base (dRiskKB) from a vast corpus of free-text published biomedical literature. RESULTS 21,354,075 MEDLINE records comprised the text corpus under study. First, we used one typical disease risk-specific syntactic pattern (i.e. "D1 due to D2") as a seed to automatically discover other patterns specifying similar semantic relationships among diseases. We then extracted D1 → D2 risk pairs from MEDLINE using the learned patterns. We manually evaluated the precisions of the learned patterns and extracted pairs. Finally, we analyzed the correlations between disease-disease risk pairs and their associated genes and drugs. The newly created dRiskKB consists of a total of 34,448 unique D1 → D2 pairs, representing the risk-specific semantic relationships among 12,981 diseases with each disease linked to its associated genes and drugs. The identified patterns are highly precise (average precision of 0.99) in specifying the risk-specific relationships among diseases. The precisions of extracted pairs are 0.919 for those that are exactly matched and 0.988 for those that are partially matched.
By comparing the iterative pattern approach starting from different seeds, we demonstrated that our algorithm is robust in terms of seed choice. We show that diseases and their risk diseases as well as diseases with similar risk profiles tend to share both genes and drugs. CONCLUSIONS This unique dRiskKB, when combined with existing phenotypic, genetic, and genomic datasets, can have profound implications in our deeper understanding of disease etiology and in drug repositioning.
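The seed-driven pattern learning described in this abstract can be illustrated with a toy bootstrapping loop. This is only a minimal sketch under stated assumptions, not the authors' implementation: the input is assumed to be pre-extracted (left term, connecting phrase, right term) triples, the function name `bootstrap_patterns` and the example phrases are hypothetical, and every co-occurring phrase is accepted as a new pattern, whereas a real system would score and filter candidates.

```python
def bootstrap_patterns(triples, seed_pattern, max_rounds=10):
    """Alternate between harvesting pairs and harvesting patterns.

    triples: list of (left_term, connecting_phrase, right_term) tuples
             (an assumed, simplified stand-in for parsed sentences).
    Returns the set of learned patterns and the set of extracted pairs.
    """
    patterns = {seed_pattern}
    pairs = set()
    for _ in range(max_rounds):
        # Step 1: collect every pair linked by an already known pattern.
        found_pairs = {(left, right) for left, phrase, right in triples
                       if phrase in patterns}
        # Step 2: collect every phrase that links an already known pair.
        found_patterns = {phrase for left, phrase, right in triples
                          if (left, right) in found_pairs}
        # Stop once a round contributes nothing new.
        if found_pairs <= pairs and found_patterns <= patterns:
            break
        pairs |= found_pairs
        patterns |= found_patterns
    return patterns, pairs


# Hypothetical toy corpus; the phrases and terms are illustrative only.
toy_triples = [
    ("anemia", "due to", "iron deficiency"),
    ("anemia", "caused by", "iron deficiency"),
    ("blindness", "caused by", "diabetes"),
    ("asthma", "and", "eczema"),
]
learned_patterns, learned_pairs = bootstrap_patterns(toy_triples, "due to")
```

Starting from "due to", the loop picks up "caused by" because both phrases connect the same pair, and the new phrase then yields one further pair. The unrelated connector "and" is never learned.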
Affiliation(s)
- Rong Xu
- Medical Informatics Division, Case Western Reserve University, Cleveland, OH, USA
- Li Li
- Departments of Family Medicine and Community Health, Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH, USA
|
445
|
Zhang SW, Shao DD, Zhang SY, Wang YB. Prioritization of candidate disease genes by enlarging the seed set and fusing information of the network topology and gene expression. MOLECULAR BIOSYSTEMS 2014; 10:1400-8. [PMID: 24695957 DOI: 10.1039/c3mb70588a] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Indexed: 01/13/2023]
Abstract
The identification of disease genes is very important not only to provide greater understanding of gene function and cellular mechanisms which drive human disease, but also to enhance human disease diagnosis and treatment. Recently, high-throughput techniques have been applied to detect dozens or even hundreds of candidate genes. However, experimental approaches to validate the many candidates are usually time-consuming, tedious and expensive, and sometimes lack reproducibility. Therefore, numerous theoretical and computational methods (e.g. network-based approaches) have been developed to prioritize candidate disease genes. Many network-based approaches implicitly utilize the observation that genes causing the same or similar diseases tend to correlate with each other in gene-protein relationship networks. Of these network approaches, the random walk with restart algorithm (RWR) is considered to be a state-of-the-art approach. To further improve the performance of RWR, we propose a novel method named ESFSC to identify disease-related genes, by enlarging the seed set according to the centrality of disease genes in a network and fusing information of the protein-protein interaction (PPI) network topological similarity and the gene expression correlation. The ESFSC algorithm restarts at all of the nodes in the seed set consisting of the known disease genes and their k-nearest neighbor nodes, then walks in the global network separately guided by the similarity transition matrix constructed with PPI network topological similarity properties and the correlational transition matrix constructed with the gene expression profiles. As a result, all the genes in the network are ranked by a weighted fusion of the RWR results obtained under the two types of transition matrices. 
Comprehensive simulation results of the 10 diseases with 97 known disease genes collected from the Online Mendelian Inheritance in Man (OMIM) database show that ESFSC outperforms existing methods for prioritizing candidate disease genes. The top prediction results of Alzheimer's disease are consistent with previous literature reports.
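Random walk with restart and the weighted fusion of two walk results, as mentioned in this abstract, can be sketched as follows. This is a generic textbook-style RWR under stated assumptions, not the ESFSC code: the names `random_walk_with_restart` and `fuse_scores`, the restart probability of 0.7, the fusion weight, and the four-node path graph are all hypothetical choices for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7,
                             tol=1e-12, max_iter=10000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    adjacency: square list of lists with non-negative edge weights.
    seeds: node indices where the walker restarts (uniform restart vector).
    W is the column-normalized adjacency matrix.
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    seed_list = sorted(set(seeds))
    p0 = [0.0] * n
    for s in seed_list:
        p0[s] = 1.0 / len(seed_list)
    p = p0[:]
    for _ in range(max_iter):
        nxt = []
        for i in range(n):
            # Probability mass flowing into node i from its neighbours.
            flow = sum(adjacency[i][j] * p[j] / col_sums[j]
                       for j in range(n) if col_sums[j] > 0)
            nxt.append((1.0 - restart) * flow + restart * p0[i])
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


def fuse_scores(scores_a, scores_b, weight):
    """Weighted fusion of two score vectors (weight applies to scores_a)."""
    return [weight * a + (1.0 - weight) * b
            for a, b in zip(scores_a, scores_b)]


# Hypothetical path graph 0 - 1 - 2 - 3 with node 0 as the only seed.
path_graph = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
steady_state = random_walk_with_restart(path_graph, [0])
fused = fuse_scores([1.0, 0.0], [0.0, 1.0], 0.25)
```

On the path graph, the steady-state probability decreases with distance from the seed, which is the property that lets the walk rank candidate nodes. In a two-matrix setting, one would run the walk once per transition matrix and pass both results to `fuse_scores`.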
Affiliation(s)
- Shao-Wu Zhang
- College of Automation, Northwestern Polytechnical University, 710072, Xi'an, China.
|
446
|
Hulovatyy Y, Solava RW, Milenković T. Revealing missing parts of the interactome via link prediction. PLoS One 2014; 9:e90073. [PMID: 24594900 PMCID: PMC3940777 DOI: 10.1371/journal.pone.0090073] [Citation(s) in RCA: 38] [Impact Index Per Article: 3.8] [Received: 06/04/2013] [Accepted: 01/29/2014] [Indexed: 12/20/2022] Open
Abstract
Protein interaction networks (PINs) are often used to "learn" new biological function from their topology. Since current PINs are noisy, their computational de-noising via link prediction (LP) could improve the learning accuracy. LP uses the existing PIN topology to predict missing and spurious links. Many existing LP methods rely on shared immediate neighborhoods of the nodes to be linked. As such, they have limitations. Thus, to comprehensively study which topological properties of nodes in PINs dictate whether the nodes should be linked, we introduce novel sensitive LP measures that are expected to overcome the limitations of the existing methods. We systematically evaluate the new and existing LP measures by introducing "synthetic" noise into PINs and measuring how accurate the measures are in reconstructing the original PINs. Also, we use the LP measures to de-noise the original PINs, and we measure biological correctness of the de-noised PINs with respect to functional enrichment of the predicted interactions. Our main findings are: 1) LP measures that favor nodes which are both "topologically similar" and have large shared extended neighborhoods are superior; 2) using more network topology often, though not always, improves LP accuracy; and 3) LP improves biological correctness of the PINs, plus we validate a significant portion of the predicted interactions in independent, external PIN data sources. Ultimately, we are less focused on identifying a superior method than on showing that LP improves biological correctness of PINs, which is its ultimate goal in computational biology. But we note that our new methods outperform each of the existing ones with respect to at least one evaluation criterion. Alarmingly, we find that the different criteria often disagree in identifying the best method(s), which has important implications for LP communities in any domain, including social networks.
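The class of baseline that this abstract describes as relying on shared immediate neighborhoods can be illustrated with a Jaccard-coefficient scorer. This is a minimal sketch of one standard shared-neighbor measure, not one of the paper's new measures. The function name `jaccard_link_scores` and the small example network are hypothetical.

```python
from itertools import combinations


def jaccard_link_scores(edges):
    """Score every non-adjacent node pair by neighbourhood overlap.

    edges: iterable of (u, v) pairs describing an undirected network.
    Returns {(u, v): |N(u) & N(v)| / |N(u) | N(v)|} with u < v.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    scores = {}
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already linked, so not a candidate missing edge
        shared = neighbors[u] & neighbors[v]
        union = neighbors[u] | neighbors[v]
        scores[(u, v)] = len(shared) / len(union)
    return scores


# Hypothetical five-node network used only for illustration.
toy_edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "E")]
link_scores = jaccard_link_scores(toy_edges)
ranked_candidates = sorted(link_scores, key=link_scores.get, reverse=True)
```

In the toy network, nodes B and C have identical neighborhoods and therefore receive the top score, while A and E share no neighbor and score zero. Pairs that are two or more hops apart without a common neighbor always score zero under this measure, which is one limitation of purely immediate-neighborhood approaches.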
Affiliation(s)
- Yuriy Hulovatyy
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, Indiana, United States of America
- Ryan W. Solava
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, Indiana, United States of America
- Tijana Milenković
- Department of Computer Science and Engineering, ECK Institute for Global Health, and Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, Indiana, United States of America
|
447
|
Pastrello C, Pasini E, Kotlyar M, Otasek D, Wong S, Sangrar W, Rahmati S, Jurisica I. Integration, visualization and analysis of human interactome. Biochem Biophys Res Commun 2014; 445:757-73. [DOI: 10.1016/j.bbrc.2014.01.151] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 12/21/2013] [Accepted: 01/24/2014] [Indexed: 02/06/2023]
|
448
|
Abstract
MOTIVATION Because susceptibility to diseases increases with age, studying aging gains importance. Analyses of gene expression or sequence data, which have been indispensable for investigating aging, have been limited to studying genes and their protein products in isolation, ignoring their connectivities. However, proteins function by interacting with other proteins, and this is exactly what biological networks (BNs) model. Thus, analyzing the proteins' BN topologies could contribute to the understanding of aging. Current methods for analyzing systems-level BNs deal with their static representations, even though cells are dynamic. For this reason, and because different data types can give complementary biological insights, we integrate current static BNs with aging-related gene expression data to construct dynamic age-specific BNs. Then, we apply sensitive measures of topology to the dynamic BNs to study cellular changes with age. RESULTS While global BN topologies do not significantly change with age, local topologies of a number of genes do. We predict such genes to be aging-related. We demonstrate credibility of our predictions by (i) observing significant overlap between our predicted aging-related genes and 'ground truth' aging-related genes; (ii) observing significant overlap between functions and diseases that are enriched in our aging-related predictions and those that are enriched in 'ground truth' aging-related data; (iii) providing evidence that diseases which are enriched in our aging-related predictions are linked to human aging; and (iv) validating our high-scoring novel predictions in the literature. AVAILABILITY AND IMPLEMENTATION Software executables are available upon request.
Affiliation(s)
- Fazle E Faisal
- Department of Computer Science and Engineering, ECK Institute for Global Health and Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN 46556, USA
- Tijana Milenković
- Department of Computer Science and Engineering, ECK Institute for Global Health and Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN 46556, USA
|
449
|
Systems biology-based identification of Mycobacterium tuberculosis persistence genes in mouse lungs. mBio 2014; 5:mBio.01066-13. [PMID: 24549847 PMCID: PMC3944818 DOI: 10.1128/mbio.01066-13] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Indexed: 11/20/2022] Open
Abstract
Identifying Mycobacterium tuberculosis persistence genes is important for developing novel drugs to shorten the duration of tuberculosis (TB) treatment. We developed computational algorithms that predict M. tuberculosis genes required for long-term survival in mouse lungs. As the input, we used high-throughput M. tuberculosis mutant library screen data, mycobacterial global transcriptional profiles in mice and macrophages, and functional interaction networks. We selected 57 unique, genetically defined mutants (18 previously tested and 39 untested) to assess the predictive power of this approach in the murine model of TB infection. We observed a 6-fold enrichment in the predicted set of M. tuberculosis genes required for persistence in mouse lungs relative to randomly selected mutant pools. Our results also allowed us to reclassify several genes as required for M. tuberculosis persistence in vivo. Finally, the new results implicated additional high-priority candidate genes for testing. Experimental validation of computational predictions demonstrates the power of this systems biology approach for elucidating M. tuberculosis persistence genes. Mycobacterium tuberculosis, the causative agent of tuberculosis (TB), has a genetic repertoire that permits it to persist in the face of host immune responses. Identification of such persistence genes could reveal novel drug targets and elucidate mechanisms by which the organism eludes the immune system and resists drugs. Genetic screens have identified a total of 31 persistence genes, but to date only 15% of the ~4,000 M. tuberculosis genes have been tested experimentally. In this paper, as an alternative to brute force experimental screens, we describe computational methods that predict new persistence genes by combining known examples with growing databases of biological networks. 
Experimental testing demonstrated that these predictions are highly accurate, validating the computational approach and providing new information about M. tuberculosis persistence in host tissues. Using the new experimental results as additional input highlights additional genes for testing. Our approach can be extended to other data types and target organisms to characterize host-pathogen interactions relevant to this and other infectious diseases.
|
450
|
Yang X, Gao L, Guo X, Shi X, Wu H, Song F, Wang B. A network based method for analysis of lncRNA-disease associations and prediction of lncRNAs implicated in diseases. PLoS One 2014; 9:e87797. [PMID: 24498199 PMCID: PMC3909255 DOI: 10.1371/journal.pone.0087797] [Citation(s) in RCA: 119] [Impact Index Per Article: 11.9] [Received: 07/04/2013] [Accepted: 12/31/2013] [Indexed: 02/01/2023] Open
Abstract
Increasing evidence has indicated that long non-coding RNAs (lncRNAs) are implicated in and associated with many complex human diseases. Despite the accumulation of lncRNA-disease associations, only a few studies have examined the roles of these associations in pathogenesis. In this paper, we investigated lncRNA-disease associations from a network view to understand the contribution of these lncRNAs to complex diseases. Specifically, we studied both the properties of the diseases in which the lncRNAs were implicated, and those of the lncRNAs associated with complex diseases. Given that both protein coding genes and lncRNAs are involved in human diseases, we constructed a coding-non-coding gene-disease bipartite network based on known associations between diseases and disease-causing genes. We then applied a propagation algorithm to uncover the hidden lncRNA-disease associations in this network. The algorithm was evaluated by leave-one-out cross validation on 103 diseases in which at least two genes were known to be involved, and achieved an AUC of 0.7881. Our algorithm successfully predicted 768 potential lncRNA-disease associations between 66 lncRNAs and 193 diseases. Furthermore, our results for Alzheimer's disease, pancreatic cancer, and gastric cancer were verified by other independent studies.
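The AUC reported in this abstract summarizes how well held-out associations are ranked above unknown ones. As a minimal sketch of the metric only (not the paper's propagation algorithm or its cross-validation pipeline), AUC can be computed as the fraction of positive-negative pairs that are ordered correctly. The function name `auc_from_scores` and the example scores and labels are hypothetical.

```python
def auc_from_scores(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: predicted association scores.
    labels: 1 for a held-out true association, 0 otherwise.
    Each positive/negative pair counts 1 if the positive scores higher,
    0.5 on a tie, and 0 otherwise.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical scores: two held-out associations and two non-associations.
example_auc = auc_from_scores([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In a leave-one-out setting, each known association would be hidden in turn, the predictor rerun, and the hidden item's score compared against the scores of the unlabeled candidates. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.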
Affiliation(s)
- Xiaofei Yang
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
- Xingli Guo
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
- Xinghua Shi
- Department of Bioinformatics and Genomics, College of Computing and Informatics, University of North Carolina at Charlotte, Charlotte, North Carolina, United States of America
- Hao Wu
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
- Fei Song
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
- Bingbo Wang
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
|