51
Wei PJ, Wu FX, Xia J, Su Y, Wang J, Zheng CH. Prioritizing Cancer Genes Based on an Improved Random Walk Method. Front Genet 2020; 11:377. [PMID: 32411180 PMCID: PMC7198854 DOI: 10.3389/fgene.2020.00377] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 12/13/2019] [Accepted: 03/26/2020] [Indexed: 12/18/2022] Open
Abstract
Identifying driver genes that contribute to cancer progression among numerous passenger genes is a central goal and a major challenge. The protein-protein interaction network provides convenient and reasonable assistance for driver gene discovery. Random walk-based methods have been widely used to prioritize nodes in social or biological networks. However, most studies select the next node uniformly from the random walker's neighbors; few consider a transition preference based on the degrees of those neighbors. In this study, we propose a novel random walk-based approach named Driver_IRW (Driver genes discovery with Improved Random Walk method) to prioritize cancer genes in a cancer-related network. The key idea of Driver_IRW is to assign different transition probabilities to different edges of a constructed cancer-related network according to the degrees of each node's neighbors. Furthermore, a global centrality (here, betweenness centrality) and Katz feedback centrality are incorporated into the framework to evaluate the probability of walking to the seed nodes. Experimental results on four cancer types indicate that Driver_IRW performs more efficiently than some previously published methods for uncovering known cancer-related genes. In conclusion, our method can aid in prioritizing cancer-related genes and complement traditional frequency and network-based methods.
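The idea of a degree-dependent transition preference can be illustrated with a minimal sketch. It is not the Driver_IRW implementation: the abstract does not give the functional form, so the choice of making the step probability proportional to the neighbor's degree is an assumption, as are the function names, the restart value, and the toy graph. The betweenness and Katz centrality terms are omitted.

```python
def degree_biased_transitions(adj):
    """Map each node to {neighbour: probability}.

    Assumption for illustration: the probability of stepping to a
    neighbour is proportional to that neighbour's degree.
    """
    probs = {}
    for node, neighbours in adj.items():
        total = sum(len(adj[n]) for n in neighbours)
        probs[node] = {n: len(adj[n]) / total for n in neighbours} if total else {}
    return probs


def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-12, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * P^T p until convergence."""
    transitions = degree_biased_transitions(adj)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in adj}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in adj}
        for node, row in transitions.items():
            for neighbour, weight in row.items():
                nxt[neighbour] += (1.0 - restart) * p[node] * weight
        change = sum(abs(nxt[n] - p[n]) for n in adj)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical undirected toy network (node -> set of neighbours).
toy_network = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}
transition = degree_biased_transitions(toy_network)
scores = random_walk_with_restart(toy_network, seeds={"B"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

From node B, the walker reaches A (degree 3) with probability 3/5 and C (degree 2) with probability 2/5, whereas a uniform walk would assign 1/2 to each.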
Affiliation(s)
- Pi-Jing Wei
  - Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
- Fang-Xiang Wu
  - Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
  - Department of Computer Sciences, University of Saskatchewan, Saskatoon, SK, Canada
  - Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Junfeng Xia
  - Institutes of Physical Science and Information Technology, Anhui University, Hefei, China
- Yansen Su
  - Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
- Jing Wang
  - Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
  - College of Computer and Information Engineering, Fuyang Normal University, Fuyang, China
- Chun-Hou Zheng
  - Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, College of Computer Science and Technology, Anhui University, Hefei, China
52
Klein HU, Schäfer M, Bennett DA, Schwender H, De Jager PL. Bayesian integrative analysis of epigenomic and transcriptomic data identifies Alzheimer's disease candidate genes and networks. PLoS Comput Biol 2020; 16:e1007771. [PMID: 32255787 PMCID: PMC7138305 DOI: 10.1371/journal.pcbi.1007771] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 07/01/2019] [Accepted: 03/03/2020] [Indexed: 12/28/2022] Open
Abstract
Biomedical research studies have generated large multi-omic datasets to study complex diseases like Alzheimer’s disease (AD). An important aim of these studies is the identification of candidate genes that demonstrate congruent disease-related alterations across the different data types measured by the study. We developed a new method to detect such candidate genes in large multi-omic case-control studies that measure multiple data types in the same set of samples. The method is based on a gene-centric integrative coefficient quantifying to what degree consistent differences are observed in the different data types. For statistical inference, a Bayesian hierarchical model is used to study the distribution of the integrative coefficient. The model employs a conditional autoregressive prior to integrate a functional gene network and to share information between genes known to be functionally related. We applied the method to an AD dataset consisting of histone acetylation, DNA methylation, and RNA transcription data from human cortical tissue samples of 233 subjects, and we detected 816 genes with consistent differences between persons with AD and controls. The findings were validated in protein data and in RNA transcription data from two independent AD studies. Finally, we found three subnetworks of jointly dysregulated genes within the functional gene network which capture three distinct biological processes: myeloid cell differentiation, protein phosphorylation and synaptic signaling. Further investigation of the myeloid network indicated an upregulation of this network in early stages of AD prior to accumulation of hyperphosphorylated tau and suggested that increased CSF1 transcription in astrocytes may contribute to microglial activation in AD. 
Thus, we developed a method that integrates multiple data types and external knowledge of gene function to detect candidate genes, applied the method to an AD dataset, and identified several disease-related genes and processes demonstrating the usefulness of the integrative approach.
Recent technological advances have led to a new generation of studies that interrogate multiple molecular levels in the same target tissue of a set of subjects, generating complex multi-omic datasets with which to study disease mechanism. These datasets of genetic, epigenomic, transcriptomic, and other data have the potential to reveal novel biological insights; however, integrative analyses remain challenging and require new computational methods. We developed an integrative Bayesian approach to detect genes with consistent differences between case and control samples across multiple data types. The method further integrates prior knowledge about gene function in the form of a gene functional similarity network to improve statistical inference by sharing information between related genes. We applied our method to an Alzheimer’s disease dataset of epigenomic and transcriptomic data and detected and then validated several novel and known candidate genes as well as three major disease-related biological processes. One of these processes reflected microglial activation and included the cytokine CSF1. Single-nucleus data revealed that CSF1 was primarily upregulated in astrocytes, implicating the involvement of this cell type in microglial activation. Hence, we demonstrated that integrative analysis approaches to multi-omic datasets can improve candidate gene detection and thereby generate new insights into complex diseases.
Affiliation(s)
- Hans-Ulrich Klein
  - Center for Translational & Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center, New York, New York, United States of America
  - Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, New York, United States of America
- Martin Schäfer
  - Mathematical Institute, Heinrich Heine University, Düsseldorf, Germany
- David A. Bennett
  - Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, Illinois, United States of America
- Holger Schwender
  - Mathematical Institute, Heinrich Heine University, Düsseldorf, Germany
- Philip L. De Jager
  - Center for Translational & Computational Neuroimmunology, Department of Neurology, Columbia University Irving Medical Center, New York, New York, United States of America
  - Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Irving Medical Center, New York, New York, United States of America
53
Acosta JN, Brown SC, Falcone GJ. Genetic Variation and Response to Neurocritical Illness: a Powerful Approach to Identify Novel Pathophysiological Mechanisms and Therapeutic Targets. Neurotherapeutics 2020; 17:581-592. [PMID: 31975153 PMCID: PMC7283396 DOI: 10.1007/s13311-020-00837-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 12/22/2022] Open
Abstract
Disease-specific therapeutic options for critically ill neurological patients are limited. The identification of new preventive, therapeutic, and rehabilitation strategies is of the utmost importance in the field of neurocritical care research. Population genetics offers powerful tools to identify and prioritize biological pathways to be targeted by novel interventions. New treatments with supportive genetic evidence have twice the chances of obtaining final FDA approval compared to those without this support. Large collaborations, public access to data, reproducible science, and innovative analytical methods have exponentially increased the pace of discoveries related to neurocritical care genetics.
Affiliation(s)
- Julián N Acosta
  - Division of Neurocritical Care and Emergency Neurology, Department of Neurology, Yale School of Medicine, New Haven, Connecticut, 06520, USA
- Stacy C Brown
  - Division of Neurocritical Care and Emergency Neurology, Department of Neurology, Yale School of Medicine, New Haven, Connecticut, 06520, USA
- Guido J Falcone
  - Division of Neurocritical Care and Emergency Neurology, Department of Neurology, Yale School of Medicine, New Haven, Connecticut, 06520, USA
54
Chang JW, Ding Y, Tahir Ul Qamar M, Shen Y, Gao J, Chen LL. A deep learning model based on sparse auto-encoder for prioritizing cancer-related genes and drug target combinations. Carcinogenesis 2020; 40:624-632. [PMID: 30944926 DOI: 10.1093/carcin/bgz044] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 07/25/2018] [Revised: 01/06/2019] [Accepted: 03/10/2019] [Indexed: 12/21/2022] Open
Abstract
Prioritization of cancer-related genes from gene expression profiles and proteomic data is vital for improving targeted-therapy research. Although computational approaches have been complementing high-throughput biological experiments in the understanding of human diseases, it remains a big challenge to accurately discover cancer-related proteins/genes via automatic learning from large-scale protein/gene expression data and protein-protein interaction data. Most of the existing methods are based on network construction combined with gene expression profiles, which ignore the diversity between normal samples and disease cell lines. In this study, we introduced a deep learning model based on a sparse auto-encoder to learn the specific characteristics of protein interactions in cancer cell lines integrated with protein expression data. The model was able to identify cancer-related proteins/genes from different input protein expression profiles by extracting the characteristics of protein interaction information, and could also predict cancer-related protein combinations. Compared with other reported methods, including differential expression and network-based methods, our model achieved the highest area under the curve value (>0.8) in predicting cancer-related genes. Our study prioritized ~500 high-confidence cancer-related genes; among these genes, 211 already known cancer drug targets were found, which supported the accuracy of our method. These results indicate that the proposed auto-encoder model can computationally prioritize candidate proteins/genes involved in cancer and improve targeted-therapy research.
Affiliation(s)
- Ji-Wei Chang
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, P. R. China
- Yuduan Ding
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, P. R. China
- Muhammad Tahir Ul Qamar
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, P. R. China
- Yin Shen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Junxiang Gao
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
- Ling-Ling Chen
  - Hubei Key Laboratory of Agricultural Bioinformatics, College of Informatics, Huazhong Agricultural University, Wuhan, P. R. China
  - National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, P. R. China
55
Chang HC, Chu CP, Lin SJ, Hsiao CK. Network hub-node prioritization of gene regulation with intra-network association. BMC Bioinformatics 2020; 21:101. [PMID: 32164570 PMCID: PMC7069025 DOI: 10.1186/s12859-020-3444-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/15/2019] [Accepted: 03/06/2020] [Indexed: 11/10/2022] Open
Abstract
Background: To identify and prioritize the influential hub genes in a gene-set or biological pathway, most analyses rely on calculation of marginal effects or tests of statistical significance. These procedures may be inappropriate since hub nodes are common connection points and therefore may interact with other nodes more often than non-hub nodes do. Such dependence among gene nodes can be conjectured based on the topology of the pathway network or the correlation between them. Results: Here we develop a pathway activity score incorporating the marginal (local) effects of gene nodes as well as intra-network affinity measures. This score summarizes the expression levels in a gene-set/pathway for each sample, with weights on local and network information, respectively. The score is next used to examine the impact of each node through a leave-one-out evaluation. To illustrate the procedure, two cancer studies, one involving RNA-Seq data from breast cancer patients with high-grade ductal carcinoma in situ and one involving microarray expression data from ovarian cancer patients, are used to assess the performance of the procedure and to compare it with existing methods, both those that do and those that do not take correlation and network information into consideration. The hub nodes identified by the proposed procedure in the two cancer studies are known influential genes; some have been included in standard treatments and some are currently considered in clinical trials for targeted therapy. The results from simulation studies show that when marginal effects are mild or weak, the proposed procedure can still identify causal nodes, whereas methods relying only on marginal effect size cannot. Conclusions: The NetworkHub procedure proposed in this research can effectively utilize the network information in combination with local effects derived from marker values, and provide a useful and complementary list of recommendations for prioritizing causal hubs.
Affiliation(s)
- Hung-Ching Chang
  - Division of Biostatistics, Institute of Epidemiology and Preventive Medicine, National Taiwan University, No. 17, Xu-Zhou Road, Taipei, 10055, Taiwan
- Chiao-Pei Chu
  - Division of Biostatistics, Institute of Epidemiology and Preventive Medicine, National Taiwan University, No. 17, Xu-Zhou Road, Taipei, 10055, Taiwan
- Shu-Ju Lin
  - Institute of Statistical Science, Academia Sinica, Taipei, 11529, Taiwan
- Chuhsing Kate Hsiao
  - Division of Biostatistics, Institute of Epidemiology and Preventive Medicine, National Taiwan University, No. 17, Xu-Zhou Road, Taipei, 10055, Taiwan
  - Bioinformatics and Biostatistics Core, Center of Genomic Medicine, National Taiwan University, Taipei, 10055, Taiwan
56
Shang H, Liu ZP. Network-based prioritization of cancer genes by integrative ranks from multi-omics data. Comput Biol Med 2020; 119:103692. [PMID: 32339126 DOI: 10.1016/j.compbiomed.2020.103692] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/12/2019] [Revised: 02/10/2020] [Accepted: 02/29/2020] [Indexed: 10/24/2022]
Abstract
Finding disease genes related to cancer is of great importance for diagnosis and treatment. With the development of high-throughput technologies, more and more multiple-level omics data have become available. Thus, it is urgent to develop computational methods to identify cancer genes by integrating these data. We propose an integrative rank-based method called iRank to prioritize cancer genes by integrating multi-omics data in a unified network-based framework. The method was used to identify the disease genes of hepatocellular carcinoma (HCC) in humans using the multi-omics data for HCC from TCGA after building up integrated networks in the corresponding molecular levels. The kernel of iRank is based on an improved PageRank algorithm with constraints. To demonstrate the validity and the effectiveness of the method, we performed experiments for comparison between single-level omics data and multiple omics data as well as with other algorithms: random walk (RW), random walk with restart on heterogeneous network (RWH), PRINCE and PhenoRank. We also performed a case study on another cancer, prostate adenocarcinoma (PRAD). The results indicate the effectiveness and efficiency of iRank which demonstrates the significance of integrating multi-omics data and multiplex networks in cancer gene prioritization.
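Since the kernel of iRank is described as an improved PageRank with constraints, a sketch of plain personalized PageRank may help as a baseline. It does not reproduce iRank: the constraints and the multi-omics network construction are not given in the abstract, and the function name, damping value, and toy graph below are assumptions for illustration only.

```python
def pagerank(out_links, damping=0.85, personalization=None, tol=1e-12, max_iter=200):
    """Power-iteration PageRank on a directed graph.

    out_links: dict mapping every node to a list of its successors
    (each successor must also appear as a key).
    personalization: optional dict of non-negative teleport weights.
    """
    nodes = list(out_links)
    if personalization is None:
        teleport = {n: 1.0 / len(nodes) for n in nodes}
    else:
        total = sum(personalization.get(n, 0.0) for n in nodes)
        teleport = {n: personalization.get(n, 0.0) / total for n in nodes}

    rank = dict(teleport)
    for _ in range(max_iter):
        # Rank held by nodes without successors is redistributed via teleport.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        nxt = {n: (1.0 - damping + damping * dangling) * teleport[n] for n in nodes}
        for n in nodes:
            successors = out_links[n]
            if successors:
                share = damping * rank[n] / len(successors)
                for s in successors:
                    nxt[s] += share
        change = sum(abs(nxt[n] - rank[n]) for n in nodes)
        rank = nxt
        if change < tol:
            break
    return rank


# Hypothetical three-node network; g1 receives links from both other nodes.
toy_links = {"g1": ["g2", "g3"], "g2": ["g1"], "g3": ["g1"]}
ranks = pagerank(toy_links)
top_gene = max(ranks, key=ranks.get)
```

Passing a `personalization` dictionary biases the teleport step toward chosen nodes, which is one common way prior evidence such as per-gene omics scores is injected into PageRank-style rankings.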
Affiliation(s)
- Haixia Shang
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
- Zhi-Ping Liu
  - Department of Biomedical Engineering, School of Control Science and Engineering, Shandong University, Jinan, Shandong 250061, China
  - Center of Intelligent Medicine, Shandong University, Jinan, Shandong 250061, China
57
Network modeling of patients' biomolecular profiles for clinical phenotype/outcome prediction. Sci Rep 2020; 10:3612. [PMID: 32107391 PMCID: PMC7046773 DOI: 10.1038/s41598-020-60235-8] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/23/2019] [Accepted: 11/05/2019] [Indexed: 12/15/2022] Open
Abstract
Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. We introduce a novel network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. Then a kernel-based semi-supervised transductive algorithm is applied to the graph to explore the overall topology of the graph and to predict the phenotype/clinical outcome of patients. Experimental tests involving several publicly available datasets of patients afflicted with pancreatic, breast, colon and colorectal cancer show that our proposed method is competitive with state-of-the-art supervised and semi-supervised predictive systems. Importantly, P-Net also provides interpretable models that can be easily visualized to gain clues about the relationships between patients, and to formulate hypotheses about their stratification.
58
Luo Y, Mao C, Yang Y, Wang F, Ahmad FS, Arnett D, Irvin MR, Shah SJ. Integrating hypertension phenotype and genotype with hybrid non-negative matrix factorization. Bioinformatics 2020; 35:1395-1403. [PMID: 30239588 DOI: 10.1093/bioinformatics/bty804] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 05/08/2018] [Revised: 08/20/2018] [Accepted: 09/13/2018] [Indexed: 12/30/2022] Open
Abstract
MOTIVATION Hypertension is a heterogeneous syndrome in need of improved subtyping using phenotypic and genetic measurements with the goal of identifying subtypes of patients who share similar pathophysiologic mechanisms and may respond more uniformly to targeted treatments. Existing machine learning approaches often face challenges in integrating phenotype and genotype information and presenting to clinicians an interpretable model. We aim to provide informed patient stratification based on phenotype and genotype features. RESULTS In this article, we present a hybrid non-negative matrix factorization (HNMF) method to integrate phenotype and genotype information for patient stratification. HNMF simultaneously approximates the phenotypic and genetic feature matrices using different appropriate loss functions, and generates patient subtypes, phenotypic groups and genetic groups. Unlike previous methods, HNMF approximates the phenotypic matrix under Frobenius loss and the genetic matrix under Kullback-Leibler (KL) loss. We propose an alternating projected gradient method to solve the approximation problem. Simulation shows HNMF converges fast and accurately to the true factor matrices. On a real-world clinical dataset, we used the patient factor matrix as features and examined the association of these features with indices of cardiac mechanics. We compared HNMF with six different models using phenotype or genotype features alone, with or without NMF, or using joint NMF with only one type of loss. We also compared HNMF with three recently published methods for integrative clustering analysis, including iClusterBayes, Bayesian joint analysis and JIVE. HNMF significantly outperforms all comparison models. HNMF also reveals intuitive phenotype-genotype interactions that characterize cardiac abnormalities. AVAILABILITY AND IMPLEMENTATION Our code is publicly available on GitHub at https://github.com/yuanluo/hnmf.
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
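One way to write down a hybrid objective of the kind described, with Frobenius loss on the phenotypic matrix and KL loss on the genetic matrix, is sketched below. The symbols are assumptions introduced here and are not taken from the paper: $X_p$ and $X_g$ denote the patient-by-phenotype and patient-by-genotype matrices, $W$ a shared non-negative patient factor matrix, $H_p$ and $H_g$ the phenotypic and genetic group matrices, and $\lambda$ a hypothetical weight balancing the two terms. The paper's exact formulation may differ.

```latex
\min_{W,\,H_p,\,H_g \,\ge\, 0}\;
  \lVert X_p - W H_p \rVert_F^2
  \;+\; \lambda \, D_{\mathrm{KL}}\!\left( X_g \,\Vert\, W H_g \right),
\qquad
D_{\mathrm{KL}}(A \,\Vert\, B) \;=\; \sum_{i,j}
  \left( A_{ij} \log \frac{A_{ij}}{B_{ij}} - A_{ij} + B_{ij} \right)
```

Here $D_{\mathrm{KL}}$ is the generalized KL divergence commonly used in NMF; sharing $W$ across both terms is what would tie the phenotypic and genetic groups to the same patient subtypes.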
Affiliation(s)
- Yuan Luo
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Chengsheng Mao
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Yiben Yang
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Fei Wang
  - Department of Healthcare Policy & Research, Weill Cornell Medicine, Cornell University New York, NY, USA
- Faraz S Ahmad
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
- Donna Arnett
  - Department of Epidemiology, College of Public Health, University of Kentucky, Lexington, KY, USA
- Marguerite R Irvin
  - Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Sanjiv J Shah
  - Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL, USA
59
Cabrera-Andrade A, López-Cortés A, Jaramillo-Koupermann G, Paz-y-Miño C, Pérez-Castillo Y, Munteanu CR, González-Díaz H, Pazos A, Tejera E. Gene Prioritization through Consensus Strategy, Enrichment Methodologies Analysis, and Networking for Osteosarcoma Pathogenesis. Int J Mol Sci 2020; 21:E1053. [PMID: 32033398 PMCID: PMC7038221 DOI: 10.3390/ijms21031053] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 12/10/2019] [Revised: 01/30/2020] [Accepted: 01/30/2020] [Indexed: 12/12/2022] Open
Abstract
Osteosarcoma is the most common subtype of primary bone cancer, affecting mostly adolescents. In recent years, several studies have focused on elucidating the molecular mechanisms of this sarcoma; however, its molecular etiology has still not been determined with precision. Therefore, we applied a consensus strategy with the use of several bioinformatics tools to prioritize genes involved in its pathogenesis. Subsequently, we assessed the physical interactions of the previously selected genes and applied a communality analysis to this protein-protein interaction network. The consensus strategy prioritized a total list of 553 genes. Our enrichment analysis validates several studies that describe the signaling pathways PI3K/AKT and MAPK/ERK as pathogenic. The gene ontology described TP53 as a principal signal transducer that chiefly mediates processes associated with cell cycle and DNA damage response. It is interesting to note that the communality analysis clusters several members involved in metastasis events, such as MMP2 and MMP9, and genes associated with DNA repair complexes, like ATM, ATR, CHEK1, and RAD51. In this study, we have identified well-known pathogenic genes for osteosarcoma and prioritized genes that need to be further explored.
Affiliation(s)
- Alejandro Cabrera-Andrade
  - Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador
  - Carrera de Enfermería, Facultad de Ciencias de la Salud, Universidad de Las Américas, Quito 170125, Ecuador
  - RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruna, 15071 A Coruña, Spain
- Andrés López-Cortés
  - RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruna, 15071 A Coruña, Spain
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito 170129, Ecuador
- Gabriela Jaramillo-Koupermann
  - Laboratorio de Biología Molecular, Subproceso de Anatomía Patológica, Hospital de Especialidades Eugenio Espejo, Quito 170403, Ecuador
- César Paz-y-Miño
  - Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito 170129, Ecuador
- Yunierkis Pérez-Castillo
  - Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador
  - Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito 170125, Ecuador
- Cristian R. Munteanu
  - RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruna, 15071 A Coruña, Spain
  - Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006 A Coruña, Spain
  - Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), Campus de Elviña s/n, 15071 A Coruña, Spain
- Humbert González-Díaz
  - Department of Organic Chemistry II, University of the Basque Country UPV/EHU, 48940 Leioa, Spain
  - IKERBASQUE, Basque Foundation for Science, 48011 Bilbao, Spain
- Alejandro Pazos
  - RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruna, 15071 A Coruña, Spain
  - Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), 15006 A Coruña, Spain
  - Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), Campus de Elviña s/n, 15071 A Coruña, Spain
- Eduardo Tejera
  - Grupo de Bio-Quimioinformática, Universidad de Las Américas, Quito 170125, Ecuador
  - Facultad de Ingeniería y Ciencias Agropecuarias, Universidad de Las Américas, Quito 170125, Ecuador
60
Koutrouli M, Karatzas E, Paez-Espino D, Pavlopoulos GA. A Guide to Conquer the Biological Network Era Using Graph Theory. Front Bioeng Biotechnol 2020; 8:34. [PMID: 32083072 PMCID: PMC7004966 DOI: 10.3389/fbioe.2020.00034] [Citation(s) in RCA: 99] [Impact Index Per Article: 24.8] [Received: 10/11/2019] [Accepted: 01/15/2020] [Indexed: 12/24/2022] Open
Abstract
Networks are one of the most common ways to represent biological systems as complex sets of binary interactions or relations between different bioentities. In this article, we discuss the basic graph theory concepts and the various graph types, as well as the available data structures for storing and reading graphs. In addition, we describe several network properties and we highlight some of the widely used network topological features. We briefly mention the network patterns, motifs and models, and we further comment on the types of biological and biomedical networks along with their corresponding computer- and human-readable file formats. Finally, we discuss a variety of algorithms and metrics for network analyses regarding graph drawing, clustering, visualization, link prediction, perturbation, and network alignment as well as the current state-of-the-art tools. We expect this review to reach a very broad spectrum of readers varying from experts to beginners while encouraging them to enhance the field further.
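Two of the basic graph storage structures such a review covers, the adjacency list and the adjacency matrix, can be sketched for a small undirected network as follows. The edge list and function names are hypothetical and chosen only for illustration.

```python
def build_adjacency_list(edges):
    """Undirected adjacency list: node -> set of neighbours."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def build_adjacency_matrix(edges):
    """Undirected 0/1 adjacency matrix with nodes in sorted order."""
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return nodes, matrix


# Hypothetical edge list, e.g. pairwise interactions between four bioentities.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
adjacency = build_adjacency_list(edges)
nodes, matrix = build_adjacency_matrix(edges)

# Node degree, a basic topological feature, read from either structure.
degree_from_list = {node: len(neigh) for node, neigh in adjacency.items()}
degree_from_matrix = dict(zip(nodes, (sum(row) for row in matrix)))
```

The adjacency list uses memory proportional to the number of edges, which suits sparse biological networks, while the matrix offers constant-time edge lookup at the cost of memory quadratic in the number of nodes.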
Affiliation(s)
- Mikaela Koutrouli
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
- Evangelos Karatzas
  - Institute for Fundamental Biomedical Research, BSRC “Alexander Fleming”, Vari, Greece
  - Department of Informatics and Telecommunications, University of Athens, Athens, Greece
- David Paez-Espino
  - Lawrence Berkeley National Laboratory, Department of Energy, Joint Genome Institute, Walnut Creek, CA, United States
61
Tran VD, Sperduti A, Backofen R, Costa F. Heterogeneous networks integration for disease–gene prioritization with node kernels. Bioinformatics 2020; 36:2649-2656. [DOI: 10.1093/bioinformatics/btaa008] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 07/22/2019] [Revised: 12/19/2019] [Accepted: 01/23/2020] [Indexed: 12/21/2022] Open
Abstract
Motivation
The identification of disease–gene associations is a task of fundamental importance in human health research. A typical approach consists in first encoding large gene/protein relational datasets as networks due to the natural and intuitive property of graphs for representing objects’ relationships and then utilizing graph-based techniques to prioritize genes for successive low-throughput validation assays. Since different types of interactions between genes yield distinct gene networks, there is the need to integrate different heterogeneous sources to improve the reliability of prioritization systems.
Results
We propose an approach based on three phases: first, we merge all sources in a single network, then we partition the integrated network according to edge density introducing a notion of edge type to distinguish the parts and finally, we employ a novel node kernel suitable for graphs with typed edges. We show how the node kernel can generate a large number of discriminative features that can be efficiently processed by linear regularized machine learning classifiers. We report state-of-the-art results on 12 disease–gene associations and on a time-stamped benchmark containing 42 newly discovered associations.
Availability and implementation
Source code: https://github.com/dinhinfotech/DiGI.git.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Van Dinh Tran
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
| | | | - Rolf Backofen
- Bioinformatics Group, Department of Computer Science, University of Freiburg, Freiburg im Breisgau, Germany
- Signalling Research Centres BIOSS and CIBSS, University of Freiburg, Germany
| | - Fabrizio Costa
- Department of Computer Science, University of Exeter, Exeter, UK
| |
|
62
|
Hur B, Kang D, Lee S, Moon JH, Lee G, Kim S. Venn-diaNet : venn diagram based network propagation analysis framework for comparing multiple biological experiments. BMC Bioinformatics 2019; 20:667. [PMID: 31881980 PMCID: PMC6941187 DOI: 10.1186/s12859-019-3302-7] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 11/19/2019] [Accepted: 12/02/2019] [Indexed: 12/13/2022] Open
Abstract
BACKGROUND The main research topic in this paper is how to compare multiple biological experiments using transcriptome data, where each experiment is measured and designed to compare control and treated samples. Comparison of multiple biological experiments is usually performed in terms of the number of differentially expressed genes (DEGs) in an arbitrary combination of biological experiments. This process is usually facilitated with a Venn diagram, but there are several issues when a Venn diagram is used to compare and analyze multiple experiments in terms of DEGs. First, current Venn diagram tools do not provide systematic analysis to prioritize genes. Because current tools generally do not focus on prioritizing genes, genes located in the segments of the Venn diagram (especially the intersections) are usually difficult to rank. Second, elucidating the phenotypic difference only with the lists of DEGs and expression values is challenging when the experimental designs have a combination of treatments. The synergistic effects that such experimental designs aim to find are very difficult to identify without an informative system. RESULTS We introduce Venn-diaNet, a Venn diagram based analysis framework that uses network propagation on a protein-protein interaction network to prioritize genes from experiments that have multiple DEG lists. We suggest that the two issues can be effectively handled by ranking or prioritizing genes within segments of a Venn diagram. The user can easily compare multiple DEG lists with gene rankings, which are easy to understand and can also be coupled with additional analysis for their purposes. Our system provides a web-based interface to select seed genes in any area of a Venn diagram and then perform network propagation analysis to measure the influence of the selected seed genes in terms of a ranked list of DEGs.
CONCLUSIONS We suggest that our system can logically guide the selection of seed genes without additional prior knowledge, which frees users from the seed selection problem of network propagation. We showed that Venn-diaNet can reproduce the research findings reported in original papers that compare two, three, and eight experiments. Venn-diaNet is freely available at: http://biohealth.snu.ac.kr/software/venndianet.
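The network propagation step mentioned in this abstract can be illustrated with a random walk with restart on a toy graph. This is a minimal sketch and not Venn-diaNet's implementation: the abstract does not specify the propagation algorithm, so random walk with restart is an assumed stand-in, and the graph, the placeholder gene names, the restart probability of 0.5, and the function name `propagate` are all hypothetical.

```python
def propagate(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Random walk with restart on an undirected graph (illustrative only)."""
    nodes = sorted(adjacency)
    # Restart distribution: uniform over the seed genes, zero elsewhere.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Probability flowing into n: each neighbour m splits its
            # current probability evenly over its own edges.
            inflow = sum(p[m] / len(adjacency[m]) for m in adjacency[n])
            nxt[n] = (1 - restart) * inflow + restart * p0[n]
        converged = max(abs(nxt[n] - p[n]) for n in nodes) < tol
        p = nxt
        if converged:
            break
    return p


# Hypothetical toy interaction network; gene names are placeholders.
ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

scores = propagate(ppi, seeds={"A"})
# Genes ordered from most to least influenced by the seed set.
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this sketch the seed set plays the role of the genes picked from one Venn diagram segment, and `ranking` is the resulting prioritized gene list.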
Affiliation(s)
- Benjamin Hur
- Interdisciplinary Program in Bioinformatics, Seoul National University, 1 Gwanak-ro, Seoul, Korea
| | - Dongwon Kang
- Department of Computer Science and Engineering, 1 Gwanak-ro, Seoul, Korea
| | - Sangseon Lee
- Department of Computer Science and Engineering, 1 Gwanak-ro, Seoul, Korea
| | - Ji Hwan Moon
- Interdisciplinary Program in Bioinformatics, Seoul National University, 1 Gwanak-ro, Seoul, Korea
| | - Gung Lee
- National Creative Research Initiatives Center for Adipose Tissue Remodeling, Institute of Molecular Biology and Genetics, Department of Biological Sciences, Seoul National University, 1 Gwanak-ro, Seoul, Korea
| | - Sun Kim
- Interdisciplinary Program in Bioinformatics, Seoul National University, 1 Gwanak-ro, Seoul, Korea; Department of Computer Science and Engineering, 1 Gwanak-ro, Seoul, Korea; Bioinformatics Institute, Seoul National University, 1 Gwanak-ro, Seoul, Korea
| |
|
63
|
TopControl: A Tool to Prioritize Candidate Disease-associated Genes based on Topological Network Features. Sci Rep 2019; 9:19472. [PMID: 31857653 PMCID: PMC6923402 DOI: 10.1038/s41598-019-55954-6] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 04/11/2019] [Accepted: 12/04/2019] [Indexed: 11/09/2022] Open
Abstract
Putative disease-associated genes are often identified among those genes that are differentially expressed in disease and in normal conditions. This strategy typically yields thousands of genes. Gene prioritizing schemes boost the power of identifying the most promising disease-associated genes among such a set of candidates. We introduce here a novel system for prioritizing genes where a TF-miRNA co-regulatory network is constructed for the set of genes, while the ranks of the candidates are determined by topological and biological factors. For datasets on breast invasive carcinoma and liver hepatocellular carcinoma this novel prioritization technique identified a significant portion of known disease-associated genes and suggested new candidates which can be investigated later as putative disease-associated genes.
|
64
|
Corton JC. Integrating gene expression biomarker predictions into networks of adverse outcome pathways. Current Opinion in Toxicology 2019. [DOI: 10.1016/j.cotox.2019.05.006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 12/22/2022]
|
65
|
Peng Y, Jiang Y, Radivojac P. Enumerating consistent sub-graphs of directed acyclic graphs: an insight into biomedical ontologies. Bioinformatics 2019; 34:i313-i322. [PMID: 29949985 PMCID: PMC6022688 DOI: 10.1093/bioinformatics/bty268] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 11/29/2022] Open
Abstract
Motivation Modern problems of concept annotation associate an object of interest (gene, individual, text document) with a set of interrelated textual descriptors (functions, diseases, topics), often organized in concept hierarchies or ontologies. Most ontologies can be seen as directed acyclic graphs (DAGs), where nodes represent concepts and edges represent relational ties between these concepts. Given an ontology graph, each object can only be annotated by a consistent sub-graph; that is, a sub-graph such that if an object is annotated by a particular concept, it must also be annotated by all other concepts that generalize it. Ontologies therefore provide a compact representation of a large space of possible consistent sub-graphs; however, until now we have not been aware of a practical algorithm that can enumerate such annotation spaces for a given ontology. Results We propose an algorithm for enumerating consistent sub-graphs of DAGs. The algorithm recursively partitions the graph into strictly smaller graphs until the resulting graph becomes a rooted tree (forest), for which a linear-time solution is computed. It then combines the tallies from graphs created in the recursion to obtain the final count. We prove the correctness of this algorithm, propose several practical accelerations, evaluate it on random graphs and then apply it to characterize four major biomedical ontologies. We believe this work provides valuable insights into the complexity of concept annotation spaces and its potential influence on the predictability of ontological annotation. Availability and implementation https://github.com/shawn-peng/counting-consistent-sub-DAG Supplementary information Supplementary data are available at Bioinformatics online.
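For the base case named in this abstract, a rooted tree, the count follows from a simple recurrence. If f(v) is the number of consistent sub-graphs of the subtree at v that contain v, then each child c is either left out or contributes one of its own f(c) choices, so f(v) is the product of (1 + f(c)) over the children of v. The sketch below covers that base case only and is not the authors' DAG algorithm; excluding the empty annotation from the count is an assumption, and the toy tree and the function name `count_consistent` are hypothetical.

```python
def count_consistent(tree, root):
    """Count non-empty, ancestor-closed node sets of a rooted tree.

    `tree` maps each node to a list of its children. Every node is
    visited once, so the running time is linear in the tree size.
    """
    total = 1
    for child in tree.get(root, []):
        # Either exclude the child's whole subtree (1 way) or include the
        # child together with any consistent set below it.
        total *= 1 + count_consistent(tree, child)
    return total


# Hypothetical toy ontology: root -> {a, b}, a -> {c, d}.
toy_tree = {"root": ["a", "b"], "a": ["c", "d"]}
n_annotations = count_consistent(toy_tree, "root")  # (1 + 4) * (1 + 1) = 10

# A three-node chain has exactly three consistent sub-graphs.
chain = {"r": ["m"], "m": ["l"]}
n_chain = count_consistent(chain, "r")  # 3
```

The recursion is kept for readability; a very deep tree would need an iterative post-order traversal to stay within Python's recursion limit.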
Affiliation(s)
- Yisu Peng
- Department of Computer Science, Indiana University, Bloomington, USA
| | - Yuxiang Jiang
- Department of Computer Science, Indiana University, Bloomington, USA
| | - Predrag Radivojac
- Department of Computer Science, Indiana University, Bloomington, USA
| |
|
66
|
Corton JC, Kleinstreuer NC, Judson RS. Identification of potential endocrine disrupting chemicals using gene expression biomarkers. Toxicol Appl Pharmacol 2019; 380:114683. [DOI: 10.1016/j.taap.2019.114683] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Received: 02/27/2019] [Revised: 07/05/2019] [Accepted: 07/15/2019] [Indexed: 02/07/2023]
|
67
|
Battaglia C, Venturin M, Sojic A, Jesuthasan N, Orro A, Spinelli R, Musicco M, De Bellis G, Adorni F. Candidate Genes and MiRNAs Linked to the Inverse Relationship Between Cancer and Alzheimer's Disease: Insights From Data Mining and Enrichment Analysis. Front Genet 2019; 10:846. [PMID: 31608105 PMCID: PMC6771301 DOI: 10.3389/fgene.2019.00846] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 03/27/2019] [Accepted: 08/14/2019] [Indexed: 12/22/2022] Open
Abstract
The incidence of cancer and Alzheimer’s disease (AD) increases exponentially with age. A growing body of epidemiological evidence and molecular investigations inspired the hypothesis of an inverse relationship between these two pathologies. It has been proposed that the two diseases might utilize the same proteins and pathways that are, however, modulated differently and sometimes in opposite directions. Investigation of the common processes underlying these diseases may enhance the understanding of their pathogenesis and may also guide novel therapeutic strategies. Starting from a text-mining approach, our in silico study integrated the dispersed biological evidence by combining data mining, gene set enrichment, and protein-protein interaction (PPI) analyses while searching for common biological hallmarks linked to AD and cancer. We retrieved 138 genes (ALZCAN gene set), computed a significant number of enriched gene ontology clusters, and identified four PPI modules. The investigation confirmed the relevance of autophagy, ubiquitin proteasome system, and cell death as common biological hallmarks shared by cancer and AD. Then, from a closer investigation of the PPI modules and of the miRNA enrichment data, several genes (SQSTM1, UCHL1, STUB1, BECN1, CDKN2A, TP53, EGFR, GSK3B, and HSPA9) and miRNAs (miR-146a-5p, miR-34a-5p, miR-21-5p, miR-9-5p, and miR-16-5p) emerged as promising candidates. The integrative approach uncovered novel miRNA-gene networks (e.g., miR-146 and miR-34 regulating p62 and Beclin1 in autophagy) that might give new insights into the complex regulatory mechanisms of gene expression in AD and cancer.
Affiliation(s)
- Cristina Battaglia
- Department of Medical Biotechnology and Translational Medicine (BIOMETRA), University of Milan, Segrate, Italy; Department of Biomedical Sciences, Institute of Biomedical Technologies-National Research Council (ITB-CNR), Segrate, Italy
| | - Marco Venturin
- Department of Medical Biotechnology and Translational Medicine (BIOMETRA), University of Milan, Segrate, Italy
| | - Aleksandra Sojic
- Department of Biomedical Sciences, Institute of Biomedical Technologies-National Research Council (ITB-CNR), Segrate, Italy
| | - Nithiya Jesuthasan
- Department of Biomedical Sciences, Institute of Biomedical Technologies-National Research Council (ITB-CNR), Segrate, Italy
| | - Alessandro Orro
- Department of Biomedical Sciences, Institute of Biomedical Technologies-National Research Council (ITB-CNR), Segrate, Italy
| | - Roberta Spinelli
- Istituto Istruzione Superiore Statale IRIS Versari, Cesano Maderno, Italy
| | - Massimo Musicco
- Department of Biomedical Sciences, Institute of Biomedical Technologies-National Research Council (ITB-CNR), Segrate, Italy
| | - Gianluca De Bellis
- Department of Biomedical Sciences, Institute of Biomedical Technologies-National Research Council (ITB-CNR), Segrate, Italy
| | - Fulvio Adorni
- Department of Biomedical Sciences, Institute of Biomedical Technologies-National Research Council (ITB-CNR), Segrate, Italy
| |
|
68
|
Chagoyen M, Ranea JAG, Pazos F. Applications of molecular networks in biomedicine. Biol Methods Protoc 2019; 4:bpz012. [PMID: 32395629 PMCID: PMC7200821 DOI: 10.1093/biomethods/bpz012] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 06/30/2019] [Revised: 08/20/2019] [Accepted: 08/28/2019] [Indexed: 12/12/2022] Open
Abstract
Due to the large interdependence between the molecular components of living systems, many phenomena, including those related to pathologies, cannot be explained in terms of a single gene or a small number of genes. Molecular networks, representing different types of relationships between molecular entities, embody these large sets of interdependences in a framework that allows their mining from a systemic point of view to obtain information. These networks, often generated from high-throughput omics datasets, are used to study the complex phenomena of human pathologies from a systemic point of view. Complementing the reductionist approach of molecular biology, based on the detailed study of a small number of genes, systemic approaches to human diseases consider that these are better reflected in large and intricate networks of relationships between genes. These networks, and not the single genes, provide both better markers for diagnosing diseases and targets for treating them. Network approaches are being used to gain insight into the molecular basis of complex diseases and interpret the large datasets associated with them, such as genomic variants. Network formalism is also suitable for integrating large, heterogeneous and multilevel datasets associated with diseases from the molecular level to organismal and epidemiological scales. Many of these approaches are available to nonexpert users through standard software packages.
Affiliation(s)
- Monica Chagoyen
- Computational Systems Biology Group, Systems Biology Program, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
| | - Juan A G Ranea
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
- CIBER de Enfermedades Raras, Instituto de Salud Carlos III, Madrid, Spain
| | - Florencio Pazos
- Computational Systems Biology Group, Systems Biology Program, National Centre for Biotechnology (CNB-CSIC), Madrid, Spain
| |
|
69
|
Ebiki M, Okazaki T, Kai M, Adachi K, Nanba E. Comparison of Causative Variant Prioritization Tools Using Next-generation Sequencing Data in Japanese Patients with Mendelian Disorders. Yonago Acta Med 2019; 62:244-252. [PMID: 31582890 DOI: 10.33160/yam.2019.09.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/20/2019] [Accepted: 07/17/2019] [Indexed: 12/24/2022]
Abstract
Background During the investigation of causative variants of Mendelian disorders using next-generation sequencing, the enormous number of possible candidates makes the detection process complex, and the use of multidimensional methods is required. Although the utility of several variant prioritization tools has been reported, their effectiveness in Japanese patients remains largely unknown. Methods We selected 5 free variant prioritization tools (PhenIX, hiPHIVE, Phen-Gen, eXtasy-order statistics, and eXtasy-combined max) and assessed their effectiveness in Japanese patient populations. To compare these tools, we conducted 2 studies: one based on simulated data of 100 diseases and another based on the exome data of 20 in-house patients with Mendelian disorders. To this end, we selected 100 pathogenic variants from the "Database of Pathogenic Variants (DPV)" and created 100 variant call format (VCF) files that each had pathogenic variants based on reference human genome data from the 1000 Genomes Project. The latter "in-house" study used exome data from 20 Japanese patients with Mendelian disorders. In both studies, we utilized 1-5 terms of "Human Phenotype Ontology" as clinical information. Results In our analysis based on simulated disease data, the detection rate of the top 10 causative variants was 91% for hiPHIVE and 88% for PhenIX, based on 100 sets of simulated disease VCF data. Also, both software packages detected 82% of the top 1 causative variants. When we used data from our in-house patients instead, we found that these two programs (PhenIX and hiPHIVE) produced higher detection rates than the other three systems in our study. The detection rate of the top 1 causative variant was 71.4% for PhenIX and 65.0% for hiPHIVE. Conclusion The rates of detecting causative variants in two Exomizer software packages, hiPHIVE and PhenIX, were higher than for the other three software systems we analyzed, with respect to Japanese patients.
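The detection rates quoted in this abstract are top-k rates: the fraction of cases in which the causative variant is ranked within the first k candidates a tool returns. The arithmetic can be sketched as follows; the ranks are hypothetical and not taken from the study, and the function name `top_k_rate` is illustrative.

```python
def top_k_rate(ranks, k):
    """Fraction of cases whose causative variant is ranked k-th or better.

    `ranks` holds one 1-based rank per case; None means the tool did not
    return the causative variant at all.
    """
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


# Hypothetical ranks of the true causative variant for ten cases.
ranks = [1, 1, 3, 12, 1, 7, None, 2, 1, 25]

top1 = top_k_rate(ranks, 1)    # 4 of 10 cases -> 0.4
top10 = top_k_rate(ranks, 10)  # 7 of 10 cases -> 0.7
```

Cases where the variant is missing from the output are counted as misses here; excluding them from the denominator instead would raise the reported rate, so the choice should be stated alongside any such figure.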
Affiliation(s)
- Mitsutaka Ebiki
- The Development of Innovative Future Medical Treatment, Graduate School of Medical Sciences, Tottori University, Yonago 683-8504, Japan; KUSUNOKI SCALE INC., Yonago 683-0832, Japan
| | - Tetsuya Okazaki
- Division of Child Neurology, Department of Brain and Neurosciences, School of Medicine, Tottori University Faculty of Medicine, Yonago 683-8504, Japan; Division of Clinical Genetics, Tottori University Hospital, Yonago 683-8504, Japan; Technical Department, Tottori University, Yonago 683-8503, Japan
| | - Masachika Kai
- Research Initiative Center, Organization for Research Initiative and Promotion, Tottori University, Yonago 683-8503, Japan
| | - Kaori Adachi
- Research Strategy Division, Organization for Research Initiative and Promotion, Tottori University, Yonago 683-8503, Japan
| | - Eiji Nanba
- Division of Clinical Genetics, Tottori University Hospital, Yonago 683-8504, Japan; Technical Department, Tottori University, Yonago 683-8503, Japan; Research Strategy Division, Organization for Research Initiative and Promotion, Tottori University, Yonago 683-8503, Japan
| |
|
70
|
Zolotareva O, Kleine M. A Survey of Gene Prioritization Tools for Mendelian and Complex Human Diseases. J Integr Bioinform 2019; 16:/j/jib.ahead-of-print/jib-2018-0069/jib-2018-0069.xml. [PMID: 31494632 PMCID: PMC7074139 DOI: 10.1515/jib-2018-0069] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Received: 09/06/2018] [Accepted: 07/12/2019] [Indexed: 12/16/2022] Open
Abstract
Modern high-throughput experiments provide us with numerous potential associations between genes and diseases. Experimental validation of all the discovered associations, let alone all the possible interactions between them, is time-consuming and expensive. To facilitate the discovery of causative genes, various approaches for prioritization of genes according to their relevance for a given disease have been developed. In this article, we explain the gene prioritization problem and provide an overview of computational tools for gene prioritization. Among about a hundred published gene prioritization tools, we select and briefly describe the 14 most up-to-date and user-friendly ones. Also, we discuss the advantages and disadvantages of existing tools, challenges of their validation, and the directions for future research.
Affiliation(s)
- Olga Zolotareva
- Bielefeld University, Faculty of Technology and Center for Biotechnology, International Research Training Group "Computational Methods for the Analysis of the Diversity and Dynamics of Genomes" and Genome Informatics, Universitätsstraße 25, Bielefeld, Germany
| | - Maren Kleine
- Bielefeld University, Faculty of Technology, Bioinformatics/Medical Informatics Department, Universitätsstraße 25, Bielefeld, Germany
| |
|
71
|
Abstract
Rapid advances in genomic technologies have led to a wealth of diverse data, from which novel discoveries can be gleaned through the application of robust statistical and computational methods. Here, we describe GeneFishing, a semisupervised computational approach to reconstruct context-specific portraits of biological processes by leveraging gene-gene coexpression information. GeneFishing incorporates multiple high-dimensional statistical ideas, including dimensionality reduction, clustering, subsampling, and results aggregation, to produce robust results. To illustrate the power of our method, we applied it using 21 genes involved in cholesterol metabolism as "bait" to "fish out" (or identify) genes not previously identified as being connected to cholesterol metabolism. Using simulation and real datasets, we found that the results obtained through GeneFishing were more interesting for our study than those provided by related gene prioritization methods. In particular, application of GeneFishing to the GTEx liver RNA sequencing (RNAseq) data not only reidentified many known cholesterol-related genes, but also pointed to glyoxalase I (GLO1) as a gene implicated in cholesterol metabolism. In a follow-up experiment, we found that GLO1 knockdown in human hepatoma cell lines increased levels of cellular cholesterol ester, validating a role for GLO1 in cholesterol metabolism. In addition, we performed pantissue analysis by applying GeneFishing on various tissues and identified many potential tissue-specific cholesterol metabolism-related genes. GeneFishing appears to be a powerful tool for identifying related components of complex biological systems and may be used across a wide range of applications.
|
72
|
Lei H, Liu W, Si J, Wang J, Zhang T. Analyzing the regulation of miRNAs on protein-protein interaction network in Hodgkin lymphoma. BMC Bioinformatics 2019; 20:449. [PMID: 31477006 PMCID: PMC6720096 DOI: 10.1186/s12859-019-3041-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 04/17/2019] [Accepted: 08/21/2019] [Indexed: 12/28/2022] Open
Abstract
Background Hodgkin Lymphoma (HL) is a type of aggressive malignancy in lymphoma that has high incidence in young adults and elderly patients. Identification of reliable diagnostic markers and efficient therapeutic targets is especially important for the diagnosis and treatment of HL. Although many HL-related molecules have been identified, our understanding of the molecular mechanisms underlying the disease is still far from complete due to its complex and heterogeneous characteristics. In such a situation, exploring the molecular mechanisms underlying HL via systems biology approaches provides a promising option. In this study, we try to elucidate the molecular mechanisms related to the disease and identify potential pharmaceutical targets from a network-based perspective. Results We constructed a series of network models. Based on the analysis of these networks, we attempted to identify the biomarkers and elucidate the molecular mechanisms underlying HL. Initially, we built three different but related protein networks, i.e., background network, HL-basic network and HL-specific network. By analyzing these three networks, we investigated the connection characteristics of the HL-related proteins. Subsequently, we explored the miRNA regulation of the HL-specific network and analyzed three kinds of simple regulation patterns, i.e., co-regulation of protein pairs, as well as the direct and indirect regulation of protein triples. Finally, we constructed a simplified protein network combined with the regulation of miRNAs on proteins to better understand the relation between HL-related proteins and miRNAs. Conclusions We find that the HL-related proteins are more likely to connect with each other compared to other proteins. Moreover, the HL-specific network can be further divided into five sub-networks, and 49 proteins, as the backbone of the HL-specific network, make up and connect these five sub-networks. Thus, they may be closely associated with HL. In addition, we find that the co-regulation of protein pairs is the main regulatory pattern of miRNAs on the protein network in the HL-specific network. According to the regulation of miRNAs on the protein network, we have identified five core miRNAs as potential biomarkers for the diagnosis of HL. Finally, several protein pathways have been identified as closely associated with HL, which provides deep insights into the underlying mechanism of HL. Electronic supplementary material The online version of this article (10.1186/s12859-019-3041-9) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Huimin Lei
- School of Biomedical Engineering, Tianjin Medical University, Tianjin, China; School of Continuation Education, Tianjin Medical University, Tianjin, China
| | - Wenxu Liu
- School of Biomedical Engineering, Tianjin Medical University, Tianjin, China
| | - Jiarui Si
- School of Basic Medicine, Tianjin Medical University, Tianjin, China
| | - Ju Wang
- School of Biomedical Engineering, Tianjin Medical University, Tianjin, China
| | - Tao Zhang
- School of Biomedical Engineering, Tianjin Medical University, Tianjin, China
| |
|
73
|
Wang M, Chun J, Genovese G, Knob AU, Benjamin A, Wilkins MS, Friedman DJ, Appel GB, Lifton RP, Mane S, Pollak MR. Contributions of Rare Gene Variants to Familial and Sporadic FSGS. J Am Soc Nephrol 2019; 30:1625-1640. [PMID: 31308072 PMCID: PMC6727251 DOI: 10.1681/asn.2019020152] [Citation(s) in RCA: 39] [Impact Index Per Article: 7.8] [Received: 02/14/2019] [Accepted: 04/25/2019] [Indexed: 12/12/2022] Open
Abstract
BACKGROUND Over the past two decades, the importance of genetic factors in the development of FSGS has become increasingly clear. However, despite many known monogenic causes of FSGS, single gene defects explain only 30% of cases. METHODS To investigate mutations underlying FSGS, we sequenced 662 whole exomes from individuals with sporadic or familial FSGS. After quality control, we analyzed the exome data from 363 unrelated family units with sporadic or familial FSGS and compared this to data from 363 ancestry-matched controls. We used rare variant burden tests to evaluate known disease-associated genes and potential new genes. RESULTS We validated several FSGS-associated genes that show a marked enrichment of deleterious rare variants among the cases. However, for some genes previously reported as FSGS related, we identified rare variants at similar or higher frequencies in controls. After excluding such genes, 122 of 363 cases (33.6%) had rare variants in known disease-associated genes, but 30 of 363 controls (8.3%) also harbored rare variants that would be classified as "causal" if detected in cases; applying American College of Medical Genetics filtering guidelines (to reduce the rate of false-positive claims that a variant is disease related) yielded rates of 24.2% in cases and 5.5% in controls. Highly ranked new genes include SCAF1, SETD2, and LY9. Network analysis showed that top-ranked new genes were located closer than a random set of genes to known FSGS genes. CONCLUSIONS Although our analysis validated many known FSGS-causing genes, we detected a nontrivial number of purported "disease-causing" variants in controls, implying that filtering is inadequate to allow clinical diagnosis and decision making. Genetic diagnosis in patients with FSGS is complicated by the nontrivial rate of variants in known FSGS genes among people without kidney disease.
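The case and control percentages reported in the RESULTS can be checked directly from the counts, and the size of the difference can be illustrated with a one-sided Fisher's exact test on the pooled carrier counts. This is only a sketch under stated assumptions: the abstract does not say which burden test the authors used, their tests would normally be run per gene rather than on these pooled counts, and the function name `fisher_one_sided` is hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] with fixed margins."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    upper = min(row1, col1)
    # Hypergeometric tail: tables at least as extreme as the observed one.
    tail = sum(comb(row1, k) * comb(row2, col1 - k) for k in range(a, upper + 1))
    return tail / denom


# Counts taken from the abstract.
cases_with, cases_total = 122, 363
controls_with, controls_total = 30, 363

case_rate = 100 * cases_with / cases_total           # about 33.6 %
control_rate = 100 * controls_with / controls_total  # about 8.3 %

cases_without = cases_total - cases_with
controls_without = controls_total - controls_with
odds_ratio = (cases_with * controls_without) / (cases_without * controls_with)
p_value = fisher_one_sided(cases_with, cases_without, controls_with, controls_without)
```

Even with a strong enrichment in cases, the 30 control carriers are what the abstract's conclusion turns on: a variant class carried by roughly 8% of controls cannot be treated as diagnostic on its own.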
Affiliation(s)
- Minxian Wang
- Division of Nephrology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
- Department of Medicine, Harvard Medical School, Boston, Massachusetts
- Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts
| | - Justin Chun
- Division of Nephrology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
- Department of Medicine, Harvard Medical School, Boston, Massachusetts
- Division of Nephrology, Department of Medicine, University of Calgary, Cumming School of Medicine, Calgary, Alberta, Canada
| | - Giulio Genovese
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts
| | - Andrea U Knob
- Division of Nephrology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
| | - Ava Benjamin
- Division of Nephrology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
| | - Maris S Wilkins
- Division of Nephrology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
| | - David J Friedman
- Division of Nephrology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
- Department of Medicine, Harvard Medical School, Boston, Massachusetts
| | - Gerald B Appel
- Division of Nephrology, Department of Medicine, Columbia University College of Physicians and Surgeons, New York, New York
| | - Richard P Lifton
- Laboratory of Human Genetics and Genomics, The Rockefeller University, New York, New York
| | - Shrikant Mane
- Department of Genetics, Yale University School of Medicine, New Haven, Connecticut
| | - Martin R Pollak
- Division of Nephrology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, Massachusetts
- Department of Medicine, Harvard Medical School, Boston, Massachusetts
- Program in Medical and Population Genetics, Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, Massachusetts
| |
|
74
|
Zakeri P, Simm J, Arany A, ElShal S, Moreau Y. Gene prioritization using Bayesian matrix factorization with genomic and phenotypic side information. Bioinformatics 2019; 34:i447-i456. [PMID: 29949967 PMCID: PMC6022676 DOI: 10.1093/bioinformatics/bty289] [Citation(s) in RCA: 21] [Impact Index Per Article: 4.2] [Indexed: 01/23/2023] Open
Abstract
Motivation Most gene prioritization methods model each disease or phenotype individually, but this fails to capture patterns common to several diseases or phenotypes. To overcome this limitation, we formulate the gene prioritization task as the factorization of a sparsely filled gene-phenotype matrix, where the objective is to predict the unknown matrix entries. To deliver more accurate gene-phenotype matrix completion, we extend classical Bayesian matrix factorization to work with multiple side information sources. The availability of side information allows us to make non-trivial predictions for genes for which no previous disease association is known. Results Our gene prioritization method integrates not only data sources describing genes but also data sources describing Human Phenotype Ontology terms. Experimental results on our benchmarks show that our proposed model can effectively improve accuracy over the well-established gene prioritization method, Endeavour. In particular, our proposed method offers promising results on diseases of the nervous system; diseases of the eye and adnexa; endocrine, nutritional and metabolic diseases; and congenital malformations, deformations and chromosomal abnormalities, when compared to Endeavour. Availability and implementation The Bayesian data fusion method is implemented as a Python/C++ package: https://github.com/jaak-s/macau. It is also available as a Julia package: https://github.com/jaak-s/BayesianDataFusion.jl. All data and benchmarks generated or analyzed during this study can be downloaded at https://owncloud.esat.kuleuven.be/index.php/s/UGb89WfkZwMYoTn. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Pooya Zakeri
- Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven and imec, Kapeldreef Leuven, Belgium
| | - Jaak Simm
- Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven and imec, Kapeldreef Leuven, Belgium
| | - Adam Arany
- Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven and imec, Kapeldreef Leuven, Belgium
| | - Sarah ElShal
- Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven and imec, Kapeldreef Leuven, Belgium
| | - Yves Moreau
- Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, KU Leuven and imec, Kapeldreef Leuven, Belgium
| |
|
75
|
Tan A, Huang H, Zhang P, Li S. Network-based cancer precision medicine: A new emerging paradigm. Cancer Lett 2019; 458:39-45. [DOI: 10.1016/j.canlet.2019.05.015] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 02/01/2019] [Revised: 04/29/2019] [Accepted: 05/15/2019] [Indexed: 12/20/2022]
|
76
|
Hériché JK, Alexander S, Ellenberg J. Integrating Imaging and Omics: Computational Methods and Challenges. Annu Rev Biomed Data Sci 2019. [DOI: 10.1146/annurev-biodatasci-080917-013328] [Citation(s) in RCA: 24] [Impact Index Per Article: 4.8] [Indexed: 11/09/2022]
Abstract
Fluorescence microscopy imaging has long been complementary to DNA sequencing- and mass spectrometry–based omics in biomedical research, but these approaches are now converging. On the one hand, omics methods are moving from in vitro methods that average across large cell populations to in situ molecular characterization tools with single-cell sensitivity. On the other hand, fluorescence microscopy imaging has moved from a morphological description of tissues and cells to quantitative molecular profiling with single-molecule resolution. Recent technological developments underpinned by computational methods have started to blur the lines between imaging and omics and have made their direct correlation and seamless integration an exciting possibility. As this trend continues rapidly, it will allow us to create comprehensive molecular profiles of living systems with spatial and temporal context and subcellular resolution. Key to achieving this ambitious goal will be novel computational methods and successfully dealing with the challenges of data integration and sharing as well as cloud-enabled big data analysis.
Affiliation(s)
- Jean-Karim Hériché
- Cell Biology and Biophysics Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
| | - Stephanie Alexander
- Cell Biology and Biophysics Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
| | - Jan Ellenberg
- Cell Biology and Biophysics Unit, European Molecular Biology Laboratory (EMBL), 69117 Heidelberg, Germany
| |
|
77
|
Yue Z, Willey CD, Hjelmeland AB, Chen JY. BEERE: a web server for biomedical entity expansion, ranking and explorations. Nucleic Acids Res 2019; 47:W578-W586. [PMID: 31114876 PMCID: PMC6602520 DOI: 10.1093/nar/gkz428] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 03/14/2019] [Revised: 05/04/2019] [Accepted: 05/20/2019] [Indexed: 12/02/2022] Open
Abstract
BEERE (Biomedical Entity Expansion, Ranking and Explorations) is a new web-based data analysis tool to help biomedical researchers characterize any input list of genes/proteins, biomedical terms or their combinations, i.e. 'biomedical entities', in the context of existing literature. Specifically, BEERE first aims to help users examine the credibility of known entity-to-entity associative or semantic relationships supported by database or literature references from the user input of a gene/term list. Then, it will help users uncover the relative importance of each entity (a gene or a term) within the user input by computing the ranking scores of all entities. Finally, it will help users hypothesize new gene functions or genotype-phenotype associations by an interactive visual interface of the constructed global entity relationship network. The output from BEERE includes: a list of the original entities matched with known relationships in databases; any expanded entities that may be generated from the analysis; the ranks and ranking scores reported with statistical significance for each entity; and an interactive graphical display of the gene or term network within data provenance annotations that link to external data sources. The web server is free and open to all users with no login requirement and can be accessed at http://discovery.informatics.uab.edu/beere/.
Affiliation(s)
- Zongliang Yue
- Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35233, USA
| | - Christopher D Willey
- Department of Radiation Oncology, School of Medicine, the University of Alabama at Birmingham, AL 35233, USA
| | - Anita B Hjelmeland
- Department of Cell, Developmental and Integrative Biology, School of Medicine, the University of Alabama at Birmingham, AL 35233, USA
| | - Jake Y Chen
- Informatics Institute, School of Medicine, the University of Alabama at Birmingham, AL 35233, USA
| |
|
78
|
Deelen P, van Dam S, Herkert JC, Karjalainen JM, Brugge H, Abbott KM, van Diemen CC, van der Zwaag PA, Gerkes EH, Zonneveld-Huijssoon E, Boer-Bergsma JJ, Folkertsma P, Gillett T, van der Velde KJ, Kanninga R, van den Akker PC, Jan SZ, Hoorntje ET, Te Rijdt WP, Vos YJ, Jongbloed JDH, van Ravenswaaij-Arts CMA, Sinke R, Sikkema-Raddatz B, Kerstjens-Frederikse WS, Swertz MA, Franke L. Improving the diagnostic yield of exome-sequencing by predicting gene-phenotype associations using large-scale gene expression analysis. Nat Commun 2019; 10:2837. [PMID: 31253775 PMCID: PMC6599066 DOI: 10.1038/s41467-019-10649-4] [Citation(s) in RCA: 74] [Impact Index Per Article: 14.8] [Received: 10/22/2018] [Accepted: 05/23/2019] [Indexed: 02/06/2023] Open
Abstract
The diagnostic yield of exome and genome sequencing remains low (8-70%), due to incomplete knowledge on the genes that cause disease. To improve this, we use RNA-seq data from 31,499 samples to predict which genes cause specific disease phenotypes, and develop GeneNetwork Assisted Diagnostic Optimization (GADO). We show that this unbiased method, which does not rely upon specific knowledge on individual genes, is effective in both identifying previously unknown disease gene associations, and flagging genes that have previously been incorrectly implicated in disease. GADO can be run on www.genenetwork.nl by supplying HPO-terms and a list of genes that contain candidate variants. Finally, applying GADO to a cohort of 61 patients for whom exome-sequencing analysis had not resulted in a genetic diagnosis, yields likely causative genes for ten cases.
Affiliation(s)
- Patrick Deelen
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 VB, Groningen, The Netherlands
| | - Sipko van Dam
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Johanna C Herkert
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Juha M Karjalainen
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Harm Brugge
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Kristin M Abbott
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Cleo C van Diemen
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Paul A van der Zwaag
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Erica H Gerkes
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Evelien Zonneveld-Huijssoon
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Jelkje J Boer-Bergsma
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Pytrik Folkertsma
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Tessa Gillett
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - K Joeri van der Velde
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 VB, Groningen, The Netherlands
| | - Roan Kanninga
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 VB, Groningen, The Netherlands
| | - Peter C van den Akker
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Sabrina Z Jan
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Edgar T Hoorntje
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands.,Netherlands Heart Institute, 3511 EP, Utrecht, The Netherlands
| | - Wouter P Te Rijdt
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands.,Netherlands Heart Institute, 3511 EP, Utrecht, The Netherlands
| | - Yvonne J Vos
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Jan D H Jongbloed
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Conny M A van Ravenswaaij-Arts
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Richard Sinke
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Birgit Sikkema-Raddatz
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands
| | - Morris A Swertz
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands.,University of Groningen, University Medical Center Groningen, Genomics Coordination Center, 9700 VB, Groningen, The Netherlands
| | - Lude Franke
- University of Groningen, University Medical Center Groningen, Department of Genetics, 9700 VB, Groningen, The Netherlands.
| |
|
79
|
Bern M, King A, Applewhite DA, Ritz A. Network-based prediction of polygenic disease genes involved in cell motility. BMC Bioinformatics 2019; 20:313. [PMID: 31216978 PMCID: PMC6584515 DOI: 10.1186/s12859-019-2834-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 01/11/2023] Open
Abstract
Background: Schizophrenia and autism are examples of polygenic diseases caused by a multitude of genetic variants, many of which are still poorly understood. Recently, both diseases have been associated with disrupted neuron motility and migration patterns, suggesting that aberrant cell motility is a phenotype for these neurological diseases. Results: We formulate the Polygenic Disease Phenotype Problem which seeks to identify candidate disease genes that may be associated with a phenotype such as cell motility. We present a machine learning approach to solve this problem for schizophrenia and autism genes within a brain-specific functional interaction network. Our method outperforms peer semi-supervised learning approaches, achieving better cross-validation accuracy across different sets of gold-standard positives. We identify top candidates for both schizophrenia and autism, and select six genes labeled as schizophrenia positives that are predicted to be associated with cell motility for follow-up experiments. Conclusions: Candidate genes predicted by our method suggest testable hypotheses about these genes’ role in cell motility regulation, offering a framework for generating predictions for experimental validation. Electronic supplementary material: The online version of this article (10.1186/s12859-019-2834-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Miriam Bern
- Biology Department, Reed College, Portland, OR, USA
| | - Anna Ritz
- Biology Department, Reed College, Portland, OR, USA.
| |
|
80
|
Fine RS, Pers TH, Amariuta T, Raychaudhuri S, Hirschhorn JN. Benchmarker: An Unbiased, Association-Data-Driven Strategy to Evaluate Gene Prioritization Algorithms. Am J Hum Genet 2019; 104:1025-1039. [PMID: 31056107 PMCID: PMC6556976 DOI: 10.1016/j.ajhg.2019.03.027] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Received: 12/14/2018] [Accepted: 03/28/2019] [Indexed: 01/17/2023] Open
Abstract
Genome-wide association studies (GWASs) are valuable for understanding human biology, but associated loci typically contain multiple associated variants and genes. Thus, algorithms that prioritize likely causal genes and variants for a given phenotype can provide biological interpretations of association data. However, a critical, currently missing capability is to objectively compare performance of such algorithms. Typical comparisons rely on "gold standard" genes harboring causal coding variants, but such gold standards may be biased and incomplete. To address this issue, we developed Benchmarker, an unbiased, data-driven benchmarking method that compares performance of similarity-based prioritization strategies to each other (and to random chance) by leave-one-chromosome-out cross-validation with stratified linkage disequilibrium (LD) score regression. We first applied Benchmarker to 20 well-powered GWASs and compared gene prioritization based on strategies employing three different data sources, including annotated gene sets and gene expression; genes prioritized based on gene sets had higher per-SNP heritability than those prioritized based on gene expression. Additionally, in a direct comparison of three methods, DEPICT and MAGMA outperformed NetWAS. We also evaluated combinations of methods; our results indicated that combining data sources and algorithms can help prioritize higher-quality genes for follow-up. Benchmarker provides an unbiased approach to evaluate any similarity-based method that provides genome-wide prioritization of genes, variants, or gene sets and can determine the best such method for any particular GWAS. Our method addresses an important unmet need for rigorous tool assessment and can assist in mapping genetic associations to causal function.
Affiliation(s)
- Rebecca S Fine
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Division of Endocrinology and Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Ph.D. Program in Biological and Biomedical Sciences, Graduate School of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
| | - Tune H Pers
- The Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark; Department of Epidemiology Research, Statens Serum Institut, 2300 Copenhagen, Denmark
| | - Tiffany Amariuta
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Ph.D. Program in Bioinformatics and Integrative Genomics, Graduate School of Arts and Sciences, Harvard University, Cambridge, MA 02138, USA
| | - Soumya Raychaudhuri
- Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Center for Data Sciences, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Division of Genetics, Brigham and Women's Hospital, Boston, MA 02115, USA; Division of Rheumatology, Immunology, and Allergy, Brigham and Women's Hospital and Harvard Medical School, Boston, MA 02115, USA; Department of Biomedical Informatics, Harvard Medical School, Boston, MA 02115, USA; Arthritis Research UK Centre for Genetics and Genomics, Centre for Musculoskeletal Research, Manchester Academic Health Science Centre, The University of Manchester, Manchester M13 9PL, UK
| | - Joel N Hirschhorn
- Department of Genetics, Harvard Medical School, Boston, MA 02115, USA; Division of Endocrinology and Center for Basic and Translational Obesity Research, Boston Children's Hospital, Boston, MA 02115, USA; Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115, USA.
| |
|
81
|
Dozmorov MG. Disease classification: from phenotypic similarity to integrative genomics and beyond. Brief Bioinform 2019; 20:1769-1780. [DOI: 10.1093/bib/bby049] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/21/2018] [Revised: 05/01/2018] [Indexed: 02/06/2023] Open
Abstract
A fundamental challenge of modern biomedical research is understanding how diseases that are similar on the phenotypic level are similar on the molecular level. Integration of various genomic data sets with the traditionally used phenotypic disease similarity revealed novel genetic and molecular mechanisms and blurred the distinction between monogenic (Mendelian) and complex diseases. Network-based medicine has emerged as a complementary approach for identifying disease-causing genes, genetic mediators, disruptions in the underlying cellular functions and for drug repositioning. The recent development of machine and deep learning methods allows for leveraging real-life information about diseases to refine genetic and phenotypic disease relationships. This review describes the historical development and recent methodological advancements for studying disease classification (nosology).
Affiliation(s)
- Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, 830 East Main Street, Richmond, VA, USA
| |
|
82
|
Sonawane AR, Weiss ST, Glass K, Sharma A. Network Medicine in the Age of Biomedical Big Data. Front Genet 2019; 10:294. [PMID: 31031797 PMCID: PMC6470635 DOI: 10.3389/fgene.2019.00294] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Received: 12/26/2018] [Accepted: 03/19/2019] [Indexed: 12/13/2022] Open
Abstract
Network medicine is an emerging area of research dealing with molecular and genetic interactions, network biomarkers of disease, and therapeutic target discovery. Large-scale biomedical data generation offers a unique opportunity to assess the effect and impact of cellular heterogeneity and environmental perturbations on the observed phenotype. Marrying the two, network medicine with biomedical data provides a framework to build meaningful models and extract impactful results at a network level. In this review, we survey existing network types and biomedical data sources. More importantly, we delve into ways in which the network medicine approach, aided by phenotype-specific biomedical data, can be gainfully applied. We provide three paradigms, mainly dealing with three major biological network archetypes: protein-protein interaction, expression-based, and gene regulatory networks. For each of these paradigms, we discuss a broad overview of philosophies under which various network methods work. We also provide a few examples in each paradigm as a test case of its successful application. Finally, we delineate several opportunities and challenges in the field of network medicine. We hope this review provides a lexicon for researchers from biological sciences and network theory to come on the same page to work on research areas that require interdisciplinary expertise. Taken together, the understanding gained from combining biomedical data with networks can be useful for characterizing disease etiologies and identifying therapeutic targets, which, in turn, will lead to better preventive medicine with translational impact on personalized healthcare.
Affiliation(s)
- Abhijeet R. Sonawane
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
| | - Scott T. Weiss
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
| | - Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
| | - Amitabh Sharma
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Center for Interdisciplinary Cardiovascular Sciences, Cardiovascular Division, Brigham and Women’s Hospital, Boston, MA, United States
| |
|
83
|
Almasi SM, Hu T. Measuring the importance of vertices in the weighted human disease network. PLoS One 2019; 14:e0205936. [PMID: 30901770 PMCID: PMC6430629 DOI: 10.1371/journal.pone.0205936] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/30/2018] [Accepted: 02/26/2019] [Indexed: 12/11/2022] Open
Abstract
Many human genetic disorders and diseases are known to be related to each other through frequently observed co-occurrences. Studying the correlations among multiple diseases provides an important avenue to better understand the common genetic background of diseases and to help develop new drugs that can treat multiple diseases. Meanwhile, network science has seen increasing applications on modeling complex biological systems, and can be a powerful tool to elucidate the correlations of multiple human diseases. In this article, known disease-gene associations were represented using a weighted bipartite network. We extracted a weighted human diseases network from such a bipartite network to show the correlations of diseases. Subsequently, we proposed a new centrality measurement for the weighted human disease network (WHDN) in order to quantify the importance of diseases. Using our centrality measurement to quantify the importance of vertices in WHDN, we were able to find a set of most central diseases. By investigating the 30 top diseases and their most correlated neighbors in the network, we identified disease linkages including known disease pairs and novel findings. Our research helps better understand the common genetic origin of human diseases and suggests top diseases that likely induce other related diseases.
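The bipartite-to-disease projection described in this abstract can be sketched roughly as follows. The edge-weight rule (sum over shared genes of the product of association weights) and the use of weighted degree as an importance score are assumptions made for illustration; the abstract does not define the authors' own centrality measure, and this sketch does not attempt to reproduce it. The disease and gene triples are toy data.

```python
from collections import defaultdict
from itertools import combinations


def project_diseases(associations):
    """Project (disease, gene, weight) triples onto a disease-disease network.

    Assumed rule: the weight of an edge between two diseases is the sum,
    over the genes they share, of the product of their association weights.
    """
    by_gene = defaultdict(dict)
    for disease, gene, weight in associations:
        by_gene[gene][disease] = weight
    edges = defaultdict(float)
    for diseases in by_gene.values():
        for a, b in combinations(sorted(diseases), 2):
            edges[(a, b)] += diseases[a] * diseases[b]
    return dict(edges)


def strength(edges):
    """Weighted degree of each disease: a simple stand-in centrality."""
    totals = defaultdict(float)
    for (a, b), w in edges.items():
        totals[a] += w
        totals[b] += w
    return dict(totals)


# Hypothetical toy associations, for illustration only.
toy = [
    ("asthma", "IL13", 1.0),
    ("eczema", "IL13", 1.0),
    ("asthma", "FLG", 0.5),
    ("eczema", "FLG", 1.0),
    ("ichthyosis", "FLG", 1.0),
]

edges = project_diseases(toy)
ranking = sorted(strength(edges).items(), key=lambda kv: -kv[1])
```

Here `ranking` lists diseases from most to least connected under the assumed weighting; any other centrality could be substituted for `strength` without changing the projection step.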
Affiliation(s)
- Ting Hu
- Department of Computer Science, Memorial University, St. John’s, NL, Canada
| |
|
84
|
Saik OV, Nimaev VV, Usmonov DB, Demenkov PS, Ivanisenko TV, Lavrik IN, Ivanisenko VA. Prioritization of genes involved in endothelial cell apoptosis by their implication in lymphedema using an analysis of associative gene networks with ANDSystem. BMC Med Genomics 2019; 12:47. [PMID: 30871556 PMCID: PMC6417156 DOI: 10.1186/s12920-019-0492-9] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Indexed: 01/01/2023] Open
Abstract
BACKGROUND: Currently, more than 150 million people worldwide suffer from lymphedema. It is a chronic progressive disease characterized by high-protein edema of various parts of the body due to defects in lymphatic drainage. Molecular-genetic mechanisms of the disease are still poorly understood. Beginning of a clinical manifestation of primary lymphedema in middle age and the development of secondary lymphedema after treatment of breast cancer can be genetically determined. Disruption of endothelial cell apoptosis can be considered as one of the factors contributing to the development of lymphedema. However, a study of the relationship between genes associated with lymphedema and genes involved in endothelial apoptosis, in the associative gene network was not previously conducted. METHODS: In the current work, we used well-known methods (ToppGene and Endeavour), as well as methods previously developed by us, to prioritize genes involved in endothelial apoptosis and to find potential participants of molecular-genetic mechanisms of lymphedema among them. Original methods of prioritization took into account the overrepresented Gene Ontology biological processes, the centrality of vertices in the associative gene network, describing the interactions of endothelial apoptosis genes with genes associated with lymphedema, and the association of the analyzed genes with diseases that are comorbid to lymphedema. RESULTS: An assessment of the quality of prioritization was performed using criteria, which involved an analysis of the enrichment of the top-most priority genes by genes, which are known to have simultaneous interactions with lymphedema and endothelial cell apoptosis, as well as by genes differentially expressed in murine model of lymphedema. In particular, among genes involved in endothelial apoptosis, KDR, TNF, TEK, BMPR2, SERPINE1, IL10, CD40LG, CCL2, FASLG and ABL1 had the highest priority. The identified priority genes can be considered as candidates for genotyping in the studies involving the search for associations with lymphedema. CONCLUSIONS: Analysis of interactions of these genes in the associative gene network of lymphedema can improve understanding of mechanisms of interaction between endothelial apoptosis and lymphangiogenesis, and shed light on the role of disturbance of these processes in the development of edema, chronic inflammation and connective tissue transformation during the progression of the disease.
Affiliation(s)
- Olga V. Saik
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
| | - Vadim V. Nimaev
- Laboratory of Surgical Lymphology and Lymphodetoxication, Research Institute of Clinical and Experimental Lymphology – Branch of the Institute of Cytology and Genetics, Siberian Branch of Russian Academy of Sciences, st. Timakova 2, Novosibirsk, 630117 Russia
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
| | - Dilovarkhuja B. Usmonov
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
- Department of Neurosurgery, Ya. L. Tsivyan Novosibirsk Research Institute of Traumatology and Orthopedics, Ministry of Health of the Russian Federation, st. Frunze 17, Novosibirsk, 630091 Russia
| | - Pavel S. Demenkov
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
| | - Timofey V. Ivanisenko
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
| | - Inna N. Lavrik
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
- Translational Inflammation Research, Institute of Experimental Internal Medicine, Otto von Guericke University Magdeburg, Medical Faculty, Pfalzer Platz 28, 39106 Magdeburg, Germany
| | - Vladimir A. Ivanisenko
- Laboratory of Computer-Assisted Proteomics, Institute of Cytology and Genetics, Siberian Branch, Russian Academy of Sciences, Prospekt Lavrentyeva 10, Novosibirsk, 630090 Russia
- Novosibirsk State University, st. Pirogova 1, Novosibirsk, 630090 Russia
| |
|
85
|
Yao J, Hurle MR, Nelson MR, Agarwal P. Predicting clinically promising therapeutic hypotheses using tensor factorization. BMC Bioinformatics 2019; 20:69. [PMID: 30736745 PMCID: PMC6368709 DOI: 10.1186/s12859-019-2664-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 09/26/2018] [Accepted: 01/30/2019] [Indexed: 12/19/2022] Open
Abstract
BACKGROUND: Determining which target to pursue is a challenging and error-prone first step in developing a therapeutic treatment for a disease, where missteps are potentially very costly given the long-time frames and high expenses of drug development. With current informatics technology and machine learning algorithms, it is now possible to computationally discover therapeutic hypotheses by predicting clinically promising drug targets based on the evidence associating drug targets with disease indications. We have collected this evidence from Open Targets and additional databases that covers 17 sources of evidence for target-indication association and represented the data as a tensor of 21,437 × 2211 × 17. RESULTS: As a proof-of-concept, we identified examples of successes and failures of target-indication pairs in clinical trials across 875 targets and 574 disease indications to build a gold-standard data set of 6140 known clinical outcomes. We designed and executed three benchmarking strategies to examine the performance of multiple machine learning models: Logistic Regression, LASSO, Random Forest, Tensor Factorization and Gradient Boosting Machine. With 10-fold cross-validation, tensor factorization achieved AUROC = 0.82 ± 0.02 and AUPRC = 0.71 ± 0.03. Across multiple validation schemes, this was comparable or better than other methods. CONCLUSION: In this work, we benchmarked a machine learning technique called tensor factorization for the problem of predicting clinical outcomes of therapeutic hypotheses. Results have shown that this method can achieve equal or better prediction performance compared with a variety of baseline models. We demonstrate one application of the method to predict outcomes of trials on novel indications of approved drug targets. This work can be expanded to targets and indications that have never been clinically tested and proposing novel target-indication hypotheses. Our proposed biologically-motivated cross-validation schemes provide insight into the robustness of the prediction performance. This has significant implications for all future methods that try to address this seminal problem in drug discovery.
Affiliation(s)
- Jin Yao
  - Computational Biology, GSK R&D, 1250 S. Collegeville Road, UP12-200, Collegeville, PA USA
- Mark R. Hurle
  - Computational Biology, GSK R&D, 1250 S. Collegeville Road, UP12-200, Collegeville, PA USA
- Matthew R. Nelson
  - Genetics, GSK R&D, 1250 S. Collegeville Road, UP12-200, Collegeville, PA USA
- Pankaj Agarwal
  - Computational Biology, GSK R&D, 1250 S. Collegeville Road, UP12-200, Collegeville, PA USA
|
86
|
Combined haplotype blocks regression and multi-locus mixed model analysis reveals novel candidate genes associated with milk traits in dairy sheep. Livest Sci 2019. [DOI: 10.1016/j.livsci.2018.11.020] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Indexed: 11/20/2022]
|
87
|
Abstract
Tumor genomic profiling involves analyzing many data types to produce a molecular profile of a tumor. Many of these analyses result in a prioritized list of genes or variants for further study. Interpretation of these lists relies upon annotating and extracting biological meaning through literature and manually curated knowledge bases. This chapter will describe several of these approaches including gene annotation, variant annotation, clinical annotation, functional enrichment analyses, and network analyses. Taken together or individually, these analyses will result in a biological understanding of complex genomic data to improve clinical decision making.
Affiliation(s)
- Kathleen M Fisch
  - Department of Medicine, Center for Computational Biology and Bioinformatics, University of California San Diego, La Jolla, CA, USA
|
88
|
Navarro C, Martínez V, Blanco A, Cano C. ProphTools: general prioritization tools for heterogeneous biological networks. Gigascience 2018; 6:1-8. [PMID: 29186475 PMCID: PMC5751048 DOI: 10.1093/gigascience/gix111] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 05/26/2017] [Accepted: 11/09/2017] [Indexed: 12/17/2022] Open
Abstract
Background Networks have been proven effective representations for the analysis of biological data. As such, there exist multiple methods to extract knowledge from biological networks. However, these approaches usually limit their scope to a single biological entity type of interest or they lack the flexibility to analyze user-defined data. Results We developed ProphTools, a flexible open-source command-line tool that performs prioritization on a heterogeneous network. ProphTools prioritization combines a Flow Propagation algorithm similar to a Random Walk with Restarts and a weighted propagation method. A flexible model for the representation of a heterogeneous network allows the user to define a prioritization problem involving an arbitrary number of entity types and their interconnections. Furthermore, ProphTools provides functionality to perform cross-validation tests, allowing users to select the best network configuration for a given problem. ProphTools core prioritization methodology has already been proven effective in gene-disease prioritization and drug repositioning. Here we make ProphTools available to the scientific community as flexible, open-source software and perform a new proof-of-concept case study on long noncoding RNAs (lncRNAs) to disease prioritization. Conclusions ProphTools is robust prioritization software that provides the flexibility not present in other state-of-the-art network analysis approaches, enabling researchers to perform prioritization tasks on any user-defined heterogeneous network. Furthermore, the application to lncRNA-disease prioritization shows that ProphTools can reach the performance levels of ad hoc prioritization tools without losing its generality.
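The abstract above describes ProphTools' propagation step only as "similar to a Random Walk with Restarts". As an illustration of that general family, and not of ProphTools' own Flow Propagation code, here is a minimal sketch of a plain random walk with restart on an undirected graph. The function name, the toy graph, the restart probability of 0.3, and the assumption that every node has at least one neighbor are all illustrative choices that do not come from the paper.

```python
def random_walk_with_restart(adj, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visit probability.

    adj     : dict mapping node -> list of neighbours (undirected graph;
              assumed here that every node has at least one neighbour)
    seeds   : set of seed nodes the walker restarts from
    restart : probability of jumping back to a seed at each step
    """
    if not seeds or any(s not in adj for s in seeds):
        raise ValueError("seeds must be a non-empty subset of the graph nodes")

    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)

    for _ in range(max_iter):
        # Mass that restarts at the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        # Remaining mass moves uniformly to the neighbours of each node.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical four-node path graph A - B - C - D with A as the only seed.
toy_graph = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
scores = random_walk_with_restart(toy_graph, {"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Nodes closer to the seed receive higher scores, which is the property prioritization methods of this kind rely on. A heterogeneous network with several entity types would need additional machinery that this sketch does not attempt.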
Affiliation(s)
- Carmen Navarro
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Victor Martínez
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Armando Blanco
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
- Carlos Cano
  - Department of Computer Science and Artificial Intelligence, University of Granada, Granada, Spain
|
89
|
Kim MH, Banerjee S, Zhao Y, Wang F, Zhang Y, Zhu Y, DeFerio J, Evans L, Park SM, Pathak J. Association networks in a matched case-control design - Co-occurrence patterns of preexisting chronic medical conditions in patients with major depression versus their matched controls. J Biomed Inform 2018; 87:88-95. [PMID: 30300713 PMCID: PMC6262847 DOI: 10.1016/j.jbi.2018.09.016] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Received: 03/17/2018] [Revised: 09/25/2018] [Accepted: 09/28/2018] [Indexed: 01/09/2023]
Abstract
OBJECTIVE We present a method for comparing association networks in a matched case-control design, which provides a high-level comparison of co-occurrence patterns of features after adjusting for confounding factors. We demonstrate this approach by examining the differential distribution of chronic medical conditions in patients with major depressive disorder (MDD) compared to the distribution of these conditions in their matched controls. MATERIALS AND METHODS Newly diagnosed MDD patients were matched to controls based on their demographic characteristics, socioeconomic status, place of residence, and healthcare service utilization in the Korean National Health Insurance Service's National Sample Cohort. Differences in the networks of chronic medical conditions in newly diagnosed MDD cases treated with antidepressants, and their matched controls, were prioritized with a permutation test accounting for the false discovery rate. Sensitivity analyses for the associations between prioritized pairs of chronic medical conditions and new MDD diagnosis were performed with regression modeling. RESULTS By comparing the association networks of chronic medical conditions in newly diagnosed depression patients and their matched controls, five pairs of such conditions were prioritized among 105 possible pairs after controlling the false discovery rate at 5%. In sensitivity analyses using regression modeling, four out of the five prioritized pairs were statistically significant for the interaction terms. CONCLUSION Association networks in a matched case-control design can provide a high-level comparison of comorbid features after adjusting for confounding factors, thereby supplementing traditional clinical study approaches. We demonstrate the differential co-occurrence pattern of chronic medical conditions in patients with MDD and prioritize the chronic conditions that have statistically significant interactions in regression models for depression.
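The abstract above reports that five of 105 candidate pairs were retained after controlling the false discovery rate at 5%, but it does not say which FDR procedure was applied. As one common possibility, the sketch below implements the Benjamini-Hochberg step-up procedure. The choice of Benjamini-Hochberg, the function name, and the example p-values are assumptions made for illustration and are not taken from the study.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])

    # Find the largest rank k with p_(k) <= alpha * k / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k_max = rank

    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            rejected[i] = True
    return rejected


# Hypothetical permutation-test p-values for ten candidate pairs.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(example_p, alpha=0.05)
```

In a setting like the one described, each p-value would come from a permutation test on one pair of conditions, and the flagged pairs would be the ones carried forward to the regression-based sensitivity analysis.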
Affiliation(s)
- Min-Hyung Kim
  - Division of Health Informatics, Department of Health Policy and Research, Weill Cornell Medical College of Cornell University, NY, USA
- Samprit Banerjee
  - Division of Biostatistics and Epidemiology, Department of Health Policy and Research, Weill Cornell Medical College of Cornell University, NY, USA
- Yize Zhao
  - Division of Biostatistics and Epidemiology, Department of Health Policy and Research, Weill Cornell Medical College of Cornell University, NY, USA
- Fei Wang
  - Division of Health Informatics, Department of Health Policy and Research, Weill Cornell Medical College of Cornell University, NY, USA
- Yiye Zhang
  - Division of Health Informatics, Department of Health Policy and Research, Weill Cornell Medical College of Cornell University, NY, USA
- Yongjun Zhu
  - Department of Library and Information Science, Sungkyungkwan University, Seoul, Republic of Korea
- Joseph DeFerio
  - Division of Health Informatics, Department of Health Policy and Research, Weill Cornell Medical College of Cornell University, NY, USA
- Lauren Evans
  - Division of Biostatistics and Epidemiology, Department of Health Policy and Research, Weill Cornell Medical College of Cornell University, NY, USA
- Sang Min Park
  - Department of Family Medicine, Seoul National University Hospital, Seoul, Republic of Korea
  - Department of Biomedical Sciences, Seoul National University College of Medicine, Seoul, Republic of Korea
- Jyotishman Pathak
  - Division of Health Informatics, Department of Health Policy and Research, Weill Cornell Medical College of Cornell University, NY, USA
|
90
|
How Surrogate and Chemical Genetics in Model Organisms Can Suggest Therapies for Human Genetic Diseases. Genetics 2018; 208:833-851. [PMID: 29487144 PMCID: PMC5844338 DOI: 10.1534/genetics.117.300124] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 08/09/2017] [Accepted: 12/26/2017] [Indexed: 12/12/2022] Open
Abstract
Genetic diseases are both inherited and acquired. Many genetic diseases fall under the paradigm of orphan diseases, a disease found in < 1 in 2000 persons. With rapid and cost-effective genome sequencing becoming the norm, many causal mutations for genetic diseases are being rapidly determined. In this regard, model organisms are playing an important role in validating if specific mutations identified in patients drive the observed phenotype. An emerging challenge for model organism researchers is the application of genetic and chemical genetic platforms to discover drug targets and drugs/drug-like molecules for potential treatment options for patients with genetic disease. This review provides an overview of how model organisms have contributed to our understanding of genetic disease, with a focus on the roles of yeast and zebrafish in gene discovery and the identification of compounds that could potentially treat human genetic diseases.
|
91
|
Sharma A, Kitsak M, Cho MH, Ameli A, Zhou X, Jiang Z, Crapo JD, Beaty TH, Menche J, Bakke PS, Santolini M, Silverman EK. Integration of Molecular Interactome and Targeted Interaction Analysis to Identify a COPD Disease Network Module. Sci Rep 2018; 8:14439. [PMID: 30262855 PMCID: PMC6160419 DOI: 10.1038/s41598-018-32173-z] [Citation(s) in RCA: 25] [Impact Index Per Article: 4.2] [Received: 11/02/2017] [Accepted: 08/20/2018] [Indexed: 12/21/2022] Open
Abstract
The polygenic nature of complex diseases offers potential opportunities to utilize network-based approaches that leverage the comprehensive set of protein-protein interactions (the human interactome) to identify new genes of interest and relevant biological pathways. However, the incompleteness of the current human interactome prevents it from reaching its full potential to extract network-based knowledge from gene discovery efforts, such as genome-wide association studies, for complex diseases like chronic obstructive pulmonary disease (COPD). Here, we provide a framework that integrates the existing human interactome information with experimental protein-protein interaction data for FAM13A, one of the most highly associated genetic loci to COPD, to find a more comprehensive disease network module. We identified an initial disease network neighborhood by applying a random-walk method. Next, we developed a network-based closeness approach (CAB) that revealed 9 out of 96 FAM13A interacting partners identified by affinity purification assays were significantly close to the initial network neighborhood. Moreover, compared to a similar method (local radiality), the CAB approach predicts low-degree genes as potential candidates. The candidates identified by the network-based closeness approach were combined with the initial network neighborhood to build a comprehensive disease network module (163 genes) that was enriched with genes differentially expressed between controls and COPD subjects in alveolar macrophages, lung tissue, sputum, blood, and bronchial brushing datasets. Overall, we demonstrate an approach to find disease-related network components using new laboratory data to overcome incompleteness of the current interactome.
Affiliation(s)
- Amitabh Sharma
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, USA
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
  - Center for Complex Networks Research and Department of Physics, Northeastern University, Boston, MA, 02115, USA
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Maksim Kitsak
  - Center for Complex Networks Research and Department of Physics, Northeastern University, Boston, MA, 02115, USA
- Michael H Cho
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, USA
  - Pulmonary and Critical Care Division, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Asher Ameli
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, USA
  - Department of Physics, Northeastern University, Boston, MA, 02115, United States
- Xiaobo Zhou
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, USA
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
- Zhiqiang Jiang
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, USA
- James D Crapo
  - Department of Medicine, National Jewish Health, Denver, Colorado, USA
- Terri H Beaty
  - Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Jörg Menche
  - Department of Bioinformatics, CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, A-1090, Vienna, Austria
- Per S Bakke
  - Department of Clinical Science, University of Bergen, Bergen, Norway
- Marc Santolini
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, USA
  - Center for Complex Networks Research and Department of Physics, Northeastern University, Boston, MA, 02115, USA
  - Center for Cancer Systems Biology, Dana-Farber Cancer Institute, Boston, MA, 02115, USA
- Edwin K Silverman
  - Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital, Boston, USA
  - Pulmonary and Critical Care Division, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
  - Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02115, USA
|
92
|
Inferring microRNA-Environmental Factor Interactions Based on Multiple Biological Information Fusion. Molecules 2018; 23:2439. [PMID: 30249984 PMCID: PMC6222788 DOI: 10.3390/molecules23102439] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/15/2018] [Revised: 09/14/2018] [Accepted: 09/18/2018] [Indexed: 12/11/2022] Open
Abstract
Accumulating studies have shown that environmental factors (EFs) can regulate the expression of microRNA (miRNA), which is closely associated with several diseases. Therefore, identifying miRNA-EF associations can facilitate the study of diseases. Recently, several computational methods have been proposed to explore miRNA-EF interactions. In this paper, a novel computational method, MEI-BRWMLL, is proposed to uncover the relationship between miRNA and EF. The miRNA-miRNA similarities are calculated from miRNA sequences and miRNA-EF interactions, and the EF-EF similarities are calculated from anatomical therapeutic chemical information, chemical structure, and miRNA-EF interactions. Similarity network fusion is used to fuse the miRNA similarities and the EF similarities, respectively. Further, multiple-label learning and a bi-random walk are employed to identify the associations between miRNA and EF. The experimental results show that our method outperforms the state-of-the-art algorithms.
|
93
|
Moreno-Ramírez CE, Gutiérrez-Garzón E, Barreto GE, Forero DA. Genome-Wide Expression Profiles for Ischemic Stroke: A Meta-Analysis. J Stroke Cerebrovasc Dis 2018; 27:3336-3341. [PMID: 30166211 DOI: 10.1016/j.jstrokecerebrovasdis.2018.07.035] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 11/28/2017] [Revised: 07/07/2018] [Accepted: 07/22/2018] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Genome-wide expression studies (GWES), using microarray platforms, have allowed a deeper understanding of the molecular factors involved in the pathophysiology of ischemic stroke (IS), one of the main global causes of mortality and disability. METHODS In the current work, we carried out a meta-analysis of available GWES for IS. Bioinformatics and computational biology analyses were applied to identify enriched functional categories and convergence with other genomic datasets for IS. RESULTS Three primary datasets were included, and in the meta-analysis of GWES for IS, 41 differentially expressed (DE) genes were identified using a random effects model. Thirteen of these genes were downregulated and 28 were upregulated. An analysis of functional categories found a significant enrichment for the Gene Ontology Term "Inflammatory Response" and for binding sites for the PAX2 transcription factor. CONCLUSIONS The list of DE genes identified in this meta-analysis of GWES for IS is useful for future genetic and molecular studies, which would allow the identification of novel mechanisms involved in the pathophysiology of IS. Several of the DE genes found in this meta-analysis have known functional roles related to mechanisms involved in the pathophysiology of IS. The role of the inflammatory response in the pathophysiology of IS is recognized.
Affiliation(s)
- Carlos E Moreno-Ramírez
  - Laboratory of Neuropsychiatric Genetics, Biomedical Sciences Research Group, School of Medicine, Universidad Antonio Nariño, Bogotá, Colombia
- Eulogia Gutiérrez-Garzón
  - Laboratory of Neuropsychiatric Genetics, Biomedical Sciences Research Group, School of Medicine, Universidad Antonio Nariño, Bogotá, Colombia
- George E Barreto
  - Departamento de Nutrición y Bioquímica, Facultad de Ciencias, Pontificia Universidad Javeriana, Bogotá, Colombia
- Diego A Forero
  - Laboratory of Neuropsychiatric Genetics, Biomedical Sciences Research Group, School of Medicine, Universidad Antonio Nariño, Bogotá, Colombia
|
94
|
Liu W, Liu J, Rajapakse JC. Gene Ontology Enrichment Improves Performances of Functional Similarity of Genes. Sci Rep 2018; 8:12100. [PMID: 30108262 PMCID: PMC6092333 DOI: 10.1038/s41598-018-30455-0] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/23/2017] [Accepted: 07/25/2018] [Indexed: 12/23/2022] Open
Abstract
There exists a plethora of measures to evaluate functional similarity (FS) between genes, which is widely used in many bioinformatics applications including detecting molecular pathways, identifying co-expressed genes, predicting protein-protein interactions, and prioritization of disease genes. Measures of FS between genes are mostly derived from Information Contents (IC) of Gene Ontology (GO) terms annotating the genes. However, existing measures evaluating IC of terms based either on the representations of terms in the annotating corpus or on the knowledge embedded in the GO hierarchy do not consider the enrichment of GO terms by the querying pair of genes. The enrichment of a GO term by a pair of genes is dependent on whether the term is annotated by one gene (i.e., partial annotation) or by both genes (i.e., complete annotation) in the pair. In this paper, we propose a method that incorporates enrichment of GO terms by a gene pair in computing their FS and show that GO enrichment improves the performances of 46 existing FS measures in the prediction of sequence homologies, gene expression correlations, protein-protein interactions, and disease-associated genes.
Affiliation(s)
- Wenting Liu
  - Human Genetics, Genome Institute of Singapore, Singapore, Singapore
- Jianjun Liu
  - Human Genetics, Genome Institute of Singapore, Singapore, Singapore
- Jagath C Rajapakse
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
|
95
|
Doğan T. HPO2GO: prediction of human phenotype ontology term associations for proteins using cross ontology annotation co-occurrences. PeerJ 2018; 6:e5298. [PMID: 30083448 PMCID: PMC6076985 DOI: 10.7717/peerj.5298] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 03/04/2018] [Accepted: 07/03/2018] [Indexed: 01/24/2023] Open
Abstract
Analysing the relationships between biomolecules and genetic diseases is a highly active area of research, where the aim is to identify the genes and their products that cause a particular disease due to functional changes originating from mutations. Biological ontologies are frequently employed in these studies, providing researchers with extensive opportunities for knowledge discovery through computational data analysis. In this study, a novel approach is proposed for the identification of relationships between biomedical entities by automatically mapping phenotypic abnormality defining HPO terms with biomolecular function defining GO terms, where each association indicates the occurrence of the abnormality due to the loss of the biomolecular function expressed by the corresponding GO term. The proposed HPO2GO mappings were extracted by calculating the frequency of the co-annotations of the terms on the same genes/proteins, using already existing curated HPO and GO annotation sets. This was followed by the filtering of the unreliable mappings that could be observed due to chance, by statistical resampling of the co-occurrence similarity distributions. Furthermore, the biological relevance of the finalized mappings was discussed over selected cases, using the literature. The resulting HPO2GO mappings can be employed in different settings to predict and to analyse novel gene/protein—ontology term—disease relations. As an application of the proposed approach, HPO term—protein associations (i.e., HPO2protein) were predicted. In order to test the predictive performance of the method on a quantitative basis, and to compare it with the state-of-the-art, CAFA2 challenge HPO prediction target protein set was employed. The results of the benchmark indicated the potential of the proposed approach, as HPO2GO performance was among the best (Fmax = 0.35).
The automated cross ontology mapping approach developed in this work may be extended to other ontologies as well, to identify unexplored relation patterns at the systemic level. The datasets, results and the source code of HPO2GO are available for download at: https://github.com/cansyl/HPO2GO.
Affiliation(s)
- Tunca Doğan
  - Department of Health Informatics, Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
  - Cancer Systems Biology Laboratory (KanSiL), Graduate School of Informatics, Middle East Technical University, Ankara, Turkey
  - European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Hinxton, Cambridge, UK
|
96
|
Watford SM, Grashow RG, De La Rosa VY, Rudel RA, Friedman KP, Martin MT. Novel application of normalized pointwise mutual information (NPMI) to mine biomedical literature for gene sets associated with disease: use case in breast carcinogenesis. Computational Toxicology (Amsterdam, Netherlands) 2018; 7:46-57. [PMID: 32274464 PMCID: PMC7144681 DOI: 10.1016/j.comtox.2018.06.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Indexed: 12/11/2022]
Abstract
Advances in technology within biomedical sciences have led to an inundation of data across many fields, raising new challenges in how best to integrate and analyze these resources. For example, rapid chemical screening programs like the US Environmental Protection Agency's ToxCast and the collaborative effort, Tox21, have produced massive amounts of information on putative chemical mechanisms where assay targets are identified as genes; however, systematically linking these hypothesized mechanisms with in vivo toxicity endpoints like disease outcomes remains problematic. Herein we present a novel use of normalized pointwise mutual information (NPMI) to mine biomedical literature for gene associations with biological concepts as represented by Medical Subject Headings (MeSH terms) in PubMed. Resources that tag genes to articles were integrated, then cross-species orthologs were identified using UniRef50 clusters. MeSH term frequency was normalized to reflect the MeSH tree structure, and then the resulting GeneID-MeSH associations were ranked using NPMI. The resulting network, called Entity MeSH Co-occurrence Network (EMCON), is a scalable resource for the identification and ranking of genes for a given topic of interest. The utility of EMCON was evaluated with the use case of breast carcinogenesis. Topics relevant to breast carcinogenesis were used to query EMCON and retrieve genes important to each topic. A breast cancer gene set was compiled through expert literature review (ELR) to assess performance of the search results. We found that the results from EMCON ranked the breast cancer genes from ELR higher than randomly selected genes with a recall of 0.98. Precision of the top five genes for selected topics was calculated as 0.87. 
This work demonstrates that EMCON can be used to link in vitro results to possible biological outcomes, thus aiding in generation of testable hypotheses for furthering understanding of biological function and the contribution of chemical exposures to disease.
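The abstract above ranks GeneID-MeSH associations by normalized pointwise mutual information. A minimal sketch of that score follows, using the standard definition NPMI(x, y) = ln(p(x, y) / (p(x) p(y))) / (-ln p(x, y)), which ranges from -1 (never co-occur) through 0 (independent) to 1 (always co-occur). The function name, the gene labels, and the article counts are hypothetical, and the paper's normalization of MeSH term frequency over the MeSH tree is not reproduced here.

```python
import math


def npmi(n_xy, n_x, n_y, n_total):
    """Normalized pointwise mutual information from article counts.

    n_xy    : articles tagged with both the gene and the MeSH term
    n_x     : articles tagged with the gene
    n_y     : articles tagged with the MeSH term
    n_total : articles in the corpus
    """
    if n_xy == 0:
        return -1.0  # conventional limiting value when the pair never co-occurs
    p_xy = n_xy / n_total
    if p_xy == 1.0:
        return 1.0  # degenerate case: both tags appear on every article
    p_x = n_x / n_total
    p_y = n_y / n_total
    pmi = math.log(p_xy / (p_x * p_y))
    return pmi / -math.log(p_xy)


# Hypothetical counts for one MeSH topic appearing on 200 of 10,000 articles.
topic_count, corpus_size = 200, 10_000
gene_counts = {
    # gene label: (articles with gene and topic, articles with gene)
    "GENE_A": (90, 120),
    "GENE_B": (10, 500),
    "GENE_C": (0, 50),
}
ranked = sorted(
    gene_counts,
    key=lambda g: npmi(gene_counts[g][0], gene_counts[g][1], topic_count, corpus_size),
    reverse=True,
)
```

Dividing by -ln p(x, y) is what keeps rarely tagged genes from dominating the ranking the way they can under unnormalized PMI, which is one plausible reason to prefer NPMI for literature mining.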
Affiliation(s)
- Sean M Watford
  - ORAU, contractor to U.S. Environmental Protection Agency through the National Student Services Contract, Oak Ridge, TN
  - Department of Environmental Sciences and Engineering, Gillings School of Global Public Health, UNC-Chapel Hill, Chapel Hill, North Carolina, United States
- Rachel G Grashow
  - Silent Spring Institute, Newton, MA
  - Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA
- Vanessa Y De La Rosa
  - Silent Spring Institute, Newton, MA
  - Social Science Environmental Health Research Institute, Northeastern University, Boston, MA
- Matthew T Martin
  - U.S. Environmental Protection Agency, National Center for Computational Toxicology, Research Triangle Park, NC, USA
  - Currently at Pfizer Worldwide Research & Development, Groton, CT, USA
|
97
|
Diessler S, Jan M, Emmenegger Y, Guex N, Middleton B, Skene DJ, Ibberson M, Burdet F, Götz L, Pagni M, Sankar M, Liechti R, Hor CN, Xenarios I, Franken P. A systems genetics resource and analysis of sleep regulation in the mouse. PLoS Biol 2018; 16:e2005750. [PMID: 30091978 PMCID: PMC6085075 DOI: 10.1371/journal.pbio.2005750] [Citation(s) in RCA: 41] [Impact Index Per Article: 6.8] [Received: 02/19/2018] [Accepted: 07/06/2018] [Indexed: 12/30/2022] Open
Abstract
Sleep is essential for optimal brain functioning and health, but the biological substrates through which sleep delivers these beneficial effects remain largely unknown. We used a systems genetics approach in the BXD genetic reference population (GRP) of mice and assembled a comprehensive experimental knowledge base comprising a deep "sleep-wake" phenome, central and peripheral transcriptomes, and plasma metabolome data, collected under undisturbed baseline conditions and after sleep deprivation (SD). We present analytical tools to interactively interrogate the database, visualize the molecular networks altered by sleep loss, and prioritize candidate genes. We found that a one-time, short disruption of sleep already extensively reshaped the systems genetics landscape by altering 60%-78% of the transcriptomes and the metabolome, with numerous genetic loci affecting the magnitude and direction of change. Systems genetics integrative analyses drawing on all levels of organization imply α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptor trafficking and fatty acid turnover as substrates of the negative effects of insufficient sleep. Our analyses demonstrate that genetic heterogeneity and the effects of insufficient sleep itself on the transcriptome and metabolome are far more widespread than previously reported.
Collapse
Affiliation(s)
- Shanaz Diessler
  - Center for Integrative Genomics, University of Lausanne, Switzerland
- Maxime Jan
  - Center for Integrative Genomics, University of Lausanne, Switzerland
  - Vital-IT Systems Biology Division, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Yann Emmenegger
  - Center for Integrative Genomics, University of Lausanne, Switzerland
- Nicolas Guex
  - Vital-IT Systems Biology Division, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Benita Middleton
  - Faculty of Health and Medical Sciences, University of Surrey, Guildford, United Kingdom
- Debra J. Skene
  - Faculty of Health and Medical Sciences, University of Surrey, Guildford, United Kingdom
- Mark Ibberson
  - Vital-IT Systems Biology Division, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Frederic Burdet
  - Vital-IT Systems Biology Division, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Lou Götz
  - Vital-IT Systems Biology Division, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Marco Pagni
  - Vital-IT Systems Biology Division, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Martial Sankar
  - Vital-IT Systems Biology Division, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Robin Liechti
  - Vital-IT Systems Biology Division, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Charlotte N. Hor
  - Center for Integrative Genomics, University of Lausanne, Switzerland
- Ioannis Xenarios
  - Center for Integrative Genomics, University of Lausanne, Switzerland
  - Vital-IT Systems Biology Division, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland
- Paul Franken
  - Center for Integrative Genomics, University of Lausanne, Switzerland
|
98
|
Kumar AA, Van Laer L, Alaerts M, Ardeshirdavani A, Moreau Y, Laukens K, Loeys B, Vandeweyer G. pBRIT: gene prioritization by correlating functional and phenotypic annotations through integrative data fusion. Bioinformatics 2018; 34:2254-2262. [PMID: 29452392 PMCID: PMC6022555 DOI: 10.1093/bioinformatics/bty079] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 09/16/2017] [Revised: 01/25/2018] [Accepted: 02/12/2018] [Indexed: 12/31/2022] Open
Abstract
Motivation: Computational gene prioritization can aid in disease gene identification. Here, we propose pBRIT (prioritization using Bayesian Ridge regression and Information Theoretic model), a novel adaptive and scalable prioritization tool that integrates PubMed abstracts, Gene Ontology, sequence similarities, Mammalian and Human Phenotype Ontology, pathways, interactions, Disease Ontology, the Gene Association database and the Human Genome Epidemiology database into the prediction model. We explore and address the effects of sparsity and inter-feature dependencies within annotation sources, and the impact of bias towards specific annotations. Results: pBRIT models feature dependencies and sparsity with an information-theoretic (data-driven) approach and applies data fusion based on intermediate integration. Following the hypothesis that genes underlying similar diseases will share functional and phenotype characteristics, it incorporates Bayesian Ridge regression to learn a linear mapping between functional and phenotype annotations. Genes are prioritized on phenotypic concordance to the training genes. We evaluated pBRIT against nine existing methods, and on over 2000 HPO-gene associations retrieved after construction of the pBRIT data sources. We achieve maximum AUC scores ranging from 0.92 to 0.96 against benchmark datasets and of 0.80 against the time-stamped HPO entries, indicating good performance with high sensitivity and specificity. Our model shows stable performance with regard to changes in the underlying annotation data, and is fast and scalable for implementation in routine pipelines. Availability and implementation: http://biomina.be/apps/pbrit/; https://bitbucket.org/medgenua/pbrit. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ajay Anand Kumar
  - Center of Medical Genetics, University of Antwerp and Antwerp University Hospital, Antwerp, Belgium
  - Biomedical Informatics Research Network Antwerp (biomina), University of Antwerp, Antwerp, Belgium
- Lut Van Laer
  - Center of Medical Genetics, University of Antwerp and Antwerp University Hospital, Antwerp, Belgium
- Maaike Alaerts
  - Center of Medical Genetics, University of Antwerp and Antwerp University Hospital, Antwerp, Belgium
- Amin Ardeshirdavani
  - Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Belgium
  - imec, Leuven, Belgium
- Yves Moreau
  - Department of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing and Data Analytics, Belgium
  - imec, Leuven, Belgium
- Kris Laukens
  - Biomedical Informatics Research Network Antwerp (biomina), University of Antwerp, Antwerp, Belgium
  - ADReM Data Laboratory, University of Antwerp, Antwerp, Belgium
- Bart Loeys
  - Center of Medical Genetics, University of Antwerp and Antwerp University Hospital, Antwerp, Belgium
- Geert Vandeweyer
  - Center of Medical Genetics, University of Antwerp and Antwerp University Hospital, Antwerp, Belgium
  - Biomedical Informatics Research Network Antwerp (biomina), University of Antwerp, Antwerp, Belgium
|
99
|
Tran Van D, Sperduti A, Costa F. The conjunctive disjunctive graph node kernel for disease gene prioritization. Neurocomputing 2018. [DOI: 10.1016/j.neucom.2018.01.089] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Indexed: 10/17/2022]
|
100
|
MGOGP: a gene module-based heuristic algorithm for cancer-related gene prioritization. BMC Bioinformatics 2018; 19:215. [PMID: 29871590 PMCID: PMC5989416 DOI: 10.1186/s12859-018-2216-0] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 03/13/2017] [Accepted: 05/23/2018] [Indexed: 01/13/2023] Open
Abstract
Background: Prioritizing genes according to their associations with a cancer allows researchers to explore genes in more informed ways. To date, gene-centric and network-centric gene prioritization methods predominate. Genes and their protein products carry out cellular processes in the context of functional modules. Dysfunctional gene modules have previously been reported to be associated with cancer. However, gene module information has seldom been considered in cancer-related gene prioritization. Results: In this study, we propose a novel method, MGOGP (Module and Gene Ontology-based Gene Prioritization), for cancer-related gene prioritization. Unlike other methods, MGOGP ranks genes using information on both individual genes and their affiliated modules, and utilizes a Gene Ontology (GO) based fuzzy measure value as well as known cancer-related genes as heuristics. The performance of the proposed method is comprehensively validated on both breast cancer and prostate cancer datasets, and by comparison with other methods. Results show that MGOGP outperforms other methods and successfully prioritizes more genes with literature-confirmed evidence. Conclusions: This work will aid researchers in understanding the genetic architecture of complex diseases, and improve the accuracy of diagnosis and the effectiveness of therapy. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2216-0) contains supplementary material, which is available to authorized users.
|