51
Moumbock AF, Li J, Mishra P, Gao M, Günther S. Current computational methods for predicting protein interactions of natural products. Comput Struct Biotechnol J 2019; 17:1367-1376. [PMID: 31762960 PMCID: PMC6861622 DOI: 10.1016/j.csbj.2019.08.008] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 03/19/2019] [Revised: 08/09/2019] [Accepted: 08/23/2019] [Indexed: 01/08/2023] Open
Abstract
Natural products (NPs) are an indispensable source of drugs, and they cover the pharmacological space better than synthetic compounds owing to their high structural diversity. The prediction of their interaction profiles with druggable protein targets remains a major challenge in modern drug discovery. Experimental (off-)target predictions of NPs are costly and time-consuming, whereas computational methods are much faster and cheaper. As a result, computational predictions are preferentially used in the first instance for NP profiling, prior to experimental validation. This review covers recent advances in computational approaches developed to aid the annotation of unknown drug-target interactions (DTIs), focusing on three broad classes: ligand-based, target-based, and target-ligand-based (hybrid) approaches. Computational DTI prediction methods have the potential to significantly advance the discovery and development of novel selective drugs exhibiting minimal side effects. We highlight some inherent caveats of these methods that must be overcome for them to realize their full potential, and we give a future outlook.
Affiliation(s)
- Stefan Günther
- Institute of Pharmaceutical Sciences, Research Group Pharmaceutical Bioinformatics, Albert-Ludwigs-Universität Freiburg, Germany
52
Alshahrani M, Hoehndorf R. Semantic Disease Gene Embeddings (SmuDGE): phenotype-based disease gene prioritization without phenotypes. Bioinformatics 2019; 34:i901-i907. [PMID: 30423077 PMCID: PMC6129260 DOI: 10.1093/bioinformatics/bty559] [Citation(s) in RCA: 32] [Impact Index Per Article: 6.4] [Indexed: 01/09/2023] Open
Abstract
Motivation: In recent years, several methods have been developed to incorporate information about phenotypes into computational disease gene prioritization methods. These methods commonly compute the similarity between a disease’s (or patient’s) phenotypes and a database of gene-to-phenotype associations to find the phenotypically most similar match. A key limitation of these methods is their reliance on knowledge about phenotypes associated with particular genes, which is highly incomplete in humans as well as in many model organisms such as the mouse.
Results: We developed SmuDGE, a method that uses feature learning to generate vector-based representations of phenotypes associated with an entity. SmuDGE can be used as a trainable semantic similarity measure to compare two sets of phenotypes (such as between a disease and gene, or a disease and patient). More importantly, SmuDGE can generate phenotype representations for entities that are only indirectly associated with phenotypes through an interaction network; for this purpose, SmuDGE exploits background knowledge in interaction networks composed of multiple types of interactions. We demonstrate that SmuDGE can match or outperform semantic similarity in phenotype-based disease gene prioritization, and furthermore significantly extends the coverage of phenotype-based methods to all genes in a connected interaction network.
Availability and implementation: https://github.com/bio-ontology-research-group/SmuDGE
Affiliation(s)
- Mona Alshahrani
- Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
- Robert Hoehndorf
- Computer, Electrical and Mathematical Sciences and Engineering Division, Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
53
Rao X, Dixon RA. Co-expression networks for plant biology: why and how. Acta Biochim Biophys Sin (Shanghai) 2019; 51:981-988. [PMID: 31436787 DOI: 10.1093/abbs/gmz080] [Citation(s) in RCA: 47] [Impact Index Per Article: 9.4] [Received: 04/08/2019] [Revised: 06/20/2019] [Accepted: 07/01/2019] [Indexed: 12/29/2022] Open
Abstract
Co-expression network analysis is one of the most powerful approaches for interpretation of large transcriptomic datasets. It enables characterization of modules of co-expressed genes that may share biological functional linkages. Such networks provide an initial way to explore functional associations from gene expression profiling and can be applied to various aspects of plant biology. This review presents the applications of co-expression network analysis in plant biology and addresses optimized strategies from the recent literature for performing co-expression analysis on plant biological systems. Additionally, we describe the combined interpretation of co-expression analysis with other genomic data to enhance the generation of biologically relevant information.
Affiliation(s)
- Xiaolan Rao
- BioDiscovery Institute and Department of Biological Sciences, University of North Texas, Denton, TX 76203, USA
- Richard A Dixon
- BioDiscovery Institute and Department of Biological Sciences, University of North Texas, Denton, TX 76203, USA
54
Picart-Armada S, Barrett SJ, Willé DR, Perera-Lluna A, Gutteridge A, Dessailly BH. Benchmarking network propagation methods for disease gene identification. PLoS Comput Biol 2019; 15:e1007276. [PMID: 31479437 PMCID: PMC6743778 DOI: 10.1371/journal.pcbi.1007276] [Citation(s) in RCA: 23] [Impact Index Per Article: 4.6] [Received: 01/24/2019] [Revised: 09/13/2019] [Accepted: 07/16/2019] [Indexed: 12/17/2022] Open
Abstract
In-silico identification of potential target genes for disease is an essential aspect of drug target discovery. Recent studies suggest that successful targets can be found by leveraging genetic, genomic and protein interaction information. Here, we systematically tested the ability of 12 varied algorithms, based on network propagation, to identify genes that have been targeted by any drug, on gene-disease data from 22 common non-cancerous diseases in OpenTargets. We considered two biological networks and six performance metrics, and compared two types of input gene-disease association scores. The impact of the design factors on performance was quantified through additive explanatory models. Standard cross-validation led to over-optimistic performance estimates due to the presence of protein complexes. In order to obtain realistic estimates, we introduced two novel protein complex-aware cross-validation schemes. When seeding biological networks with known drug targets, machine learning and diffusion-based methods found around 2-4 true targets within the top 20 suggestions. Seeding the networks with genes associated with disease by genetics decreased performance below 1 true hit on average. The use of a larger network, although noisier, improved overall performance. We conclude that diffusion-based prioritisers and machine learning applied to diffusion-based features are suited for drug discovery in practice and improve over simpler neighbour-voting methods. We also demonstrate the large impact of choosing an adequate validation strategy and the definition of seed disease genes.
The use of biological network data has proven its effectiveness in many areas of computational biology. Networks consist of nodes, usually genes or proteins, and edges that connect pairs of nodes, representing information such as physical interactions, regulatory roles or co-occurrence. In order to find new candidate nodes for a given biological property, the so-called network propagation algorithms start from the set of known nodes with that property and leverage the connections from the biological network to make predictions. Here, we assess the performance of several network propagation algorithms to find sensible gene targets for 22 common non-cancerous diseases, i.e. those that have been found promising enough to start clinical trials with any compound. We focus on obtaining performance metrics that reflect a practical scenario in drug development where only a small set of genes can be assayed. We found that the presence of protein complexes biased the performance estimates, leading to over-optimistic conclusions, and introduced two novel strategies to address it. Our results support that network propagation is still a viable approach to find drug targets, but that special care needs to be put on the validation strategy. Algorithms benefitted from the use of a larger, although noisier, network and of direct evidence data, rather than indirect genetic associations to disease.
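The abstract does not describe the two complex-aware schemes in detail, so the following is only a minimal sketch of the general idea, under the assumption that a grouped split is an acceptable stand-in: genes that share a protein complex are kept in the same cross-validation fold, so a held-out target cannot be recovered trivially from a complex partner in the training set. The function name `complex_aware_folds`, the gene labels, and the greedy balancing rule are all hypothetical and are not taken from the paper.

```python
def complex_aware_folds(genes, complexes, k=3):
    """Split genes into k folds so that members of one complex never straddle folds."""
    # Union-find: overlapping complexes are merged into a single group.
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for members in complexes:
        members = [m for m in members if m in parent]
        for m in members[1:]:
            parent[find(m)] = find(members[0])

    groups = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)

    # Greedy balancing: place the largest groups first, always into the smallest fold.
    folds = [[] for _ in range(k)]
    for group in sorted(groups.values(), key=len, reverse=True):
        min(folds, key=len).extend(group)
    return folds


# Toy data (hypothetical gene names); the first two complexes overlap at G3.
genes = ["G1", "G2", "G3", "G4", "G5", "G6", "G7"]
complexes = [["G1", "G2", "G3"], ["G3", "G4"], ["G5", "G6"]]
folds = complex_aware_folds(genes, complexes, k=3)
fold_of = {g: i for i, fold in enumerate(folds) for g in fold}
```

With real data, fold sizes can become unbalanced when a few complexes are very large; that trade-off is inherent to any grouped split.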
Affiliation(s)
- Sergio Picart-Armada
- B2SLab, Departament d’Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, CIBER-BBN, Barcelona, Spain
- Networking Biomedical Research Centre in the subject area of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid, Spain
- Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, Esplugues de Llobregat, Spain
- Alexandre Perera-Lluna
- B2SLab, Departament d’Enginyeria de Sistemes, Automàtica i Informàtica Industrial, Universitat Politècnica de Catalunya, CIBER-BBN, Barcelona, Spain
- Networking Biomedical Research Centre in the subject area of Bioengineering, Biomaterials and Nanomedicine (CIBER-BBN), Madrid, Spain
- Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, Esplugues de Llobregat, Spain
- Alex Gutteridge
- Computational Biology and Statistics, GSK, Stevenage, United Kingdom
55
McClure RS, Wendler JP, Adkins JN, Swanstrom J, Baric R, Kaiser BLD, Oxford KL, Waters KM, McDermott JE. Unified feature association networks through integration of transcriptomic and proteomic data. PLoS Comput Biol 2019; 15:e1007241. [PMID: 31527878 PMCID: PMC6748406 DOI: 10.1371/journal.pcbi.1007241] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Received: 08/20/2018] [Accepted: 07/02/2019] [Indexed: 11/18/2022] Open
Abstract
High-throughput multi-omics studies and corresponding network analyses of multi-omic data have rapidly expanded their impact over the last 10 years. As biological features of different types (e.g. transcripts, proteins, metabolites) interact within cellular systems, the greatest amount of knowledge can be gained from networks that incorporate multiple types of -omic data. However, biological and technical sources of variation diminish the ability to detect cross-type associations, yielding networks dominated by communities composed of nodes of the same type. We describe here network building methods that can maximize edges between nodes of different data types, leading to integrated networks, i.e. networks that have a large number of edges linking nodes of different -omic types (transcripts, proteins, lipids etc). We systematically rank several network inference methods and demonstrate that, in many cases, using a random forest method, GENIE3, produces the most integrated networks. This increase in integration does not come at the cost of accuracy, as GENIE3 produces networks of approximately the same quality as the other network inference methods tested here. Using GENIE3, we also infer networks representing antibody-mediated Dengue virus cell invasion and receptor-mediated Dengue virus invasion. A number of functional pathways showed centrality differences between the two networks, including genes responding to both GM-CSF and IL-4, which had a higher centrality value in the antibody-mediated vs. the receptor-mediated Dengue network.
Because a biological system involves the interplay of many different types of molecules, incorporating multiple data types into networks will improve their use as models of biological systems. The methods explored here are some of the first to specifically highlight and address the challenges associated with how such multi-omic networks can be assembled and how the greatest number of interactions can be inferred from different data types. The resulting networks can lead to the discovery of new host response patterns and interactions during viral infection, generate new hypotheses of pathogenic mechanisms and confirm mechanisms of disease.
Affiliation(s)
- Ryan S. McClure
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland WA, United States of America
- Jason P. Wendler
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland WA, United States of America
- Joshua N. Adkins
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland WA, United States of America
- Jesica Swanstrom
- Department of Microbiology and Immunology, School of Medicine, University of North Carolina, Chapel Hill, Chapel Hill, NC, United States of America
- Ralph Baric
- Department of Microbiology and Immunology, School of Medicine, University of North Carolina, Chapel Hill, Chapel Hill, NC, United States of America
- Brooke L. Deatherage Kaiser
- Signatures Science and Technology Division, Pacific Northwest National Laboratory, Richland WA, United States of America
- Kristie L. Oxford
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland WA, United States of America
- Katrina M. Waters
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland WA, United States of America
- Jason E. McDermott
- Biological Sciences Division, Pacific Northwest National Laboratory, Richland WA, United States of America
- Department of Molecular Microbiology and Immunology, Oregon Health & Sciences University, Portland, OR, United States of America
56
Lu S, Zhu ZG, Lu WC. Inferring novel genes related to colorectal cancer via random walk with restart algorithm. Gene Ther 2019; 26:373-385. [PMID: 31308477 DOI: 10.1038/s41434-019-0090-7] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 12/25/2018] [Revised: 05/20/2019] [Accepted: 06/11/2019] [Indexed: 12/12/2022]
Abstract
Colorectal cancer (CRC) is the third most common type of cancer. In recent decades, genomic analysis has played an increasingly important role in understanding the molecular mechanisms of CRC. However, its pathogenesis has not been fully uncovered. Identifying CRC-related genes as completely as possible is an important way to investigate its pathogenesis. Therefore, we proposed a new computational method for the identification of novel CRC-associated genes. The proposed method is based on existing proven CRC-associated genes, human protein-protein interaction networks, and the random walk with restart algorithm. The utility of the method is indicated by comparing it to methods based on guilt-by-association or the shortest path algorithm. Using the proposed method, we successfully identified 298 novel CRC-associated genes. Previous studies have validated the involvement of the majority of these 298 novel genes in CRC-associated biological processes, thus suggesting the efficacy and accuracy of our method.
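For readers unfamiliar with random walk with restart, the following is a minimal sketch of the generic algorithm on a toy undirected network, not the authors' implementation. A walker starts at the seed (known disease) genes, moves to a random neighbour at each step, and jumps back to a seed with a fixed restart probability; the steady-state visiting probability then ranks candidate genes. The node labels, the restart value of 0.3, and the function name are assumptions made for illustration.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adjacency: dict mapping node -> set of neighbours (undirected, no isolated nodes)
    seeds: set of known disease genes; the walker restarts uniformly among them
    """
    nodes = sorted(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term first, then spread the remaining mass along the edges.
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            share = (1.0 - restart) * p[n] / len(adjacency[n])
            for m in adjacency[n]:
                nxt[m] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Toy protein-protein interaction network (hypothetical node labels).
ppi = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}
scores = random_walk_with_restart(ppi, seeds={"A"})
ranking = sorted((n for n in ppi if n != "A"), key=scores.get, reverse=True)
```

In this toy example, nodes closer to the seed receive higher scores than distant ones, which is the property a prioritisation method relies on.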
Affiliation(s)
- Sheng Lu
- Department of General Surgery, Rui Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai Institute of Digestive Surgery, Shanghai, 200025, China
- Zheng-Gang Zhu
- Department of General Surgery, Rui Jin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai Institute of Digestive Surgery, Shanghai, 200025, China
- Wen-Cong Lu
- Department of Chemistry, College of Sciences, Shanghai University, Shanghai, 200444, China
57
Omony J, de Jong A, Kok J, van Hijum SAFT. Reconstruction and inference of the Lactococcus lactis MG1363 gene co-expression network. PLoS One 2019; 14:e0214868. [PMID: 31116749 PMCID: PMC6530827 DOI: 10.1371/journal.pone.0214868] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/15/2018] [Accepted: 03/21/2019] [Indexed: 01/30/2023] Open
Abstract
Lactic acid bacteria are Gram-positive bacteria used throughout the world in many industrial applications for their acidification, flavor and texture formation attributes. One of the species, Lactococcus lactis, is employed for the production of fermented milk products like cheese, buttermilk and quark. It ferments lactose to lactic acid and, thus, helps improve the shelf life of the products. Many physiological and transcriptome studies have been performed in L. lactis in order to comprehend and improve its biotechnological assets. Using large amounts of transcriptome data to understand and predict the behavior of biological processes in bacterial or other cell types is a complex task. Gene networks enable predicting gene behavior and function in the context of transcriptionally linked processes. We reconstruct and present the gene co-expression network (GCN) for the most widely studied L. lactis strain, MG1363, using publicly available transcriptome data. Several methods exist to generate and judge the quality of GCNs. Different reconstruction methods lead to networks with varying structural properties, consequently altering gene clusters. We compared the structural properties of the MG1363 GCNs generated by five methods, namely Pearson correlation, Spearman correlation, GeneNet, Weighted Gene Co-expression Network Analysis (WGCNA), and Sparse PArtial Correlation Estimation (SPACE). Using SPACE, we generated an L. lactis MG1363 GCN and assessed its quality using modularity and structural and biological criteria. The L. lactis MG1363 GCN has structural properties similar to those of the gold-standard networks of Escherichia coli K-12 and Bacillus subtilis 168. We showcase that the network can be used to mine for genes with similar expression profiles that are also generally linked to the same biological process.
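Of the five reconstruction methods named above, the two correlation-based ones are simple enough to sketch. The example below, on made-up expression values, builds a co-expression edge list by thresholding the absolute Pearson or Spearman correlation between gene profiles. The gene names, the threshold of 0.8, and the helper names are assumptions for illustration; GeneNet, WGCNA, and SPACE work differently and are not shown.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def coexpression_edges(expr, corr=pearson, threshold=0.8):
    """Keep gene pairs whose absolute correlation reaches the threshold."""
    edges = {}
    for g1, g2 in combinations(sorted(expr), 2):
        r = corr(expr[g1], expr[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Made-up expression profiles across five conditions.
expr = {
    "geneA": [1, 2, 3, 4, 5],
    "geneB": [1, 4, 9, 16, 25],  # monotonic but non-linear in geneA
    "geneC": [3, 1, 4, 1, 5],
}
pearson_net = coexpression_edges(expr, corr=pearson)
spearman_net = coexpression_edges(expr, corr=spearman)
```

The geneA/geneB pair illustrates why the choice of measure matters: the relationship is perfectly monotonic, so the rank-based measure scores it higher than the linear one.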
Affiliation(s)
- Jimmy Omony
- Top Institute Food and Nutrition (TIFN), Wageningen, The Netherlands
- Anne de Jong
- Top Institute Food and Nutrition (TIFN), Wageningen, The Netherlands
- Jan Kok
- Top Institute Food and Nutrition (TIFN), Wageningen, The Netherlands
58
Sonawane AR, Weiss ST, Glass K, Sharma A. Network Medicine in the Age of Biomedical Big Data. Front Genet 2019; 10:294. [PMID: 31031797 PMCID: PMC6470635 DOI: 10.3389/fgene.2019.00294] [Citation(s) in RCA: 111] [Impact Index Per Article: 22.2] [Received: 12/26/2018] [Accepted: 03/19/2019] [Indexed: 12/13/2022] Open
Abstract
Network medicine is an emerging area of research dealing with molecular and genetic interactions, network biomarkers of disease, and therapeutic target discovery. Large-scale biomedical data generation offers a unique opportunity to assess the effect and impact of cellular heterogeneity and environmental perturbations on the observed phenotype. Marrying the two, network medicine with biomedical data provides a framework to build meaningful models and extract impactful results at a network level. In this review, we survey existing network types and biomedical data sources. More importantly, we delve into ways in which the network medicine approach, aided by phenotype-specific biomedical data, can be gainfully applied. We provide three paradigms, mainly dealing with three major biological network archetypes: protein-protein interaction, expression-based, and gene regulatory networks. For each of these paradigms, we discuss a broad overview of philosophies under which various network methods work. We also provide a few examples in each paradigm as a test case of its successful application. Finally, we delineate several opportunities and challenges in the field of network medicine. We hope this review provides a lexicon for researchers from biological sciences and network theory to come on the same page to work on research areas that require interdisciplinary expertise. Taken together, the understanding gained from combining biomedical data with networks can be useful for characterizing disease etiologies and identifying therapeutic targets, which, in turn, will lead to better preventive medicine with translational impact on personalized healthcare.
Affiliation(s)
- Abhijeet R. Sonawane
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Scott T. Weiss
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Kimberly Glass
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Amitabh Sharma
- Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, United States
- Department of Medicine, Harvard Medical School, Boston, MA, United States
- Center for Interdisciplinary Cardiovascular Sciences, Cardiovascular Division, Brigham and Women’s Hospital, Boston, MA, United States
59
Genome-Wide Analysis of Glycoside Hydrolase Family 1 β-glucosidase Genes in Brassica rapa and Their Potential Role in Pollen Development. Int J Mol Sci 2019; 20:ijms20071663. [PMID: 30987159 PMCID: PMC6480273 DOI: 10.3390/ijms20071663] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 02/28/2019] [Revised: 03/29/2019] [Accepted: 04/01/2019] [Indexed: 12/03/2022] Open
Abstract
Glycoside hydrolase family 1 (GH1) β-glucosidases (BGLUs) are encoded by a large number of genes, and are involved in many developmental processes and stress responses in plants. Due to their importance in plant growth and development, genome-wide analyses have been conducted in model plants (Arabidopsis and rice) and maize, but not in Brassica species, which are important vegetable crops. In this study, we systematically analyzed B. rapa BGLUs (BrBGLUs), and demonstrated the involvement of several genes in pollen development. Sixty-four BrBGLUs were identified in Brassica databases, which were anchored onto 10 chromosomes, with 10 tandem duplications. Phylogenetic analysis revealed that the 64 genes were classified into 10 subgroups, and each subgroup had relatively conserved intron/exon structures. Clustering with Arabidopsis BGLUs (AtBGLUs) facilitated the identification of several important subgroups for flavonoid metabolism, the production of glucosinolates, the regulation of abscisic acid (ABA) levels, and other defense-related compounds. At least six BrBGLUs might be involved in pollen development. The expression of BrBGLU10/AtBGLU20, the analysis of co-expressed genes, and the examination of knocked down Arabidopsis plants strongly suggest that BrBGLU10/AtBGLU20 has an indispensable function in pollen development. The results of this study may provide valuable information for the further understanding of β-glucosidase function and for the breeding of nutraceutical-rich Brassica crops.
60
Sutherland BJG, Prokkola JM, Audet C, Bernatchez L. Sex-Specific Co-expression Networks and Sex-Biased Gene Expression in the Salmonid Brook Charr Salvelinus fontinalis. G3 (Bethesda) 2019; 9:955-968. [PMID: 30692150 PMCID: PMC6404618 DOI: 10.1534/g3.118.200910] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/19/2018] [Accepted: 01/21/2019] [Indexed: 12/31/2022]
Abstract
Networks of co-expressed genes produce complex phenotypes associated with functional novelty. Sex differences in gene expression levels or in the structure of gene co-expression networks can cause sexual dimorphism and may resolve sexually antagonistic selection. Here we used RNA-sequencing in the salmonid Brook Charr Salvelinus fontinalis to characterize sex-specific co-expression networks in the liver of 47 female and 53 male offspring. In both networks, modules were characterized for functional enrichment, hub gene identification, and associations with 15 growth, reproduction, and stress-related phenotypes. Modules were then evaluated for preservation in the opposite sex, and in the congener Arctic Charr Salvelinus alpinus. Overall, more transcripts were assigned to a module in the female network than in the male network, which coincided with higher inter-individual gene expression and phenotype variation in the females. Most modules were preserved between sexes and species, including those involved in conserved cellular processes (e.g., translation, immune pathways). However, two sex-specific male modules were identified, and these may contribute to sexual dimorphism. To compare with the network analysis, differentially expressed transcripts were identified between the sexes, revealing a total of 16% of expressed transcripts as sex-biased. For both sexes, there was no overrepresentation of sex-biased genes or sex-specific modules on the putative sex chromosome. Sex-biased transcripts were also not overrepresented in sex-specific modules, and in fact highly male-biased transcripts were enriched in preserved modules. Comparative network analysis and differential expression analyses identified different aspects of sex differences in gene expression, and both provided new insights on the genes underlying sexual dimorphism in the salmonid Brook Charr.
Affiliation(s)
- Ben J G Sutherland
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC G1V 0A6, Canada
- Jenni M Prokkola
- Institute of Integrative Biology, University of Liverpool, L69 7ZB Liverpool, UK
- Céline Audet
- Institut des Sciences de la Mer de Rimouski, Université du Québec à Rimouski, Rimouski, QC G5L 3A1, Canada
- Louis Bernatchez
- Institut de Biologie Intégrative et des Systèmes (IBIS), Université Laval, Québec, QC G1V 0A6, Canada
61
Gupta C, Pereira A. Recent advances in gene function prediction using context-specific coexpression networks in plants. F1000Res 2019; 8:F1000 Faculty Rev-153. [PMID: 30800290 PMCID: PMC6364378 DOI: 10.12688/f1000research.17207.1] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Accepted: 01/30/2019] [Indexed: 12/11/2022] Open
Abstract
Predicting gene functions from genome sequence alone has been difficult, and the functions of a large fraction of plant genes remain unknown. However, leveraging the vast amount of currently available gene expression data has the potential to facilitate our understanding of plant gene functions, especially in determining complex traits. Gene coexpression networks, created by integrating multiple expression datasets, connect genes with similar patterns of expression across multiple conditions. Dense gene communities in such networks, commonly referred to as modules, often indicate that the member genes are functionally related. As such, these modules serve as tools for generating new testable hypotheses, including the prediction of gene function and importance. Recently, we have seen a paradigm shift from the traditional "global" to more defined, context-specific coexpression networks. Such coexpression networks imply genetic correlations in specific biological contexts such as during development or in response to a stress. In this short review, we highlight a few recent studies that attempt to fill the large gaps in our knowledge about cellular functions of plant genes using context-specific coexpression networks.
Affiliation(s)
- Chirag Gupta
- Crop, Soil and Environmental Sciences, University of Arkansas, Fayetteville, AR, USA
- Andy Pereira
- Crop, Soil and Environmental Sciences, University of Arkansas, Fayetteville, AR, USA
62
Yang W, Han J, Ma J, Feng Y, Hou Q, Wang Z, Yu T. Prediction of key gene function in spinal muscular atrophy using guilt by association method based on network and gene ontology. Exp Ther Med 2019; 17:2561-2566. [PMID: 30906446 PMCID: PMC6425128 DOI: 10.3892/etm.2019.7216] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/04/2018] [Accepted: 01/23/2019] [Indexed: 12/21/2022] Open
Abstract
The guilt by association (GBA) algorithm has been widely used to predict gene functions statistically, and a network-based approach may increase the confidence and veracity of identifying molecular signatures for diseases. The aim of the present study was to suggest a gene ontology (GO)-based method, integrating the GBA algorithm and a network, to identify key gene functions for spinal muscular atrophy (SMA). The prediction of key gene functions comprised four steps: preparing gene lists and sets; extracting differentially expressed genes (DEGs) from microarray data using the linear models for microarray data (limma) package; constructing a co-expression matrix on gene lists using the Spearman correlation coefficient method; and predicting gene functions by the GBA algorithm. Ultimately, key gene functions were predicted according to the area under the curve (AUC) index for GO terms, and the GO terms with AUC >0.7 were determined as the optimal gene functions for SMA. A total of 484 DEGs and 466 background GO terms were regarded as gene lists and sets for the subsequent analyses, respectively. The predicted results obtained from the network-based GBA approach showed that 141 gene sets had a good classification performance with AUC >0.5. Most significantly, 3 gene sets with AUC >0.7 were denoted as seed gene functions for SMA, including cell morphogenesis, which is involved in differentiation and ossification. In conclusion, we have predicted 3 key gene functions for SMA compared with control utilizing the network-based GBA algorithm. The findings may provide great insights to reveal the pathological and molecular mechanisms underlying SMA.
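The last two steps above (GBA prediction and AUC scoring) can be illustrated with a minimal sketch. It assumes the common neighbour-voting form of GBA, in which a gene's score for a GO term is the weighted fraction of its co-expression partners that carry the term; the abstract does not specify the exact variant used. The gene labels, edge weights, and function names are made up for illustration.

```python
def neighbor_voting(weights, annotated):
    """Score each gene by the weighted share of its partners annotated with the term.

    weights: dict gene -> dict of partner -> co-expression weight (no self-edges)
    annotated: set of genes known to carry the GO term
    """
    scores = {}
    for gene, partners in weights.items():
        total = sum(partners.values())
        hit = sum(w for partner, w in partners.items() if partner in annotated)
        scores[gene] = hit / total if total else 0.0
    return scores


def auc(scores, positives):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for g, s in scores.items() if g in positives]
    neg = [s for g, s in scores.items() if g not in positives]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy symmetric co-expression weights (hypothetical genes g1..g5).
weights = {
    "g1": {"g2": 0.9, "g3": 0.2},
    "g2": {"g1": 0.9, "g3": 0.3},
    "g3": {"g1": 0.2, "g2": 0.3, "g4": 0.8},
    "g4": {"g3": 0.8, "g5": 0.7},
    "g5": {"g4": 0.7},
}
term_genes = {"g1", "g2"}
scores = neighbor_voting(weights, term_genes)
term_auc = auc(scores, term_genes)
keep_term = term_auc > 0.7  # cut-off quoted in the abstract
```

Because a gene's own annotation never enters its score (there are no self-edges), the AUC measures how well the term can be recovered from neighbours alone.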
Collapse
Affiliation(s)
- Wenjiu Yang
- Department of Spine Surgery, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266071, P.R. China
| | - Jing Han
- Department of Ophthalmology, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266071, P.R. China
| | - Jinfeng Ma
- Department of Spine Surgery, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266071, P.R. China
| | - Yujie Feng
- Hepatobiliary Surgery, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266071, P.R. China
| | - Qingxian Hou
- Department of Spine Surgery, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266071, P.R. China
| | - Zhijie Wang
- Department of Spine Surgery, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266071, P.R. China
| | - Tengbo Yu
- Sports Medicine, The Affiliated Hospital of Qingdao University, Qingdao, Shandong 266071, P.R. China
|
63
|
Li YF, Zheng Y, Vemireddy LR, Panda SK, Jose S, Ranjan A, Panda P, Govindan G, Cui J, Wei K, Yaish MW, Naidoo GC, Sunkar R. Comparative transcriptome and translatome analysis in contrasting rice genotypes reveals differential mRNA translation in salt-tolerant Pokkali under salt stress. BMC Genomics 2018; 19:935. [PMID: 30598105 PMCID: PMC6311934 DOI: 10.1186/s12864-018-5279-4] [Citation(s) in RCA: 39] [Impact Index Per Article: 6.5] [Indexed: 12/17/2022] Open
Abstract
Background Soil salinity is one of the primary causes of yield decline in rice. Pokkali (Pok) is a highly salt-tolerant landrace, whereas IR29 is a salt-sensitive but widely cultivated genotype. Comparative analysis of these genotypes may offer a better understanding of the salinity tolerance mechanisms in rice. Although most stress-responsive genes are regulated at the transcriptional level, changes at the transcriptional level are not always accompanied by changes in protein abundance, which suggests that the transcriptome needs to be studied in conjunction with the proteome to link the phenotype of stress tolerance or sensitivity. Published reports have largely underscored the importance of transcriptional regulation during salt stress in these genotypes, but regulation at the translational level has rarely been studied. Using RNA-Seq, we simultaneously analyzed the transcriptome and translatome from control and salt-exposed Pok and IR29 seedlings to unravel molecular insights into gene regulatory mechanisms that differ between these genotypes. Results Clear differences were evident at both transcriptional and translational levels between the two genotypes even under the control condition. In response to salt stress, 57 differentially expressed genes (DEGs) were commonly upregulated at both transcriptional and translational levels in both genotypes; the overall number of up/downregulated DEGs in IR29 was comparable at both transcriptional and translational levels, whereas in Pok, the number of upregulated DEGs was considerably higher at the translational level (544 DEGs) than at the transcriptional level (219 DEGs); in contrast, the number of downregulated DEGs was significantly lower at the translational level (58 DEGs) than at the transcriptional level (397 DEGs). These results imply that Pok stabilizes mRNAs and also efficiently loads mRNAs onto polysomes for translation during salt stress.
Conclusion Under salt stress, Pok is more efficient in maintaining cell wall integrity, detoxifying reactive oxygen species (ROS), translocating molecules and maintaining photosynthesis. The present study confirmed the known salt stress-associated genes and also identified a number of putative new salt-responsive genes. Most importantly, the study revealed that the translational regulation under salinity plays an important role in salt-tolerant Pok, but such regulation was less evident in the salt-sensitive IR29.
Affiliation(s)
- Yong-Fang Li
- College of Life Sciences, Henan Normal University, Xinxiang, 453007, Henan, China; Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, 74078, USA
| | - Yun Zheng
- Yunnan Key Lab of Primate Biomedicine Research; Institute of Primate Translational Medicine, Kunming University of Science and Technology, Kunming, Yunnan, 650500, China
| | | | - Sanjib Kumar Panda
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, 74078, USA
| | - Smitha Jose
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, 74078, USA
| | - Alok Ranjan
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, 74078, USA
| | - Piyalee Panda
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, 74078, USA
| | - Ganesan Govindan
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, 74078, USA
| | - Junxia Cui
- College of Life Sciences, Henan Normal University, Xinxiang, 453007, Henan, China
| | - Kangning Wei
- College of Life Sciences, Henan Normal University, Xinxiang, 453007, Henan, China
| | - Mahmoud W Yaish
- Department of Biology, College of Science, Sultan Qaboos University, Muscat, Oman
| | | | - Ramanjulu Sunkar
- Department of Biochemistry and Molecular Biology, Oklahoma State University, Stillwater, OK, 74078, USA.
|
64
|
Boudellioua I, Kulmanov M, Schofield PN, Gkoutos GV, Hoehndorf R. OligoPVP: Phenotype-driven analysis of individual genomic information to prioritize oligogenic disease variants. Sci Rep 2018; 8:14681. [PMID: 30279426 PMCID: PMC6168481 DOI: 10.1038/s41598-018-32876-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 04/17/2018] [Accepted: 09/18/2018] [Indexed: 12/12/2022] Open
Abstract
An increasing number of disorders have been identified for which two or more distinct alleles in two or more genes are required to either cause the disease or to significantly modify its onset, severity or phenotype. It is difficult to discover such interactions using existing approaches. The purpose of our work is to develop and evaluate a system that can identify combinations of alleles underlying digenic and oligogenic diseases in individual whole exome or whole genome sequences. Information that links patient phenotypes to databases of gene-phenotype associations observed in clinical or non-human model organism research can provide useful information and improve variant prioritization for genetic diseases. Additional background knowledge about interactions between genes can be utilized to identify sets of variants in different genes in the same individual which may then contribute to the overall disease phenotype. We have developed OligoPVP, an algorithm that can be used to prioritize causative combinations of variants in digenic and oligogenic diseases, using whole exome or whole genome sequences together with patient phenotypes as input. We demonstrate that OligoPVP has significantly improved performance when compared to state of the art pathogenicity detection methods in the case of digenic diseases. Our results show that OligoPVP can efficiently prioritize sets of variants in digenic diseases using a phenotype-driven approach and identify etiologically important variants in whole genomes. OligoPVP naturally extends to oligogenic disease involving interactions between variants in two or more genes. It can be applied to the identification of multiple interacting candidate variants contributing to phenotype, where the action of modifier genes is suspected from pedigree analysis or failure of traditional causative variant identification.
Affiliation(s)
- Imane Boudellioua
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Maxat Kulmanov
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Paul N Schofield
- Department of Physiology, Development & Neuroscience, University of Cambridge, Cambridge, UK
| | - Georgios V Gkoutos
- College of Medical and Dental Sciences, Institute of Cancer and Genomic Sciences, Centre for Computational Biology, University of Birmingham, B15 2TT, Birmingham, United Kingdom
- Institute of Translational Medicine, University Hospitals Birmingham, NHS Foundation Trust, B15 2TT, Birmingham, United Kingdom
- NIHR Experimental Cancer Medicine Centre, B15 2TT, Birmingham, UK
- NIHR Surgical Reconstruction and Microbiology Research Centre, B15 2TT, Birmingham, UK
- NIHR Biomedical Research Centre, B15 2TT, Birmingham, UK
| | - Robert Hoehndorf
- Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia.
|
65
|
Ballouz S, Pavlidis P, Gillis J. Using predictive specificity to determine when gene set analysis is biologically meaningful. Nucleic Acids Res 2018; 45:e20. [PMID: 28204549 PMCID: PMC5389513 DOI: 10.1093/nar/gkw957] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Received: 01/29/2016] [Revised: 10/04/2016] [Accepted: 10/10/2016] [Indexed: 11/14/2022] Open
Abstract
Gene set analysis, which translates gene lists into enriched functions, is among the most common bioinformatic methods. Yet few would advocate taking the results at face value. Not only is there no agreement on the algorithms themselves, there is no agreement on how to benchmark them. In this paper, we evaluate the robustness and uniqueness of enrichment results as a means of assessing methods even where correctness is unknown. We show that heavily annotated (‘multifunctional’) genes are likely to appear in genomics study results and drive the generation of biologically non-specific enrichment results as well as highly fragile significances. By providing a means of determining where enrichment analyses report non-specific and non-robust findings, we are able to assess where we can be confident in their use. We find significant progress in recent bias correction methods for enrichment and provide our own software implementation. Our approach can be readily adapted to any pre-existing package.
Affiliation(s)
- Sara Ballouz
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Woodbury, NY 11797, USA
| | - Paul Pavlidis
- Department of Psychiatry and Michael Smith Laboratories, University of British Columbia, Vancouver, BC, V6T 1Z4, Canada
| | - Jesse Gillis
- Stanley Institute for Cognitive Genomics, Cold Spring Harbor Laboratory, Woodbury, NY 11797, USA
|
66
|
Stoeger T, Gerlach M, Morimoto RI, Nunes Amaral LA. Large-scale investigation of the reasons why potentially important genes are ignored. PLoS Biol 2018; 16:e2006643. [PMID: 30226837 PMCID: PMC6143198 DOI: 10.1371/journal.pbio.2006643] [Citation(s) in RCA: 137] [Impact Index Per Article: 22.8] [Received: 05/11/2018] [Accepted: 08/10/2018] [Indexed: 01/04/2023] Open
Abstract
Biomedical research has been previously reported to primarily focus on a minority of all known genes. Here, we demonstrate that these differences in attention can be explained, to a large extent, exclusively from a small set of identifiable chemical, physical, and biological properties of genes. Together with knowledge about homologous genes from model organisms, these features allow us to accurately predict the number of publications on individual human genes, the year of their first report, the levels of funding awarded by the National Institutes of Health (NIH), and the development of drugs against disease-associated genes. By explicitly identifying the reasons for gene-specific bias and performing a meta-analysis of existing computational and experimental knowledge bases, we describe gene-specific strategies for the identification of important but hitherto ignored genes that can open novel directions for future investigation.
Affiliation(s)
- Thomas Stoeger
- Center for Genetic Medicine, Northwestern University, Chicago, United States of America
- Northwestern Institute on Complex Systems (NICO), Northwestern University, Evanston, United States of America
| | - Martin Gerlach
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States of America
| | - Richard I. Morimoto
- Department of Molecular Bioscience, Northwestern University, Evanston, United States of America
| | - Luís A. Nunes Amaral
- Northwestern Institute on Complex Systems (NICO), Northwestern University, Evanston, United States of America
- Department of Chemical and Biological Engineering, Northwestern University, Evanston, United States of America
- Department of Molecular Bioscience, Northwestern University, Evanston, United States of America
- Department of Physics and Astronomy, Northwestern University, Evanston, United States of America
|
67
|
Guo G, Liu Y, Ren S, Kang Y, Duscher D, Machens HG, Chen Z. Comprehensive analysis of differentially expressed microRNAs and mRNAs in dorsal root ganglia from streptozotocin-induced diabetic rats. PLoS One 2018; 13:e0202696. [PMID: 30118515 PMCID: PMC6097669 DOI: 10.1371/journal.pone.0202696] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.0] [Received: 04/08/2018] [Accepted: 08/06/2018] [Indexed: 01/22/2023] Open
Abstract
Diabetic peripheral neuropathy is a common complication associated with diabetes mellitus with a pathogenesis that is incompletely understood. By regulating RNA silencing and post-transcriptional gene expression, microRNAs participate in various biological processes and human diseases. However, the relationship between microRNAs and the progression of diabetic peripheral neuropathy has not been thoroughly explored. Here we used microarray microRNA and mRNA expression profiling to analyze the microRNAs and mRNAs which are aberrantly expressed in dorsal root ganglia from streptozotocin-induced diabetic rats. We found that 37 microRNAs and 1357 mRNAs were differentially expressed in comparison to non-diabetic samples. Bioinformatics analysis indicated that 399 gene ontology terms and 29 Kyoto Encyclopedia of Genes and Genomes pathways were significantly enriched in diabetic rats. Additionally, a microRNA-gene network evaluation identified rno-miR-330-5p, rno-miR-17-1-3p and rno-miR-346 as important players for network regulation. Finally, quantitative real-time polymerase chain reaction analysis was used to confirm the microarray results. In conclusion, this study provides a systematic perspective of microRNA and mRNA expression in dorsal root ganglia from diabetic rats, and suggests that dysregulated microRNAs and mRNAs may be important promoters of peripheral neuropathy. Our results may provide a framework for future studies regarding the effect of the aberrantly expressed genes on the pathophysiology of diabetic peripheral neuropathy.
Affiliation(s)
- Guojun Guo
- Department of Hand Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Yutian Liu
- Department of Hand Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Sen Ren
- Department of Hand Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Yu Kang
- Department of Hand Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
| | - Dominik Duscher
- Department of Plastic and Hand Surgery, Technical University of Munich, Munich, Germany
| | - Hans-Günther Machens
- Department of Plastic and Hand Surgery, Technical University of Munich, Munich, Germany
| | - Zhenbing Chen
- Department of Hand Surgery, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
|
68
|
Parraga-Alava J, Dorn M, Inostroza-Ponta M. A multi-objective gene clustering algorithm guided by apriori biological knowledge with intensification and diversification strategies. BioData Min 2018; 11:16. [PMID: 30100924 PMCID: PMC6081857 DOI: 10.1186/s13040-018-0178-4] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 05/05/2017] [Accepted: 07/29/2018] [Indexed: 01/10/2023] Open
Abstract
BACKGROUND Biologists aim to understand the genetic background of diseases, metabolic disorders or any other genetic condition. Microarrays are one of the main high-throughput technologies for collecting information about the behaviour of genetic information under different conditions. To analyse these data, clustering is one of the main techniques used; it aims at finding groups of genes that have some criterion in common, such as a similar expression profile. However, the problem of finding groups is normally multidimensional, making it necessary to approach clustering as a multi-objective problem where various cluster validity indexes are simultaneously optimised. These indexes are usually based on criteria like compactness and separation, which may not be sufficient since they cannot guarantee the generation of clusters that have both similar expression patterns and biological coherence. METHOD We propose a Multi-Objective Clustering algorithm Guided by a-Priori Biological Knowledge (MOC-GaPBK) to find clusters of genes with high levels of co-expression, biological coherence, and also good compactness and separation. Cluster quality indexes are used to simultaneously optimise gene relationships at the expression level and biological functionality. Our proposal also includes intensification and diversification strategies to improve the search process. RESULTS The effectiveness of the proposed algorithm is demonstrated on four publicly available datasets. Comparative studies of the use of different objective functions and other widely used microarray clustering techniques are reported. Statistical, visual and biological significance tests are carried out to show the superiority of the proposed algorithm.
CONCLUSIONS Integrating a-priori biological knowledge into a multi-objective approach and using intensification and diversification strategies allow the proposed algorithm to find solutions with higher quality than other microarray clustering techniques available in the literature in terms of co-expression, biological coherence, compactness and separation.
Affiliation(s)
- Jorge Parraga-Alava
- Centre for Biotechnology and Bioengineering (CeBiB), Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Av. Ecuador 3659, Santiago, Chile
- Carrera de Computación, Escuela Superior Politécnica Agropecuaria de Manabí Manuel Félix López, Campus Politécnico Sitio El Limón, Calceta, Ecuador
| | - Marcio Dorn
- Instituto de Informatica, Universidade Federal do Rio Grande do Sul, Av. Bento Gonçalves 9500, Porto Alegre, 91501-970 Brasil
| | - Mario Inostroza-Ponta
- Centre for Biotechnology and Bioengineering (CeBiB), Departamento de Ingeniería Informática, Universidad de Santiago de Chile, Av. Ecuador 3659, Santiago, Chile
|
69
|
van Dam S, Võsa U, van der Graaf A, Franke L, de Magalhães JP. Gene co-expression analysis for functional classification and gene-disease predictions. Brief Bioinform 2018; 19:575-592. [PMID: 28077403 PMCID: PMC6054162 DOI: 10.1093/bib/bbw139] [Citation(s) in RCA: 431] [Impact Index Per Article: 71.8] [Received: 09/12/2016] [Revised: 12/01/2016] [Indexed: 01/06/2023] Open
Abstract
Gene co-expression networks can be used to associate genes of unknown function with biological processes, to prioritize candidate disease genes or to discern transcriptional regulatory programmes. With recent advances in transcriptomics and next-generation sequencing, co-expression networks constructed from RNA sequencing data also enable the inference of functions and disease associations for non-coding genes and splice variants. Although gene co-expression networks typically do not provide information about causality, emerging methods for differential co-expression analysis are enabling the identification of regulatory genes underlying various phenotypes. Here, we introduce and guide researchers through a (differential) co-expression analysis. We provide an overview of methods and tools used to create and analyse co-expression networks constructed from gene expression data, and we explain how these can be used to identify genes with a regulatory role in disease. Furthermore, we discuss the integration of other data types with co-expression networks and offer future perspectives of co-expression analysis.
Affiliation(s)
- Sipko van Dam
- Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
| | - Urmo Võsa
- Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
| | | | - Lude Franke
- Department of Genetics, UMCG HPC CB50, RB Groningen, Netherlands
|
70
|
Mills BD, Grayson DS, Shunmugavel A, Miranda-Dominguez O, Feczko E, Earl E, Neve KA, Fair DA. Correlated Gene Expression and Anatomical Communication Support Synchronized Brain Activity in the Mouse Functional Connectome. J Neurosci 2018; 38:5774-5787. [PMID: 29789379 PMCID: PMC6010566 DOI: 10.1523/jneurosci.2910-17.2018] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 10/09/2017] [Revised: 05/07/2018] [Accepted: 05/10/2018] [Indexed: 01/13/2023] Open
Abstract
Cognition and behavior depend on synchronized intrinsic brain activity that is organized into functional networks across the brain. Research has investigated how anatomical connectivity both shapes and is shaped by these networks, but not how anatomical connectivity interacts with intra-areal molecular properties to drive functional connectivity. Here, we present a novel linear model to explain functional connectivity by integrating systematically obtained measurements of axonal connectivity, gene expression, and resting-state functional connectivity MRI in the mouse brain. The model suggests that functional connectivity arises from both anatomical links and inter-areal similarities in gene expression. By estimating these effects, we identify anatomical modules in which correlated gene expression and anatomical connectivity support functional connectivity. Along with providing evidence that not all genes equally contribute to functional connectivity, this research establishes new insights regarding the biological underpinnings of coordinated brain activity measured by BOLD fMRI.
SIGNIFICANCE STATEMENT Efforts at characterizing the functional connectome with fMRI have risen exponentially over the last decade. Yet despite this rise, the biological underpinnings of these functional measurements are still primarily unknown. The current report begins to fill this void by investigating the molecular underpinnings of the functional connectome through an integration of systematically obtained structural information and gene expression data throughout the rodent brain. We find that both white matter connectivity and similarity in regional gene expression relate to resting-state functional connectivity. The current report furthers our understanding of the biological underpinnings of the functional connectome and provides a linear model that can be used to streamline preclinical animal studies of disease.
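The kind of linear model this abstract describes can be sketched as an ordinary least-squares fit of functional connectivity on anatomical connectivity and gene expression similarity. This is an illustrative assumption only: the abstract does not give the model's terms, and the function name and three-coefficient form below are hypothetical.

```python
import numpy as np

def fit_connectivity_model(anatomical, expression_similarity, functional):
    # One value per region pair in each input vector.
    anatomical = np.asarray(anatomical, dtype=float)
    expression_similarity = np.asarray(expression_similarity, dtype=float)
    functional = np.asarray(functional, dtype=float)
    # Design matrix: intercept, anatomical connectivity, expression similarity.
    design = np.column_stack(
        [np.ones(len(anatomical)), anatomical, expression_similarity]
    )
    coef, *_ = np.linalg.lstsq(design, functional, rcond=None)
    return coef  # [intercept, anatomical weight, expression weight]
```

Comparing the two fitted weights gives a rough sense of how much each predictor contributes, provided the inputs are on comparable scales.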
Affiliation(s)
| | - David S Grayson
- Department of Behavioral Neuroscience
- The MIND Institute, University of California Davis, Sacramento, California 95817
- Center for Neuroscience, University of California Davis, Davis, California 95616
| | | | | | | | - Eric Earl
- Department of Behavioral Neuroscience
| | - Kim A Neve
- Department of Behavioral Neuroscience
- Research Service, VA Portland Health Care System, United States Department of Veterans Affairs, Portland, Oregon 97239
| | - Damien A Fair
- Department of Behavioral Neuroscience
- Advanced Imaging Research Center
- Department of Psychiatry, Oregon Health & Science University, Portland, Oregon 97239
|
71
|
Cornish AJ, David A, Sternberg MJE. PhenoRank: reducing study bias in gene prioritization through simulation. Bioinformatics 2018; 34:2087-2095. [PMID: 29360927 PMCID: PMC5949213 DOI: 10.1093/bioinformatics/bty028] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Received: 09/21/2017] [Revised: 01/10/2018] [Accepted: 01/16/2018] [Indexed: 02/07/2023] Open
Abstract
Motivation Genome-wide association studies have identified thousands of loci associated with human disease, but identifying the causal genes at these loci is often difficult. Several methods prioritize genes most likely to be disease causing through the integration of biological data, including protein-protein interaction and phenotypic data. Data availability is not the same for all genes however, potentially influencing the performance of these methods. Results We demonstrate that whilst disease genes tend to be associated with greater numbers of data, this may be at least partially a result of them being better studied. With this observation we develop PhenoRank, which prioritizes disease genes whilst avoiding being biased towards genes with more available data. Bias is avoided by comparing gene scores generated for the query disease against gene scores generated using simulated sets of phenotype terms, which ensures that differences in data availability do not affect the ranking of genes. We demonstrate that whilst existing prioritization methods are biased by data availability, PhenoRank is not similarly biased. Avoiding this bias allows PhenoRank to effectively prioritize genes with fewer available data and improves its overall performance. PhenoRank outperforms three available prioritization methods in cross-validation (PhenoRank area under receiver operating characteristic curve [AUC] = 0.89, DADA AUC = 0.87, EXOMISER AUC = 0.71, PRINCE AUC = 0.83, P < 2.2 × 10⁻¹⁶). Availability and implementation PhenoRank is freely available for download at https://github.com/alexjcornish/PhenoRank. Supplementary information Supplementary data are available at Bioinformatics online.
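The idea of comparing a gene's observed score against scores from simulated phenotype term sets can be sketched as an empirical p-value per gene. This is a generic illustration, not PhenoRank's code: the function name is hypothetical, and the add-one correction is a common convention assumed here rather than taken from the abstract.

```python
import numpy as np

def empirical_p_values(observed, simulated):
    # observed: shape (n_genes,), scores for the query disease.
    # simulated: shape (n_simulations, n_genes), scores for simulated
    # phenotype term sets.
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    # Count, per gene, the simulations scoring at least as high as observed.
    exceed = (simulated >= observed[None, :]).sum(axis=0)
    # Add-one correction keeps p-values strictly above zero (assumption).
    return (exceed + 1) / (simulated.shape[0] + 1)
```

Because each gene is compared only with its own simulated scores, a gene that scores highly for any set of phenotype terms (for example, a well-studied one) does not automatically receive a small p-value.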
Affiliation(s)
- Alex J Cornish
- Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
| | - Alessia David
- Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
| | - Michael J E Sternberg
- Department of Life Sciences, Center of Bioinformatics and Systems Biology, Imperial College London, London, UK
|
72
|
Zhang S, Zhang L, Tai Y, Wang X, Ho CT, Wan X. Gene Discovery of Characteristic Metabolic Pathways in the Tea Plant (Camellia sinensis) Using 'Omics'-Based Network Approaches: A Future Perspective. Front Plant Sci 2018; 9:480. [PMID: 29915604 PMCID: PMC5994431 DOI: 10.3389/fpls.2018.00480] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 10/15/2017] [Accepted: 03/29/2018] [Indexed: 05/23/2023]
Abstract
Characteristic secondary metabolites, including flavonoids, theanine and caffeine, in the tea plant (Camellia sinensis) are the primary sources of the rich flavors, fresh taste, and health benefits of tea. The decoding of genes involved in these characteristic components is still significantly lagging, which poses an obstacle for applied genetic improvement and metabolic engineering. With the popularity of high-throughput transcriptomics and metabolomics, 'omics'-based network approaches, such as gene co-expression networks and gene-to-metabolite networks, have emerged as powerful tools for gene discovery in plant-specialized (secondary) metabolism. Thus, it is pivotal to summarize and introduce such system-based strategies to facilitate gene identification of characteristic metabolic pathways in the tea plant (or other plants). In this review, we describe recent advances in transcriptomics and metabolomics for transcript and metabolite profiling, and highlight 'omics'-based network strategies using successful examples in model and non-model plants. Further, we summarize recent progress in 'omics' analysis for gene identification of characteristic metabolites in the tea plant. Limitations of the current strategies are discussed by comparison with 'omics'-based network approaches. Finally, we demonstrate the potential of introducing such network strategies in the tea plant, and close with the prospects for network-based discovery of characteristic metabolite genes in the tea plant.
Affiliation(s)
- Shihua Zhang
- State Key Laboratory of Tea Plant Biology and Utilization, Institute of Applied Mathematics, Anhui Agricultural University, Hefei, China
| | - Liang Zhang
- State Key Laboratory of Tea Plant Biology and Utilization, Institute of Applied Mathematics, Anhui Agricultural University, Hefei, China
| | - Yuling Tai
- School of Life Sciences, Anhui Agricultural University, Hefei, China
| | - Xuewen Wang
- Department of Genetics, University of Georgia, Athens, GA, United States
| | - Chi-Tang Ho
- Department of Food Science, Rutgers University, New Brunswick, NJ, United States
| | - Xiaochun Wan
- State Key Laboratory of Tea Plant Biology and Utilization, Institute of Applied Mathematics, Anhui Agricultural University, Hefei, China
| |
Collapse
|
73
|
Karimzadeh M, Jandaghi P, Papadakis AI, Trainor S, Rung J, Gonzàlez-Porta M, Scelo G, Vasudev NS, Brazma A, Huang S, Banks RE, Lathrop M, Najafabadi HS, Riazalhosseini Y. Aberration hubs in protein interaction networks highlight actionable targets in cancer. Oncotarget 2018; 9:25166-25180. [PMID: 29861861 PMCID: PMC5982744 DOI: 10.18632/oncotarget.25382] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 09/14/2017] [Accepted: 04/24/2018] [Indexed: 01/08/2023] Open
Abstract
Despite efforts for extensive molecular characterization of cancer patients, such as the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA), the heterogeneous nature of cancer and our limited knowledge of the contextual function of proteins have complicated the identification of targetable genes. Here, we present Aberration Hub Analysis for Cancer (AbHAC) as a novel integrative approach to pinpoint aberration hubs, i.e. individual proteins that interact extensively with genes that show aberrant mutation or expression. Our analysis of the breast cancer data of the TCGA and the renal cancer data from the ICGC shows that aberration hubs are involved in relevant cancer pathways, including factors promoting cell cycle and DNA replication in basal-like breast tumors, and Src kinase and VEGF signaling in renal carcinoma. Moreover, our analysis uncovers novel functionally relevant and actionable targets, among which we have experimentally validated abnormal splicing of spleen tyrosine kinase as a key factor for cell proliferation in renal cancer. Thus, AbHAC provides an effective strategy to uncover novel disease factors that are only identifiable by examining mutational and expression data in the context of biological networks.
Affiliation(s)
- Mehran Karimzadeh
  - Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada
  - McGill University and Genome Quebec Innovation Centre, Montreal, QC H3A 0G1, Canada
- Pouria Jandaghi
  - Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada
  - McGill University and Genome Quebec Innovation Centre, Montreal, QC H3A 0G1, Canada
- Andreas I. Papadakis
  - Department of Biochemistry, The Rosalind and Morris Goodman Cancer Centre, McGill University, Montreal, QC H3G 1Y6, Canada
- Sebastian Trainor
  - Leeds Institute of Cancer and Pathology, University of Leeds, Cancer Research Building, St James's University Hospital, Leeds, LS9 7TF, UK
- Johan Rung
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Mar Gonzàlez-Porta
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Ghislaine Scelo
  - International Agency for Research on Cancer (IARC), Lyon, 69008, France
- Naveen S. Vasudev
  - Leeds Institute of Cancer and Pathology, University of Leeds, Cancer Research Building, St James's University Hospital, Leeds, LS9 7TF, UK
- Alvis Brazma
  - European Molecular Biology Laboratory, European Bioinformatics Institute, EMBL-EBI, Wellcome Trust Genome Campus, Hinxton, CB10 1SD, UK
- Sidong Huang
  - Department of Biochemistry, The Rosalind and Morris Goodman Cancer Centre, McGill University, Montreal, QC H3G 1Y6, Canada
- Rosamonde E. Banks
  - Leeds Institute of Cancer and Pathology, University of Leeds, Cancer Research Building, St James's University Hospital, Leeds, LS9 7TF, UK
- Mark Lathrop
  - Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada
  - McGill University and Genome Quebec Innovation Centre, Montreal, QC H3A 0G1, Canada
- Hamed S. Najafabadi
  - Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada
  - McGill University and Genome Quebec Innovation Centre, Montreal, QC H3A 0G1, Canada
- Yasser Riazalhosseini
  - Department of Human Genetics, McGill University, Montreal, QC H3A 1B1, Canada
  - McGill University and Genome Quebec Innovation Centre, Montreal, QC H3A 0G1, Canada
74
Bolger ME, Arsova B, Usadel B. Plant genome and transcriptome annotations: from misconceptions to simple solutions. Brief Bioinform 2018; 19:437-449. [PMID: 28062412 PMCID: PMC5952960 DOI: 10.1093/bib/bbw135] [Citation(s) in RCA: 37] [Impact Index Per Article: 6.2] [Received: 08/22/2016] [Revised: 11/29/2016] [Indexed: 12/14/2022]
Abstract
Next-generation sequencing has triggered an explosion of available genomic and transcriptomic resources in the plant sciences. Although genome and transcriptome sequencing has become orders of magnitude cheaper and more efficient, the functional annotation process often lags behind. This might be hampered by the lack of a comprehensive enumeration of simple-to-use tools available to the plant researcher. In this comprehensive review, we present (i) typical ontologies to be used in the plant sciences, (ii) useful databases and resources used for functional annotation, (iii) what to expect from an annotated plant genome, (iv) an automated annotation pipeline and (v) a recipe and reference chart outlining typical steps used to annotate plant genomes/transcriptomes using publicly available resources.
Affiliation(s)
- Marie E Bolger
  - Forschungszentrum Jülich, Wilhelm Johnen Str, Jülich, Germany
- Borjana Arsova
  - Forschungszentrum Jülich, Wilhelm Johnen Str, Jülich, Germany
  - FRS-FNRS Chargé de Recherches, Functional Genomics and Plant Molecular Imaging Center for Protein Engineering (CIP), Dpt of Life Sciences, University of Liège, Quartier de la Vallée, 1, Chemin de la Vallée, 4 - Bât B22, 4000 LIEGE, Belgium
- Björn Usadel
  - Forschungszentrum Jülich, Wilhelm Johnen Str, Jülich, Germany
  - RWTH Aachen University, Institute for Biology I Botany, BioSC, Worringer Weg 3, Aachen, Germany
75
Wong DCJ, Zhang L, Merlin I, Castellarin SD, Gambetta GA. Structure and transcriptional regulation of the major intrinsic protein gene family in grapevine. BMC Genomics 2018; 19:248. [PMID: 29642857 PMCID: PMC5896048 DOI: 10.1186/s12864-018-4638-5] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 01/29/2018] [Accepted: 03/29/2018] [Indexed: 12/05/2022]
Abstract
Background: The major intrinsic protein (MIP) family is a family of proteins, including aquaporins, which facilitate water and small molecule transport across plasma membranes. In plants, MIPs function in a huge variety of processes including water transport, growth, stress response, and fruit development. In this study, we characterize the structure and transcriptional regulation of the MIP family in grapevine, describing the putative genome duplication events leading to the family structure and characterizing the family’s tissue and developmental specific expression patterns across numerous preexisting microarray and RNAseq datasets. Gene co-expression network (GCN) analyses were carried out across these datasets and the promoters of each family member were analyzed for cis-regulatory element structure in order to provide insight into their transcriptional regulation. Results: A total of 29 Vitis vinifera MIP family members (excluding putative pseudogenes) were identified of which all but two were mapped onto Vitis vinifera chromosomes. In this study, segmental duplication events were identified for five plasma membrane intrinsic protein (PIP) and four tonoplast intrinsic protein (TIP) genes, contributing to the expansion of PIPs and TIPs in grapevine. Grapevine MIP family members have distinct tissue and developmental expression patterns and hierarchical clustering revealed two primary groups regardless of the datasets analyzed. Composite microarray and RNA-seq gene co-expression networks (GCNs) highlighted the relationships between MIP genes and functional categories involved in cell wall modification and transport, as well as with other MIPs revealing a strong co-regulation within the family itself. Some duplicated MIP family members have undergone sub-functionalization and exhibit distinct expression patterns and GCNs. Cis-regulatory element (CRE) analyses of the MIP promoters and their associated GCN members revealed enrichment for numerous CREs including AP2/ERFs and NACs. Conclusions: Combining phylogenetic analyses, gene expression profiling, gene co-expression network analyses, and cis-regulatory element enrichment, this study provides a comprehensive overview of the structure and transcriptional regulation of the grapevine MIP family. The study highlights the duplication and sub-functionalization of the family, its strong coordinated expression with genes involved in growth and transport, and the putative classes of TFs responsible for its regulation.
Affiliation(s)
- Darren Chern Jan Wong
  - Wine Research Centre, University of British Columbia, 2205 East Mall, Vancouver, BC, V6T 0Z4, Canada
- Li Zhang
  - Bordeaux Science Agro, Institut des Sciences de la Vigne et du Vin, Ecophysiologie et Génomique Fonctionnelle de la Vigne, UMR 1287, F-33140, Villenave d'Ornon, France
- Isabelle Merlin
  - Bordeaux Science Agro, Institut des Sciences de la Vigne et du Vin, Ecophysiologie et Génomique Fonctionnelle de la Vigne, UMR 1287, F-33140, Villenave d'Ornon, France
- Simone D Castellarin
  - Wine Research Centre, University of British Columbia, 2205 East Mall, Vancouver, BC, V6T 0Z4, Canada
- Gregory A Gambetta
  - Bordeaux Science Agro, Institut des Sciences de la Vigne et du Vin, Ecophysiologie et Génomique Fonctionnelle de la Vigne, UMR 1287, F-33140, Villenave d'Ornon, France
76
Tomczak A, Mortensen JM, Winnenburg R, Liu C, Alessi DT, Swamy V, Vallania F, Lofgren S, Haynes W, Shah NH, Musen MA, Khatri P. Interpretation of biological experiments changes with evolution of the Gene Ontology and its annotations. Sci Rep 2018; 8:5115. [PMID: 29572502 PMCID: PMC5865181 DOI: 10.1038/s41598-018-23395-2] [Citation(s) in RCA: 72] [Impact Index Per Article: 12.0] [Received: 11/08/2017] [Accepted: 03/12/2018] [Indexed: 12/12/2022]
Abstract
Gene Ontology (GO) enrichment analysis is ubiquitously used for interpreting high throughput molecular data and generating hypotheses about underlying biological phenomena of experiments. However, the two building blocks of this analysis, the ontology and the annotations, evolve rapidly. We used gene signatures derived from 104 disease analyses to systematically evaluate how enrichment analysis results were affected by evolution of the GO over a decade. We found low consistency between enrichment analysis results obtained with early and more recent GO versions. Furthermore, there continues to be a strong annotation bias in the GO annotations where 58% of the annotations are for 16% of the human genes. Our analysis suggests that GO evolution may have affected the interpretation and possibly reproducibility of experiments over time. Hence, researchers must exercise caution when interpreting GO enrichment analyses and should reexamine previous analyses with the most recent GO version.
Affiliation(s)
- Aurelie Tomczak
  - Stanford Institute for Immunity, Transplantation and Infection (ITI), Stanford University, Stanford, CA, 94305, USA
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Jonathan M Mortensen
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Rainer Winnenburg
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Charles Liu
  - Stanford Institute for Immunity, Transplantation and Infection (ITI), Stanford University, Stanford, CA, 94305, USA
- Dominique T Alessi
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Varsha Swamy
  - Stanford Institute for Immunity, Transplantation and Infection (ITI), Stanford University, Stanford, CA, 94305, USA
- Francesco Vallania
  - Stanford Institute for Immunity, Transplantation and Infection (ITI), Stanford University, Stanford, CA, 94305, USA
- Shane Lofgren
  - Stanford Institute for Immunity, Transplantation and Infection (ITI), Stanford University, Stanford, CA, 94305, USA
- Winston Haynes
  - Stanford Institute for Immunity, Transplantation and Infection (ITI), Stanford University, Stanford, CA, 94305, USA
- Nigam H Shah
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Mark A Musen
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
- Purvesh Khatri
  - Stanford Institute for Immunity, Transplantation and Infection (ITI), Stanford University, Stanford, CA, 94305, USA
  - Stanford Center for Biomedical Informatics Research (BMIR), Department of Medicine, Stanford University, Stanford, CA, 94305, USA
77
Omony J, de Jong A, Krawczyk AO, Eijlander RT, Kuipers OP. Dynamic sporulation gene co-expression networks for Bacillus subtilis 168 and the food-borne isolate Bacillus amyloliquefaciens: a transcriptomic model. Microb Genom 2018; 4. [PMID: 29424683 PMCID: PMC5857382 DOI: 10.1099/mgen.0.000157] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Indexed: 12/19/2022]
Abstract
Sporulation is a survival strategy adopted by bacterial cells in response to harsh environmental adversities. The adaptation potential differs between strains and the variations may arise from differences in gene regulation. Gene networks are a valuable way of studying such regulation processes and establishing associations between genes. We reconstructed and compared sporulation gene co-expression networks (GCNs) of the model laboratory strain Bacillus subtilis 168 and the food-borne industrial isolate Bacillus amyloliquefaciens. Transcriptome data obtained from samples of six stages during the sporulation process were used for network inference. Subsequently, a gene set enrichment analysis was performed to compare the reconstructed GCNs of B. subtilis 168 and B. amyloliquefaciens with respect to biological functions, which showed the enriched modules with coherent functional groups associated with sporulation. On the basis of the GCNs and the time-evolution of differentially expressed genes, we could identify novel candidate genes strongly associated with sporulation in B. subtilis 168 and B. amyloliquefaciens. The GCNs offer a framework for exploring transcription factors, their targets, and co-expressed genes during sporulation. Furthermore, the methodology described here can conveniently be applied to other species or biological processes.
Affiliation(s)
- Jimmy Omony
  - Laboratory of Molecular Genetics, University of Groningen, 9747 AG Groningen, The Netherlands
  - Top Institute Food and Nutrition (TIFN), Nieuwe Kanaal 9A, 6709 PA Wageningen, The Netherlands
- Anne de Jong
  - Laboratory of Molecular Genetics, University of Groningen, 9747 AG Groningen, The Netherlands
  - Top Institute Food and Nutrition (TIFN), Nieuwe Kanaal 9A, 6709 PA Wageningen, The Netherlands
- Antonina O Krawczyk
  - Laboratory of Molecular Genetics, University of Groningen, 9747 AG Groningen, The Netherlands
  - Top Institute Food and Nutrition (TIFN), Nieuwe Kanaal 9A, 6709 PA Wageningen, The Netherlands
- Robyn T Eijlander
  - Laboratory of Molecular Genetics, University of Groningen, 9747 AG Groningen, The Netherlands
  - Top Institute Food and Nutrition (TIFN), Nieuwe Kanaal 9A, 6709 PA Wageningen, The Netherlands
  - NIZO Food Research, B.V., P.O. Box 20, Ede 6710 BA, Ede, The Netherlands
- Oscar P Kuipers
  - Laboratory of Molecular Genetics, University of Groningen, 9747 AG Groningen, The Netherlands
  - Top Institute Food and Nutrition (TIFN), Nieuwe Kanaal 9A, 6709 PA Wageningen, The Netherlands
78
Haynes WA, Tomczak A, Khatri P. Gene annotation bias impedes biomedical research. Sci Rep 2018; 8:1362. [PMID: 29358745 PMCID: PMC5778030 DOI: 10.1038/s41598-018-19333-x] [Citation(s) in RCA: 89] [Impact Index Per Article: 14.8] [Received: 10/04/2017] [Accepted: 12/28/2017] [Indexed: 12/21/2022]
Abstract
We found tremendous inequality across gene and protein annotation resources. We observed that this bias leads biomedical researchers to focus on richly annotated genes instead of those with the strongest molecular data. We advocate that researchers reduce these biases by pursuing data-driven hypotheses.
Affiliation(s)
- Winston A Haynes
  - Stanford Institute for Immunity, Transplantation, and Infection, Stanford University, Stanford, California, USA
  - Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, California, USA
  - Biomedical Informatics Training Program, Stanford University, Stanford, California, USA
- Aurelie Tomczak
  - Stanford Institute for Immunity, Transplantation, and Infection, Stanford University, Stanford, California, USA
  - Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, California, USA
- Purvesh Khatri
  - Stanford Institute for Immunity, Transplantation, and Infection, Stanford University, Stanford, California, USA
  - Stanford Center for Biomedical Informatics Research, Department of Medicine, Stanford University, Stanford, California, USA
79
Ellens KW, Christian N, Singh C, Satagopam VP, May P, Linster CL. Confronting the catalytic dark matter encoded by sequenced genomes. Nucleic Acids Res 2017; 45:11495-11514. [PMID: 29059321 PMCID: PMC5714238 DOI: 10.1093/nar/gkx937] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 01/18/2017] [Accepted: 10/03/2017] [Indexed: 01/02/2023]
Abstract
The post-genomic era has provided researchers with a deluge of protein sequences. However, a significant fraction of the proteins encoded by sequenced genomes remains without an identified function. Here, we aim at determining how many enzymes of uncertain or unknown function are still present in the Saccharomyces cerevisiae and human proteomes. Using information available in the Swiss-Prot, BRENDA and KEGG databases in combination with a Hidden Markov Model-based method, we estimate that >600 yeast and 2000 human proteins (>30% of their proteins of unknown function) are enzymes whose precise function(s) remain(s) to be determined. This illustrates the impressive scale of the ‘unknown enzyme problem’. We extensively review classical biochemical as well as more recent systematic experimental and computational approaches that can be used to support enzyme function discovery research. Finally, we discuss the possible roles of the elusive catalysts in light of recent developments in the fields of enzymology and metabolism as well as the significance of the unknown enzyme problem in the context of metabolic modeling, metabolic engineering and rare disease research.
Affiliation(s)
- Kenneth W Ellens
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Nils Christian
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Charandeep Singh
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Venkata P Satagopam
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Patrick May
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
- Carole L Linster
  - Luxembourg Centre for Systems Biomedicine, University of Luxembourg, L-4362 Esch-sur-Alzette, Luxembourg
80
Patkar S, Magen A, Sharan R, Hannenhalli S. A network diffusion approach to inferring sample-specific function reveals functional changes associated with breast cancer. PLoS Comput Biol 2017; 13:e1005793. [PMID: 29190299 PMCID: PMC5708603 DOI: 10.1371/journal.pcbi.1005793] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Received: 04/05/2017] [Accepted: 09/27/2017] [Indexed: 11/18/2022]
Abstract
Guilt-by-association codifies the empirical observation that a gene's function is informed by its neighborhood in a biological network. This would imply that when a gene's network context is altered, for instance in disease condition, so could be the gene's function. Although context-specific changes in biological networks have been explored, the potential changes they may induce on the functional roles of genes are yet to be characterized. Here we analyze, for the first time, the network-induced potential functional changes in breast cancer. Using transcriptomic samples for 1047 breast tumors and 110 healthy breast tissues from TCGA, we derive sample-specific protein interaction networks and assign sample-specific functions to genes via a diffusion strategy. Testing for significant changes in the inferred functions between normal and cancer samples, we find several functions to have significantly gained or lost genes in cancer, not due to differential expression of genes known to perform the function, but rather due to changes in the network topology. Our predicted functional changes are supported by mutational and copy number profiles in breast cancers. Our diffusion-based functional assignment provides a novel characterization of a tumor that is complementary to the standard approach based on functional annotation alone. Importantly, this characterization is effective in predicting patient survival, as well as in predicting several known histopathological subtypes of breast cancer.
Affiliation(s)
- Sushant Patkar
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
- Assaf Magen
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
- Roded Sharan
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel
- Sridhar Hannenhalli
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland, United States of America
81
Palluzzi F, Ferrari R, Graziano F, Novelli V, Rossi G, Galimberti D, Rainero I, Benussi L, Nacmias B, Bruni AC, Cusi D, Salvi E, Borroni B, Grassi M. A novel network analysis approach reveals DNA damage, oxidative stress and calcium/cAMP homeostasis-associated biomarkers in frontotemporal dementia. PLoS One 2017; 12:e0185797. [PMID: 29020091 PMCID: PMC5636111 DOI: 10.1371/journal.pone.0185797] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.6] [Received: 02/26/2017] [Accepted: 09/19/2017] [Indexed: 01/04/2023]
Abstract
Frontotemporal Dementia (FTD) is the form of neurodegenerative dementia with the highest prevalence after Alzheimer’s disease, equally distributed in men and women. It includes several variants, generally characterized by behavioural instability and language impairments. Although a few Mendelian genes (MAPT, GRN, and C9orf72) have been associated with the FTD phenotype, in most cases there is only evidence of multiple risk loci with relatively small effect size. To date, there are no comprehensive studies describing FTD at the molecular level, highlighting possible genetic interactions and signalling pathways at the origin of FTD-associated neurodegeneration. In this study, we designed a broad FTD genetic interaction map of the Italian population, through a novel network-based approach modelled on the concepts of disease-relevance and interaction perturbation, combining Steiner tree search and Structural Equation Model (SEM) analysis. Our results show a strong connection between Calcium/cAMP metabolism, oxidative stress-induced Serine/Threonine kinases activation, and postsynaptic membrane potentiation, suggesting a possible combination of neuronal damage and loss of neuroprotection, leading to cell death. In our model, Calcium/cAMP homeostasis and energetic metabolism impairments are primary causes of loss of neuroprotection and neural cell damage, respectively. Secondly, the altered postsynaptic membrane potentiation, due to the activation of stress-induced Serine/Threonine kinases, leads to neurodegeneration. Our study investigates the molecular underpinnings of these processes, evidencing key genes and gene interactions that may account for a significant fraction of unexplained FTD aetiology. We emphasized the key molecular actors in these processes, proposing them as novel FTD biomarkers that could be crucial for further epidemiological and molecular studies.
Affiliation(s)
- Fernando Palluzzi
  - Department of Brain and Behavioural Sciences, Medical and Genomic Statistics Unit, University of Pavia, Pavia, Italy
- Raffaele Ferrari
  - Department of Molecular Neuroscience, Institute of Neurology, University College London (UCL), London, United Kingdom
- Francesca Graziano
  - Department of Brain and Behavioural Sciences, Medical and Genomic Statistics Unit, University of Pavia, Pavia, Italy
- Valeria Novelli
  - Department of Genetics, Fondazione Policlinico A. Gemelli, Roma, Italy
- Giacomina Rossi
  - Division of Neurology V and Neuropathology, Fondazione IRCCS Istituto Neurologico Carlo Besta, Milano, Italy
- Daniela Galimberti
  - Department of Neurological Sciences, Dino Ferrari Institute, University of Milan, Milano, Italy
- Innocenzo Rainero
  - Department of Neuroscience, Neurology I, University of Torino and Città della Salute e della Scienza di Torino, Torino, Italy
- Luisa Benussi
  - Molecular Markers Laboratory, IRCCS Istituto Centro San Giovanni di Dio Fatebenefratelli, Brescia, Italy
- Benedetta Nacmias
  - Department of Neuroscience, Psychology, Drug Research and Child Health, University of Florence, Firenze, Italy
- Amalia C. Bruni
  - Neurogenetic Regional Centre ASPCZ Lamezia Terme, Lamezia Terme (CZ), Italy
- Daniele Cusi
  - Department of Health Sciences, University of Milan at San Paolo Hospital, Milano, Italy
  - Institute of Biomedical Technologies, Italian National Research Council, Milano, Italy
- Erika Salvi
  - Institute of Biomedical Technologies, Italian National Research Council, Milano, Italy
- Barbara Borroni
  - Department of Medical Sciences, Neurology Clinic, University of Brescia, Brescia, Italy
- Mario Grassi
  - Department of Brain and Behavioural Sciences, Medical and Genomic Statistics Unit, University of Pavia, Pavia, Italy
82
Sandor C, Beer NL, Webber C. Diverse type 2 diabetes genetic risk factors functionally converge in a phenotype-focused gene network. PLoS Comput Biol 2017; 13:e1005816. [PMID: 29059180 PMCID: PMC5667928 DOI: 10.1371/journal.pcbi.1005816] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 03/25/2017] [Revised: 11/02/2017] [Accepted: 10/11/2017] [Indexed: 12/14/2022]
Abstract
Type 2 Diabetes (T2D) constitutes a global health burden. Efforts to uncover predisposing genetic variation have been considerable, yet detailed knowledge of the underlying pathogenesis remains poor. Here, we constructed a T2D phenotypic-linkage network (T2D-PLN), by integrating diverse gene functional information that highlight genes, which when disrupted in mice, elicit similar T2D-relevant phenotypes. Sensitising the network to T2D-relevant phenotypes enabled significant functional convergence to be detected between genes implicated in monogenic or syndromic diabetes and genes lying within genomic regions associated with T2D common risk. We extended these analyses to a recent multiethnic T2D case-control exome of 12,940 individuals that found no evidence of T2D risk association for rare frequency variants outside of previously known T2D risk loci. Examining associations involving protein-truncating variants (PTV), most at low population frequencies, the T2D-PLN was able to identify a convergent set of biological pathways that were perturbed within four of five independent T2D case/control ethnic sets of 2000 to 5000 exomes each. These same pathways were found to be over-represented among both known monogenic or syndromic diabetes genes and genes within T2D-associated common risk loci. Our study demonstrates convergent biology amongst variants representing different classes of T2D genetic risk. Although convergence was observed at the pathway level, few of the contributing genes were found in common between different cohorts or variant classes, most notably between the exome variant sets which suggests that future rare variant studies may be better focusing their power onto a single population of recent common ancestry.
Affiliation(s)
- Cynthia Sandor
  - Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
- Nicola L. Beer
  - Oxford Centre for Diabetes, Endocrinology and Metabolism, Radcliffe Department of Medicine, University of Oxford, Oxford, United Kingdom
- Caleb Webber
  - Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford, United Kingdom
83
Teng Z, Guo M, Liu X, Tian Z, Che K. Revealing protein functions based on relationships of interacting proteins and GO terms. J Biomed Semantics 2017; 8:27. [PMID: 29297388 PMCID: PMC5763294 DOI: 10.1186/s13326-017-0139-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Indexed: 12/24/2022]
Abstract
BACKGROUND: In recent years, numerous computational methods have predicted protein function from the protein-protein interaction (PPI) network. These methods assume that two proteins share the same function if they interact with each other. However, recent studies report that the functions of two interacting proteins may be merely related, which can mislead the prediction of protein function. Therefore, the functional relationship between interacting proteins needs to be investigated. RESULTS: In this paper, the functional relationship between interacting proteins is studied and a novel method, called GoDIN, is proposed to annotate the functions of interacting proteins in the Gene Ontology (GO) context. It is assumed that the functional difference between interacting proteins can be expressed by the semantic difference between a GO term and its relatives. Thus, the method uses a GO term and its relatives to annotate the interacting proteins separately according to their functional roles in the PPI network. The method is validated by a series of experiments and compared with the concerned method. The experimental results confirm the assumption and suggest that GoDIN is effective at predicting protein functions. CONCLUSIONS: This study demonstrates that: (1) interacting proteins are not equal in the PPI network, and their functions may be the same, similar, or merely related; (2) the functional difference between interacting proteins can be measured by their degrees in the PPI network; (3) the functional relationship between interacting proteins can be expressed by the relationship between a GO term and its relatives.
Affiliation(s)
- Zhixia Teng
  - Department of Information Management and Information System, Northeast Forestry University, Harbin, 150040, China
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
- Maozu Guo
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
- Xiaoyan Liu
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
- Zhen Tian
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
- Kai Che
  - Department of Computer Science and Engineering, Harbin Institute of Technology, Harbin, 150001, China
84
Emad A, Cairns J, Kalari KR, Wang L, Sinha S. Knowledge-guided gene prioritization reveals new insights into the mechanisms of chemoresistance. Genome Biol 2017; 18:153. [PMID: 28800781 PMCID: PMC5554409 DOI: 10.1186/s13059-017-1282-3] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 04/17/2017] [Accepted: 07/18/2017] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Identification of genes whose basal mRNA expression predicts the sensitivity of tumor cells to cytotoxic treatments can play an important role in individualized cancer medicine. It enables detailed characterization of the mechanism of action of drugs. Furthermore, screening the expression of these genes in the tumor tissue may suggest the best course of chemotherapy or a combination of drugs to overcome drug resistance. RESULTS We developed a computational method called ProGENI to identify genes most associated with the variation of drug response across different individuals, based on gene expression data. In contrast to existing methods, ProGENI also utilizes prior knowledge of protein-protein and genetic interactions, using random walk techniques. Analysis of two relatively new and large datasets including gene expression data on hundreds of cell lines and their cytotoxic responses to a large compendium of drugs reveals a significant improvement in prediction of drug sensitivity using genes identified by ProGENI compared to other methods. Our siRNA knockdown experiments on ProGENI-identified genes confirmed the role of many new genes in sensitivity to three chemotherapy drugs: cisplatin, docetaxel, and doxorubicin. Based on such experiments and extensive literature survey, we demonstrate that about 73% of our top predicted genes modulate drug response in selected cancer cell lines. In addition, global analysis of genes associated with groups of drugs uncovered pathways of cytotoxic response shared by each group. CONCLUSIONS Our results suggest that knowledge-guided prioritization of genes using ProGENI gives new insight into mechanisms of drug resistance and identifies genes that may be targeted to overcome this phenomenon.
Affiliation(s)
- Amin Emad
  - Carl R. Woese Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801 USA
- Junmei Cairns
  - Department of Molecular Pharmacology and Experimental Therapeutics, Gonda 19, Mayo Clinic Rochester, 200, 1st St. SW, Rochester, MN 55905 USA
- Krishna R. Kalari
  - Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905 USA
- Liewei Wang
  - Department of Molecular Pharmacology and Experimental Therapeutics, Gonda 19, Mayo Clinic Rochester, 200, 1st St. SW, Rochester, MN 55905 USA
- Saurabh Sinha
  - Department of Computer Science and Institute of Genomic Biology, University of Illinois at Urbana-Champaign, 2122 Siebel Center, 201 N. Goodwin Ave, Urbana, IL 61801 USA

85
Fortelny N, Butler GS, Overall CM, Pavlidis P. Protease-Inhibitor Interaction Predictions: Lessons on the Complexity of Protein-Protein Interactions. Mol Cell Proteomics 2017; 16:1038-1051. [PMID: 28385878 PMCID: PMC5461536 DOI: 10.1074/mcp.m116.065706] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 11/21/2016] [Revised: 03/24/2017] [Indexed: 01/18/2023] Open
Abstract
Protein interactions shape proteome function and thus biology. Identification of protein interactions is a major goal in molecular biology, but biochemical methods, although improving, remain limited in coverage and accuracy. Whereas computational predictions can guide biochemical experiments, low validation rates of predictions remain a major limitation. Here, we investigated computational methods in the prediction of a specific type of interaction, the inhibitory interactions between proteases and their inhibitors. Proteases generate thousands of proteoforms that dynamically shape the functional state of proteomes. Despite the important regulatory role of proteases, knowledge of their inhibitors remains largely incomplete with the vast majority of proteases lacking an annotated inhibitor. To link inhibitors to their target proteases on a large scale, we applied computational methods to predict inhibitory interactions between proteases and their inhibitors based on complementary data, including coexpression, phylogenetic similarity, structural information, co-annotation, and colocalization, and also surveyed general protein interaction networks for potential inhibitory interactions. In testing nine predicted interactions biochemically, we validated the inhibition of kallikrein 5 by serpin B12. Despite the use of a wide array of complementary data, we found a high false positive rate of computational predictions in biochemical follow-up. Based on a protease-specific definition of true negatives derived from the biochemical classification of proteases and inhibitors, we analyzed prediction accuracy of individual features, thereby we identified feature-specific limitations, which also affected general protein interaction prediction methods. Interestingly, proteases were often not coexpressed with most of their functional inhibitors, contrary to what is commonly assumed and extrapolated predominantly from cell culture experiments. 
Predictions of inhibitory interactions were indeed more challenging than predictions of nonproteolytic and noninhibitory interactions. In summary, we describe a novel and well-defined but difficult protein interaction prediction task and thereby highlight limitations of computational interaction prediction methods.
Affiliation(s)
- Nikolaus Fortelny
  - Department of Biochemistry and Molecular Biology
  - Michael Smith Laboratories
  - Centre for Blood Research
- Georgina S Butler
  - Centre for Blood Research
  - Department of Oral Biological and Medical Sciences, Faculty of Dentistry
- Christopher M Overall
  - Department of Biochemistry and Molecular Biology
  - Centre for Blood Research
  - Department of Oral Biological and Medical Sciences, Faculty of Dentistry
- Paul Pavlidis
  - Michael Smith Laboratories
  - Department of Psychiatry, University of British Columbia, Vancouver, British Columbia, Canada

86
Kominakis A, Hager-Theodorides AL, Zoidis E, Saridaki A, Antonakos G, Tsiamis G. Combined GWAS and 'guilt by association'-based prioritization analysis identifies functional candidate genes for body size in sheep. Genet Sel Evol 2017; 49:41. [PMID: 28454565 PMCID: PMC5408376 DOI: 10.1186/s12711-017-0316-3] [Citation(s) in RCA: 52] [Impact Index Per Article: 7.4] [Received: 08/16/2016] [Accepted: 04/19/2017] [Indexed: 12/30/2022] Open
Abstract
Background Body size in sheep is an important indicator of productivity, growth and health as well as of environmental adaptation. It is a composite quantitative trait that has been studied with high-throughput genomic methods, i.e. genome-wide association studies (GWAS) in various mammalian species. Several genomic markers have been associated with body size traits and genes have been identified as causative candidates in humans, dog and cattle. A limited number of related GWAS have been performed in various sheep breeds and have identified genomic regions and candidate genes that partly account for body size variability. Here, we conducted a GWAS in Frizarta dairy sheep with phenotypic data from 10 body size measurements and genotypic data (from Illumina ovineSNP50 BeadChip) for 459 ewes. Results The 10 body size measurements were subjected to principal component analysis and three independent principal components (PC) were constructed, interpretable as width, height and length dimensions, respectively. The GWAS performed for each PC identified 11 significant SNPs, at the chromosome level, one on each of the chromosomes 3, 8, 9, 10, 11, 12, 19, 20, 23 and two on chromosome 25. Nine out of the 11 SNPs were located on previously identified quantitative trait loci for sheep meat, production or reproduction. One hundred and ninety-seven positional candidate genes within a 1-Mb distance from each significant SNP were found. A guilt-by-association-based (GBA) prioritization analysis (PA) was performed to identify the most plausible functional candidate genes. GBA-based PA identified 39 genes that were significantly associated with gene networks relevant to body size traits. Prioritized genes were identified in the vicinity of all significant SNPs except for those on chromosomes 10 and 12. The top five ranking genes were TP53, BMPR1A, PIK3R5, RPL26 and PRKDC. 
Conclusions The results of this GWAS provide evidence for 39 causative candidate genes across nine chromosomal regions for body size traits, some of which are novel and some of which are previously identified candidates from other studies (e.g. TP53, NTN1 and ZNF521). GBA-based PA has proved to be a useful tool to identify genes with increased biological relevance, but it is subject to certain limitations.
Affiliation(s)
- Antonios Kominakis
  - Department of Animal Science and Aquaculture, Agricultural University of Athens, Iera Odos 75, 11855, Athens, Greece
- Ariadne L Hager-Theodorides
  - Department of Animal Science and Aquaculture, Agricultural University of Athens, Iera Odos 75, 11855, Athens, Greece
- Evangelos Zoidis
  - Department of Animal Science and Aquaculture, Agricultural University of Athens, Iera Odos 75, 11855, Athens, Greece
- Aggeliki Saridaki
  - Department of Environmental and Natural Resources Management, University of Patras, Seferi 2, 30100, Agrinio, Greece
- George Antonakos
  - Agricultural and Livestock Union of Western Greece, 13rd Km N.R. Agrinio-Ioannina, 30100, Lepenou, Greece
- George Tsiamis
  - Department of Environmental and Natural Resources Management, University of Patras, Seferi 2, 30100, Agrinio, Greece

87
Feng BJ. PERCH: A Unified Framework for Disease Gene Prioritization. Hum Mutat 2017; 38:243-251. [PMID: 27995669 PMCID: PMC5299048 DOI: 10.1002/humu.23158] [Citation(s) in RCA: 110] [Impact Index Per Article: 15.7] [Received: 10/04/2016] [Accepted: 12/12/2016] [Indexed: 12/30/2022]
Abstract
To interpret genetic variants discovered from next-generation sequencing, integration of heterogeneous information is vital for success. This article describes a framework named PERCH (Polymorphism Evaluation, Ranking, and Classification for a Heritable trait), available at http://BJFengLab.org/. It can prioritize disease genes by quantitatively unifying a new deleteriousness measure called BayesDel, an improved assessment of the biological relevance of genes to the disease, a modified linkage analysis, a novel rare-variant association test, and a converted variant call quality score. It supports data that contain various combinations of extended pedigrees, trios, and case-controls, and allows for a reduced penetrance, an elevated phenocopy rate, liability classes, and covariates. BayesDel is more accurate than PolyPhen2, SIFT, FATHMM, LRT, Mutation Taster, Mutation Assessor, PhyloP, GERP++, SiPhy, CADD, MetaLR, and MetaSVM. The overall approach is faster and more powerful than the existing quantitative method pVAAST, as shown by the simulations of challenging situations in finding the missing heritability of a complex disease. This framework can also classify variants of uncertain significance (VUS) by quantitatively integrating allele frequencies, deleteriousness, association, and co-segregation. PERCH is a versatile tool for gene prioritization in gene discovery research and variant classification in clinical genetic testing.
Affiliation(s)
- Bing-Jian Feng
  - Department of Dermatology, University of Utah, Salt Lake City, UT 84132, USA
  - Huntsman Cancer Institute, University of Utah, Salt Lake City, UT 84132, USA

88
Jiang B, Kloster K, Gleich DF, Gribskov M. AptRank: an adaptive PageRank model for protein function prediction on bi-relational graphs. Bioinformatics 2017; 33:1829-1836. [PMID: 28200073 DOI: 10.1093/bioinformatics/btx029] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 05/23/2016] [Accepted: 02/14/2017] [Indexed: 11/15/2022] Open
Affiliation(s)
- Biaobin Jiang
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Kyle Kloster
  - Department of Mathematics, Purdue University, West Lafayette, IN, USA
- David F Gleich
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA
- Michael Gribskov
  - Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
  - Department of Computer Science, Purdue University, West Lafayette, IN, USA

89
Kulmanov M, Hoehndorf R. Evaluating the effect of annotation size on measures of semantic similarity. J Biomed Semantics 2017; 8:7. [PMID: 28193260 PMCID: PMC5307803 DOI: 10.1186/s13326-017-0119-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.4] [Received: 10/15/2016] [Accepted: 02/01/2017] [Indexed: 01/29/2023] Open
Abstract
Background Ontologies are widely used as metadata in biological and biomedical datasets. Measures of semantic similarity utilize ontologies to determine how similar two entities annotated with classes from ontologies are, and semantic similarity is increasingly applied in applications ranging from diagnosis of disease to investigation in gene networks and functions of gene products. Results Here, we analyze a large number of semantic similarity measures and the sensitivity of similarity values to the number of annotations of entities, difference in annotation size and to the depth or specificity of annotation classes. We find that most similarity measures are sensitive to the number of annotations of entities, difference in annotation size as well as to the depth of annotation classes; well-studied and richly annotated entities will usually show higher similarity than entities with only few annotations even in the absence of any biological relation. Conclusions Our findings may have significant impact on the interpretation of results that rely on measures of semantic similarity, and we demonstrate how the sensitivity to annotation size can lead to a bias when using semantic similarity to predict protein-protein interactions.
Affiliation(s)
- Maxat Kulmanov
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
  - Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
- Robert Hoehndorf
  - Computational Bioscience Research Center, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia
  - Computer, Electrical and Mathematical Sciences and Engineering Division, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia

90
Wong DCJ, Matus JT. Constructing Integrated Networks for Identifying New Secondary Metabolic Pathway Regulators in Grapevine: Recent Applications and Future Opportunities. Front Plant Sci 2017; 8:505. [PMID: 28446914 PMCID: PMC5388765 DOI: 10.3389/fpls.2017.00505] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 02/16/2017] [Accepted: 03/22/2017] [Indexed: 05/19/2023]
Abstract
Representing large biological data as networks is becoming increasingly adopted for predicting gene function while elucidating the multifaceted organization of life processes. In grapevine (Vitis vinifera L.), network analyses have been mostly adopted to contribute to the understanding of the regulatory mechanisms that control berry composition. Whereas some studies have used gene co-expression networks to find common pathways and putative targets for transcription factors related to development and metabolism, others have defined networks of primary and secondary metabolites for characterizing the main metabolic differences between cultivars throughout fruit ripening. Lately, proteomic-related networks and those integrating genome-wide analyses of promoter regulatory elements have also been generated. The integration of all these data in multilayered networks allows building complex maps of molecular regulation and interaction. This perspective article describes the currently available network data and related resources for grapevine. With the aim of illustrating data integration approaches for network construction and analysis in grapevine, we searched for berry-specific regulators of the phenylpropanoid pathway. We generated a composite network consisting of overlaying maps of co-expression between structural and transcription factor genes, integrated with the presence of promoter cis-binding elements, microRNAs, and long non-coding RNAs (lncRNA). This approach revealed new uncharacterized transcription factors together with several microRNAs potentially regulating different steps of the phenylpropanoid pathway, and one particular lncRNA compromising the expression of nine stilbene synthase (STS) genes located in chromosome 10. Application of network-based approaches to multi-omics data will continue to provide supplementary resources to address important questions regarding grapevine fruit quality and composition.
Affiliation(s)
- Darren C. J. Wong
  - Ecology and Evolution, Research School of Biology, Australian National University, Acton, ACT, Australia
- José Tomás Matus
  - Centre for Research in Agricultural Genomics, CSIC-IRTA-UAB-UB, Barcelona, Spain
  - *Correspondence: José Tomás Matus

91
Bartley GE, Avena-Bustillos RJ, Du WX, Hidalgo M, Cain B, Breksa AP. Transcriptional regulation of chlorogenic acid biosynthesis in carrot root slices exposed to UV-B light. Plant Gene 2016. [DOI: 10.1016/j.plgene.2016.07.001] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.4] [Indexed: 01/29/2023]

92
McClure RS, Overall CC, McDermott JE, Hill EA, Markillie LM, McCue LA, Taylor RC, Ludwig M, Bryant DA, Beliaev AS. Network analysis of transcriptomics expands regulatory landscapes in Synechococcus sp. PCC 7002. Nucleic Acids Res 2016; 44:8810-8825. [PMID: 27568004 PMCID: PMC5062996 DOI: 10.1093/nar/gkw737] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 11/09/2015] [Accepted: 08/05/2016] [Indexed: 12/29/2022] Open
Abstract
Cyanobacterial regulation of gene expression must contend with a genome organization that lacks apparent functional context, as the majority of cellular processes and metabolic pathways are encoded by genes found at disparate locations across the genome and relatively few transcription factors exist. In this study, global transcript abundance data from the model cyanobacterium Synechococcus sp. PCC 7002 grown under 42 different conditions was analyzed using Context-Likelihood of Relatedness (CLR). The resulting network, organized into 11 modules, provided insight into transcriptional network topology as well as grouping genes by function and linking their response to specific environmental variables. When used in conjunction with genome sequences, the network allowed identification and expansion of novel potential targets of both DNA binding proteins and sRNA regulators. These results offer a new perspective into the multi-level regulation that governs cellular adaptations of the fast-growing physiologically robust cyanobacterium Synechococcus sp. PCC 7002 to changing environmental variables. It also provides a methodological high-throughput approach to studying multi-scale regulatory mechanisms that operate in cyanobacteria. Finally, it provides valuable context for integrating systems-level data to enhance gene grouping based on annotated function, especially in organisms where traditional context analyses cannot be implemented due to lack of operon-based functional organization.
Affiliation(s)
- Ryan S McClure
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Christopher C Overall
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Jason E McDermott
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Eric A Hill
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Lye Meng Markillie
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Lee Ann McCue
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Ronald C Taylor
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA
- Marcus Ludwig
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, State College, PA 16802, USA
- Donald A Bryant
  - Department of Biochemistry and Molecular Biology, The Pennsylvania State University, State College, PA 16802, USA
  - Department of Chemistry and Biochemistry, Montana State University, Bozeman, MT 59717, USA
- Alexander S Beliaev
  - Biological Sciences Division, Pacific Northwest National Laboratory, Richland, WA 99352, USA

93
O’Meara MJ, Ballouz S, Shoichet BK, Gillis J. Ligand Similarity Complements Sequence, Physical Interaction, and Co-Expression for Gene Function Prediction. PLoS One 2016; 11:e0160098. [PMID: 27467773 PMCID: PMC4965129 DOI: 10.1371/journal.pone.0160098] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 04/12/2016] [Accepted: 07/13/2016] [Indexed: 12/13/2022] Open
Abstract
The expansion of protein-ligand annotation databases has enabled large-scale networking of proteins by ligand similarity. These ligand-based protein networks, which implicitly predict the ability of neighboring proteins to bind related ligands, may complement biologically-oriented gene networks, which are used to predict functional or disease relevance. To quantify the degree to which such ligand-based protein associations might complement functional genomic associations, including sequence similarity, physical protein-protein interactions, co-expression, and disease gene annotations, we calculated a network based on the Similarity Ensemble Approach (SEA: sea.docking.org), where protein neighbors reflect the similarity of their ligands. We also measured the similarity with functional genomic networks over a common set of 1,131 genes, and found that the networks had only small overlaps, which were significant only due to the large scale of the data. Consistent with the view that the networks contain different information, combining them substantially improved Molecular Function prediction within GO (from AUROC~0.63–0.75 for the individual data modalities to AUROC~0.8 in the aggregate). We investigated the boost in guilt-by-association gene function prediction when the networks are combined and describe underlying properties that can be further exploited.
Affiliation(s)
- Matthew J. O’Meara
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, 94158–2550, United States of America
- Sara Ballouz
  - Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, 500 Sunnyside Boulevard, Woodbury, NY, 11797, United States of America
- Brian K. Shoichet
  - Department of Pharmaceutical Chemistry, University of California San Francisco, San Francisco, California, 94158–2550, United States of America
- Jesse Gillis
  - Cold Spring Harbor Laboratory, Stanley Institute for Cognitive Genomics, 500 Sunnyside Boulevard, Woodbury, NY, 11797, United States of America

94
Yue J, Xu W, Ban R, Huang S, Miao M, Tang X, Liu G, Liu Y. PTIR: Predicted Tomato Interactome Resource. Sci Rep 2016; 6:25047. [PMID: 27121261 PMCID: PMC4848565 DOI: 10.1038/srep25047] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.5] [Received: 10/15/2015] [Accepted: 04/08/2016] [Indexed: 01/18/2023] Open
Abstract
Protein-protein interactions (PPIs) are involved in almost all biological processes and form the basis of the entire interactomics systems of living organisms. Identification and characterization of these interactions are fundamental to elucidating the molecular mechanisms of signal transduction and metabolic pathways at both the cellular and systemic levels. Although a number of experimental and computational studies have been performed on model organisms, the studies exploring and investigating PPIs in tomatoes remain lacking. Here, we developed a Predicted Tomato Interactome Resource (PTIR), based on experimentally determined orthologous interactions in six model organisms. The reliability of individual PPIs was also evaluated by shared gene ontology (GO) terms, co-evolution, co-expression, co-localization and available domain-domain interactions (DDIs). Currently, the PTIR covers 357,946 non-redundant PPIs among 10,626 proteins, including 12,291 high-confidence, 226,553 medium-confidence, and 119,102 low-confidence interactions. These interactions are expected to cover 30.6% of the entire tomato proteome and possess a reasonable distribution. In addition, ten randomly selected PPIs were verified using yeast two-hybrid (Y2H) screening or a bimolecular fluorescence complementation (BiFC) assay. The PTIR was constructed and implemented as a dedicated database and is available at http://bdg.hfut.edu.cn/ptir/index.html without registration.
Affiliation(s)
- Junyang Yue
  - School of Biotechnology and Food Engineering, Hefei University of Technology, Hefei 230009, China
- Wei Xu
  - School of Biotechnology and Food Engineering, Hefei University of Technology, Hefei 230009, China
- Rongjun Ban
  - School of Information Science and Technology, University of Science and Technology of China, Hefei 230026, China
- Shengxiong Huang
  - School of Biotechnology and Food Engineering, Hefei University of Technology, Hefei 230009, China
- Min Miao
  - School of Biotechnology and Food Engineering, Hefei University of Technology, Hefei 230009, China
- Xiaofeng Tang
  - School of Biotechnology and Food Engineering, Hefei University of Technology, Hefei 230009, China
- Guoqing Liu
  - School of Biotechnology and Food Engineering, Hefei University of Technology, Hefei 230009, China
- Yongsheng Liu
  - School of Biotechnology and Food Engineering, Hefei University of Technology, Hefei 230009, China
  - Ministry of Education Key Laboratory for Bio-resource and Eco-environment, College of Life Science, State Key Laboratory of Hydraulics and Mountain River Engineering, Sichuan University, Chengdu 610064, China

95
Wang L, Zhang C, Watkins J, Jin Y, McNutt M, Yin Y. SoftPanel: a website for grouping diseases and related disorders for generation of customized panels. BMC Bioinformatics 2016; 17:153. [PMID: 27044653 PMCID: PMC4820874 DOI: 10.1186/s12859-016-0998-5] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 10/29/2015] [Accepted: 03/23/2016] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND Targeted next-generation sequencing is playing an increasingly important role in biological research and clinical diagnosis by allowing researchers to sequence high priority genes at much higher depths and at a fraction of the cost of whole genome or exome sequencing. However, in designing the panel of genes to be sequenced, investigators need to consider the tradeoff between the better sensitivity of a broad panel and the higher specificity of a potentially more relevant panel. Although tools to prioritize candidate disease genes have been developed, the great majority of these require prior knowledge and a set of seed genes as input, which is only possible for diseases with a known genetic etiology. RESULTS To meet the demands of both researchers and clinicians, we have developed a user-friendly website called SoftPanel. This website is intended to serve users by allowing them to input a single disorder or a disorder group and generate a panel of genes predicted to underlie the disorder of interest. Various methods of retrieval including a keyword search, browsing of an arborized list of International Classification of Diseases, 10th revision (ICD-10) codes or using disorder phenotypic similarities can be combined to define a group of disorders and the genes known to be associated with them. Moreover, SoftPanel enables users to expand or refine a gene list by utilizing several biological data resources. In addition to providing users with the facility to create a "hard" panel that contains an exact gene list for targeted sequencing, SoftPanel also enables generation of a "soft" panel of genes, which may be used to further filter a significantly altered set of genes identified through whole genome or whole exome sequencing. The service and data provided by SoftPanel can be accessed at http://www.isb.pku.edu.cn/SoftPanel/ . A tutorial page is included for trying out sample data and interpreting results. 
CONCLUSION SoftPanel provides a convenient and powerful tool for creating a targeted panel of potential disease genes while supporting different forms of input. SoftPanel may be utilized in both genomics research and personalized medicine.
Collapse
Affiliation(s)
- Likun Wang
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
| | - Cong Zhang
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
| | - Johnathan Watkins
- Institute for Mathematical and Molecular Biomedicine, King's College London, Guy's Campus, London, SE1 1UL, UK.,Department of Research Oncology, King's College London, Guy's Campus, Great Maze Pond, London, SE1 9RT, UK
| | - Yan Jin
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
| | - Michael McNutt
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China
| | - Yuxin Yin
- Institute of Systems Biomedicine, Department of Pathology, School of Basic Medical Sciences, Beijing Key Laboratory of Tumor Systems Biology, Peking-Tsinghua Center for Life Sciences, Peking University Health Science Center, Beijing, 100191, China.
| |
|
96
|
Yu MK, Kramer M, Dutkowski J, Srivas R, Licon K, Kreisberg J, Ng CT, Krogan N, Sharan R, Ideker T. Translation of Genotype to Phenotype by a Hierarchy of Cell Subsystems. Cell Syst 2016; 2:77-88. [PMID: 26949740 PMCID: PMC4772745 DOI: 10.1016/j.cels.2016.02.003] [Citation(s) in RCA: 51] [Impact Index Per Article: 6.4] [Indexed: 11/26/2022]
Abstract
Accurately translating genotype to phenotype requires accounting for the functional impact of genetic variation at many biological scales. Here we present a strategy for genotype-phenotype reasoning based on existing knowledge of cellular subsystems. These subsystems and their hierarchical organization are defined by the Gene Ontology or a complementary ontology inferred directly from previously published datasets. Guided by the ontology's hierarchical structure, we organize genotype data into an "ontotype," that is, a hierarchy of perturbations representing the effects of genetic variation at multiple cellular scales. The ontotype is then interpreted using logical rules generated by machine learning to predict phenotype. This approach substantially outperforms previous, non-hierarchical methods for translating yeast genotype to cell growth phenotype, and it accurately predicts the growth outcomes of two new screens of 2,503 double gene knockouts impacting DNA repair or nuclear lumen. Ontotypes also generalize to larger knockout combinations, setting the stage for interpreting the complex genetics of disease.
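As a rough illustration of the "ontotype" idea in the abstract above, the following sketch propagates a set of gene knockouts up a small term hierarchy. It assumes that a term's ontotype value is the number of knocked-out genes annotated to that term or to any of its descendants; the abstract does not give the exact definition, so this is an assumption. The function name `build_ontotype` and the three-term toy hierarchy are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_ontotype(term_genes, term_parents, knocked_out):
    """Count knocked-out genes under each term (term plus all descendants).

    term_genes:   dict mapping term -> set of genes annotated directly to it
    term_parents: dict mapping term -> set of parent terms (assumed acyclic)
    knocked_out:  set of genes perturbed in the genotype
    """
    # Invert the child -> parents map so we can walk downward from any term.
    children = defaultdict(set)
    for child, parents in term_parents.items():
        for parent in parents:
            children[parent].add(child)

    cache = {}

    def genes_under(term):
        # Genes annotated to this term or to any descendant term.
        if term not in cache:
            genes = set(term_genes.get(term, ()))
            for child in children.get(term, ()):
                genes |= genes_under(child)
            cache[term] = genes
        return cache[term]

    terms = set(term_genes) | set(term_parents) | set(children)
    return {term: len(genes_under(term) & knocked_out) for term in terms}


# Hypothetical toy hierarchy: two child subsystems under one parent.
term_genes = {
    "dna_repair": {"RAD51", "RAD52"},
    "replication": {"POL1"},
    "dna_metabolism": set(),
}
term_parents = {
    "dna_repair": {"dna_metabolism"},
    "replication": {"dna_metabolism"},
}

# A double knockout, one gene in each child subsystem.
ontotype = build_ontotype(term_genes, term_parents, {"RAD52", "POL1"})
print(ontotype)
```

In this toy case each child term sees one perturbed gene while the parent term sees two, which is the kind of multi-scale signal a downstream learner could use as features.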
Affiliation(s)
- Michael Ku Yu
- Bioinformatics and Systems Biology Program, University of California San Diego, La Jolla CA 92093, USA
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
| | - Michael Kramer
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Biomedical Sciences Program, University of California San Diego, La Jolla CA 92093, USA
| | - Janusz Dutkowski
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Data4Cure, La Jolla, CA 92037, USA
| | - Rohith Srivas
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
- Department of Bioengineering, University of California San Diego, La Jolla CA 92093, USA
| | - Katherine Licon
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
| | - Jason Kreisberg
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
| | | | - Nevan Krogan
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, San Francisco 94143, USA
| | - Roded Sharan
- Blavatnik School of Computer Science, Tel-Aviv University, Tel Aviv 69978, Israel
| | - Trey Ideker
- Department of Medicine, University of California San Diego, La Jolla CA 92093, USA
| |
|
97
|
Al-Harazi O, Al Insaif S, Al-Ajlan MA, Kaya N, Dzimiri N, Colak D. Integrated Genomic and Network-Based Analyses of Complex Diseases and Human Disease Network. J Genet Genomics 2015; 43:349-67. [PMID: 27318646 DOI: 10.1016/j.jgg.2015.11.002] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 06/11/2015] [Revised: 10/22/2015] [Accepted: 11/20/2015] [Indexed: 12/16/2022]
Abstract
A disease phenotype generally reflects various pathobiological processes that interact in a complex network. The highly interconnected nature of the human protein interaction network (interactome) indicates that, at the molecular level, it is difficult to consider diseases as being independent of one another. Recently, genome-wide molecular measurements, data mining and bioinformatics approaches have provided the means to explore human diseases from a molecular basis. The exploration of diseases and a system of disease relationships based on the integration of genome-wide molecular data with the human interactome could offer a powerful perspective for understanding the molecular architecture of diseases. Recently, subnetwork markers have proven to be more robust and reliable than individual biomarker genes selected based on gene expression profiles alone, and achieve higher accuracy in disease classification. We have applied one of these methodologies to idiopathic dilated cardiomyopathy (IDCM) data that we have generated using a microarray and identified significant subnetworks associated with the disease. In this paper, we review the recent endeavours in this direction, and summarize the existing methodologies and computational tools for network-based analysis of complex diseases, of molecular relationships among apparently different disorders, and of the human disease network. We also discuss the future research trends and topics of this promising field.
Affiliation(s)
- Olfat Al-Harazi
- Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
| | - Sadiq Al Insaif
- Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
| | - Monirah A Al-Ajlan
- Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia; College of Computer and Information Sciences, King Saud University, Riyadh 11451, Saudi Arabia
| | - Namik Kaya
- Department of Genetics, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
| | - Nduna Dzimiri
- Department of Genetics, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia
| | - Dilek Colak
- Department of Biostatistics, Epidemiology and Scientific Computing, King Faisal Specialist Hospital and Research Centre, Riyadh 11211, Saudi Arabia.
| |
|
98
|
An integrated network of Arabidopsis growth regulators and its use for gene prioritization. Sci Rep 2015; 5:17617. [PMID: 26620795 PMCID: PMC4664945 DOI: 10.1038/srep17617] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.8] [Received: 06/03/2015] [Accepted: 11/03/2015] [Indexed: 11/09/2022] Open
Abstract
Elucidating the molecular mechanisms that govern plant growth has been an important topic in plant research, and current advances in large-scale data generation call for computational tools that efficiently combine these different data sources to generate novel hypotheses. In this work, we present a novel, integrated network that combines multiple large-scale data sources to characterize growth regulatory genes in Arabidopsis, one of the main plant model organisms. The contributions of this work are twofold: first, we characterized a set of carefully selected growth regulators with respect to their connectivity patterns in the integrated network, and, subsequently, we explored the extent to which these connectivity patterns can be used to suggest new growth regulators. Using a large-scale comparative study, we designed new supervised machine learning methods to prioritize growth regulators. Our results show that these methods significantly improve on current state-of-the-art prioritization techniques, and are able to suggest meaningful new growth regulators. In addition, the integrated network is made available to the scientific community, providing a rich data source that will be useful for many biological processes, not necessarily restricted to plant growth.
|
99
|
Pritykin Y, Ghersi D, Singh M. Genome-Wide Detection and Analysis of Multifunctional Genes. PLoS Comput Biol 2015; 11:e1004467. [PMID: 26436655 PMCID: PMC4593560 DOI: 10.1371/journal.pcbi.1004467] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 01/28/2015] [Accepted: 07/19/2015] [Indexed: 12/25/2022] Open
Abstract
Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms (H. sapiens, D. melanogaster, and S. cerevisiae) and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality.
Almost every aspect of cellular function depends on protein activity. In spite of being fine-tuned to carry out highly specific functions, proteins can also multitask. Experimental studies have identified genes and proteins endowed with more than one molecular function, or participating in very different biological processes. These studies suggest that the degree of functional plasticity exhibited by proteins might go well beyond a simple “one protein, one function” relationship. However, systematic studies of the properties of multifunctional genes (and their encoded proteins) have been limited. Here we present a computational framework to identify putative multifunctional genes, and compare their properties with those of other genes. We find that multifunctional genes are significantly different from other genes with respect to their physicochemical properties, expression profiles, and interaction properties. We also observe that multifunctional genes tend to be more conserved, and that a greater fraction of them are associated with human disorders. Taken together, these results represent a step towards a more complete understanding of the role multifunctional genes play in the functional organization of the cell.
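The selection step described in this abstract (genes annotated to at least two distinct biological processes) could be sketched as follows. The abstract does not say how "distinct" is decided, so this sketch assumes that two terms are distinct when neither is an ancestor of the other in the annotation hierarchy. This is a deliberate simplification and not necessarily the authors' criterion. The function names, gene names and term names are all hypothetical.

```python
def ancestors(term, parents):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def multifunctional_genes(annotations, parents):
    """Genes with at least one pair of hierarchically unrelated terms.

    annotations: dict mapping gene -> set of annotated terms
    parents:     dict mapping term -> set of parent terms
    """
    selected = set()
    for gene, terms in annotations.items():
        ordered = sorted(terms)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                # Assumed notion of "distinct": no ancestor/descendant link.
                related = (first in ancestors(second, parents)
                           or second in ancestors(first, parents))
                if not related:
                    selected.add(gene)
    return selected


# Hypothetical toy data: one parent/child term pair and one unrelated term.
parents = {"mitotic_cell_cycle": {"cell_cycle"}}
annotations = {
    "geneA": {"cell_cycle", "mitotic_cell_cycle"},  # nested terms only
    "geneB": {"cell_cycle", "translation"},         # two unrelated terms
    "geneC": {"translation"},                       # single term
}

result = multifunctional_genes(annotations, parents)
print(result)
```

Only `geneB` is selected here: `geneA` carries two terms, but one is a specialization of the other, so under the assumed criterion they count as a single process.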
Affiliation(s)
- Yuri Pritykin
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- Lewis–Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
| | - Dario Ghersi
- Lewis–Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- School of Interdisciplinary Informatics, University of Nebraska at Omaha, Omaha, Nebraska, United States of America
- * E-mail: (DG); (MS)
| | - Mona Singh
- Department of Computer Science, Princeton University, Princeton, New Jersey, United States of America
- Lewis–Sigler Institute for Integrative Genomics, Princeton University, Princeton, New Jersey, United States of America
- * E-mail: (DG); (MS)
| |
|
100
|
Li W, Espinal-Enríquez J, Simpfendorfer KR, Hernández-Lemus E. A survey of disease connections for CD4+ T cell master genes and their directly linked genes. Comput Biol Chem 2015; 59 Pt B:78-90. [PMID: 26411796 DOI: 10.1016/j.compbiolchem.2015.08.009] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 06/09/2015] [Revised: 08/18/2015] [Accepted: 08/21/2015] [Indexed: 02/07/2023]
Abstract
Genome-wide association studies and other genetic analyses have identified a large number of genes and variants implicating a variety of disease etiological mechanisms. It is imperative for the study of human diseases to put these genetic findings into a coherent functional context. Here we use systems biology tools to examine disease connections of five master genes for CD4+ T cell subtypes (TBX21, GATA3, RORC, BCL6, and FOXP3). We compiled a list of genes functionally interacting with the master genes (through protein-protein interaction or by acting in the same pathway), then we surveyed the disease connections, either by experimental evidence or by genetic association. Embryonic lethal genes (also known as essential genes) are over-represented in master genes and their interacting genes (55% versus 40% in other genes). Transcription factors are significantly enriched among genes interacting with the master genes (63% versus 10% in other genes). Predicted haploinsufficiency is a feature of most of these genes. Disease-connected genes are enriched in this list of genes: 42% of these genes have a disease connection according to Online Mendelian Inheritance in Man (OMIM) (versus 23% in other genes), and 74% are associated with some disease or phenotype in a Genome Wide Association Study (GWAS) (versus 43% in other genes). Seemingly, not all of the diseases connected to genes surveyed were immune related, which may indicate pleiotropic functions of the master regulator genes and associated genes.
Affiliation(s)
- Wentian Li
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY, USA.
| | - Jesús Espinal-Enríquez
- Computational Genomics Department, National Institute of Genomic Medicine, México, D.F., Mexico; Complexity in Systems Biology, Center for Complexity Sciences, Universidad Nacional Autónoma de México, México, D.F., Mexico
| | - Kim R Simpfendorfer
- The Robert S. Boas Center for Genomics and Human Genetics, The Feinstein Institute for Medical Research, North Shore LIJ Health System, Manhasset, NY, USA
| | - Enrique Hernández-Lemus
- Computational Genomics Department, National Institute of Genomic Medicine, México, D.F., Mexico; Complexity in Systems Biology, Center for Complexity Sciences, Universidad Nacional Autónoma de México, México, D.F., Mexico
| |
|