1
Hanna EM, El Hasbani G, Azar D. Ant colony optimization for the identification of dysregulated gene subnetworks from expression data. BMC Bioinformatics 2024; 25:254. [PMID: 39090538 PMCID: PMC11295523 DOI: 10.1186/s12859-024-05871-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/18/2023] [Accepted: 07/12/2024] [Indexed: 08/04/2024] Open
Abstract
BACKGROUND High-throughput experimental technologies can provide deeper insights into pathway perturbations in biomedical studies. Accordingly, their usage is central to the identification of molecular targets and the subsequent development of suitable treatments for various diseases. Classical interpretations of generated data, such as differential gene expression and pathway analyses, disregard interconnections between studied genes when looking for gene-disease associations. Given that these interconnections are central to cellular processes, there has been a recent interest in incorporating them in such studies. The latter allows the detection of gene modules that underlie complex phenotypes in gene interaction networks. Existing methods either impose radius-based restrictions or freely grow modules at the expense of a statistical bias towards large modules. We propose a heuristic method, inspired by Ant Colony Optimization, to apply gene-level scoring and module identification with distance-based search constraints and penalties, rather than radius-based constraints. RESULTS We test and compare our results to other approaches using three datasets of different neurodegenerative diseases, namely Alzheimer's, Parkinson's, and Huntington's, over three independent experiments. We report the outcomes of enrichment analyses and concordance of gene-level scores for each disease. Results indicate that the proposed approach generally shows superior stability in comparison to existing methods. It produces stable and meaningful enrichment results in all three datasets which have different case to control proportions and sample sizes. CONCLUSION The presented network-based gene expression analysis approach successfully identifies dysregulated gene modules associated with a certain disease. Using a heuristic based on Ant Colony Optimization, we perform a distance-based search with no radius constraints. Experimental results support the effectiveness and stability of our method in prioritizing modules of high relevance. Our tool is publicly available at github.com/GhadiElHasbani/ACOxGS.git.
Affiliation(s)
- Eileen Marie Hanna
  - Department of Computer Science and Mathematics, Lebanese American University, Byblos, Lebanon
- Ghadi El Hasbani
  - Department of Computer Science and Mathematics, Lebanese American University, Byblos, Lebanon
- Danielle Azar
  - Department of Computer Science and Mathematics, Lebanese American University, Byblos, Lebanon
2
Roy S, Sheikh SZ, Furey TS. CoVar: A generalizable machine learning approach to identify the coordinated regulators driving variational gene expression. PLoS Comput Biol 2024; 20:e1012016. [PMID: 38630807 PMCID: PMC11057768 DOI: 10.1371/journal.pcbi.1012016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/10/2023] [Revised: 04/29/2024] [Accepted: 03/22/2024] [Indexed: 04/19/2024] Open
Abstract
Network inference is used to model transcriptional, signaling, and metabolic interactions among genes, proteins, and metabolites that identify biological pathways influencing disease pathogenesis. Advances in machine learning (ML)-based inference models exhibit the predictive capabilities of capturing latent patterns in genomic data. Such models are emerging as an alternative to the statistical models identifying causative factors driving complex diseases. We present CoVar, an ML-based framework that builds upon the properties of existing inference models, to find the central genes driving perturbed gene expression across biological states. Unlike differentially expressed genes (DEGs) that capture changes in individual gene expression across conditions, CoVar focuses on identifying variational genes that undergo changes in their expression network interaction profiles, providing insights into changes in the regulatory dynamics, such as in disease pathogenesis. Subsequently, it finds core genes from among the nearest neighbors of these variational genes, which are central to the variational activity and influence the coordinated regulatory processes underlying the observed changes in gene expression. Through the analysis of simulated as well as yeast expression data perturbed by the deletion of the mitochondrial genome, we show that CoVar captures the intrinsic variationality and modularity in the expression data, identifying key driver genes not found through existing differential analysis methodologies.
Affiliation(s)
- Satyaki Roy
  - Department of Genetics, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Shehzad Z. Sheikh
  - Departments of Medicine and Genetics, Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, North Carolina, United States of America
- Terrence S. Furey
  - Departments of Genetics and Biology, Center for Gastrointestinal Biology and Disease, University of North Carolina, Chapel Hill, North Carolina, United States of America
3
Roy S, Sheikh SZ, Furey TS. CoVar: A generalizable machine learning approach to identify the coordinated regulators driving variational gene expression. bioRxiv 2023:2023.01.12.523808. [PMID: 36712050 PMCID: PMC9882103 DOI: 10.1101/2023.01.12.523808] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/15/2023]
Abstract
Network inference is used to model transcriptional, signaling, and metabolic interactions among genes, proteins, and metabolites that identify biological pathways influencing disease pathogenesis. Advances in machine learning (ML)-based inference models exhibit the predictive capabilities of capturing latent patterns in genomic data. Such models are emerging as an alternative to the statistical models identifying causative factors driving complex diseases. We present CoVar, an inference framework that builds upon the properties of existing inference models, to find the central genes driving perturbed gene expression across biological states. We leverage ML-based network inference to find networks that capture the strength of regulatory interactions. Our model first pinpoints a subset of genes, termed variational, whose expression variabilities typify the differences in network connectivity between the control and perturbed data. Variational genes, by being differentially expressed themselves or possessing differentially expressed neighbor genes, capture gene expression variability. CoVar then creates subnetworks comprising variational genes and their strongly connected neighbor genes and identifies core genes central to these subnetworks that influence the bulk of the variational activity. Through the analysis of yeast expression data perturbed by the deletion of the mitochondrial genome, we show that CoVar identifies key genes not found through independent differential expression analysis.
4
Le DH. A network-based method for predicting disease-associated enhancers. PLoS One 2021; 16:e0260432. [PMID: 34879086 PMCID: PMC8654176 DOI: 10.1371/journal.pone.0260432] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/18/2021] [Accepted: 11/09/2021] [Indexed: 11/18/2022] Open
Abstract
Background Enhancers regulate transcription of target genes, causing a change in expression level. Thus, the aberrant activity of enhancers can lead to diseases. To date, a large number of enhancers have been identified, yet a small portion of them have been found to be associated with diseases. This raises a pressing need to develop computational methods to predict associations between diseases and enhancers. Results In this study, we assumed that enhancers sharing target genes could be associated with similar diseases to predict the association. Thus, we built an enhancer functional interaction network by connecting enhancers significantly sharing target genes, then developed a network diffusion method RWDisEnh, based on a random walk with restart algorithm, on networks of diseases and enhancers to globally measure the degree of the association between diseases and enhancers. RWDisEnh performed best when the disease similarities are integrated with the enhancer functional interaction network by known disease-enhancer associations in the form of a heterogeneous network of diseases and enhancers. It was also superior to another network diffusion method, i.e., PageRank with Priors, and a neighborhood-based one, i.e., MaxLink, which simply chooses the closest neighbors of known disease-associated enhancers. Finally, we showed that RWDisEnh could predict novel enhancers, which are either directly or indirectly associated with diseases. Conclusions Taken together, RWDisEnh could be a potential method for predicting disease-enhancer associations.
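The random walk with restart that the abstract names as the basis of RWDisEnh can be illustrated with a minimal sketch. It runs on a small homogeneous toy network only; the node names, the restart probability of 0.5, and the function `random_walk_with_restart` are assumptions made for illustration and do not reproduce the paper's heterogeneous disease-enhancer network or its parameters.

```python
def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Return steady-state visiting probabilities for every node.

    adj   : dict mapping node -> list of neighbours (undirected, unweighted,
            every node is assumed to have at least one neighbour).
    seeds : set of nodes the walker jumps back to when it restarts.
    """
    nodes = sorted(adj)
    # Restart distribution: uniform over the seed nodes.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # With probability `restart` the walker returns to a seed ...
        nxt = {n: restart * p0[n] for n in nodes}
        # ... otherwise it moves to a uniformly chosen neighbour.
        for u in nodes:
            share = (1.0 - restart) * p[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy network: e1 is the only known associated node.
toy_network = {
    "e1": ["e2", "e3"],
    "e2": ["e1"],
    "e3": ["e1", "e4"],
    "e4": ["e3"],
}
scores = random_walk_with_restart(toy_network, seeds={"e1"})
ranking = sorted(scores, key=scores.get, reverse=True)
```

Nodes that are closer to the seed, or better connected to it, end up with higher scores, which is the property a diffusion-based ranking relies on.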
Affiliation(s)
- Duc-Hau Le
  - School of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam
5
Yang H, Zhuang Z, Pan W. A graph convolutional neural network for gene expression data analysis with multiple gene networks. Stat Med 2021; 40:5547-5564. [PMID: 34258781 DOI: 10.1002/sim.9140] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/30/2020] [Revised: 04/07/2021] [Accepted: 06/21/2021] [Indexed: 02/01/2023]
Abstract
Spectral graph convolutional neural networks (GCN) are proposed to incorporate important information contained in graphs such as gene networks. In a standard spectral GCN, there is only one gene network to describe the relationships among genes. However, for genomic applications, due to condition- or tissue-specific gene function and regulation, multiple gene networks may be available; it is unclear how to apply GCNs to disease classification with multiple networks. Besides, which gene networks may provide more effective prior information for a given learning task is unknown a priori and is not straightforward to discover in many cases. A deep multiple graph convolutional neural network is therefore developed here to meet the challenge. The new approach not only computes a feature of a gene as the weighted average of those of itself and its neighbors through spectral GCNs, but also extracts features from gene-specific expression (or other feature) profiles via a feed-forward neural network (FNN). We also provide two measures, the importance of a given gene and the relative importance score of each gene network, for the genes' and gene networks' contributions, respectively, to the learning task. To evaluate the new method, we conduct real data analyses using several breast cancer and diffuse large B-cell lymphoma datasets and incorporating multiple gene networks obtained from "GIANT 2.0". Compared with the standard FNN, GCN, and random forest, the new method not only yields high classification accuracy but also prioritizes the most important genes confirmed to be highly associated with cancer, strongly suggesting the usefulness of the new method in incorporating multiple gene networks.
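The neighbourhood averaging mentioned in the abstract can be illustrated with a deliberately simplified sketch. It takes an unweighted mean over a gene and its neighbours, with no learned filters and a single network, so it is not the authors' spectral GCN; the function name `neighbour_average`, the toy genes, and the feature values are all hypothetical.

```python
def neighbour_average(features, adj):
    """Replace each gene's feature vector by the mean over itself and its neighbours.

    features : dict mapping gene -> list of floats (all of equal length).
    adj      : dict mapping gene -> list of neighbouring genes.
    """
    smoothed = {}
    for gene, own in features.items():
        # Collect the gene's own vector plus those of its neighbours.
        group = [own] + [features[n] for n in adj.get(gene, [])]
        # Average column by column.
        smoothed[gene] = [sum(col) / len(group) for col in zip(*group)]
    return smoothed


# Hypothetical three-gene chain g1 - g2 - g3 with two features per gene.
features = {"g1": [1.0, 0.0], "g2": [3.0, 2.0], "g3": [5.0, 4.0]}
adj = {"g1": ["g2"], "g2": ["g1", "g3"], "g3": ["g2"]}
smoothed = neighbour_average(features, adj)
```

In a trained GCN the averaging weights come from the normalised graph structure and learned parameters; the uniform mean here only shows how network neighbours contribute to a gene's representation.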
Affiliation(s)
- Hu Yang
  - School of Information, Central University of Finance and Economics, Beijing, China
- Zhong Zhuang
  - Department of EECE, University of Minnesota, Minneapolis, Minnesota, USA
- Wei Pan
  - Division of Biostatistics, University of Minnesota, Minneapolis, Minnesota, USA
6
Ou-Yang L, Cai D, Zhang XF, Yan H. WDNE: an integrative graphical model for inferring differential networks from multi-platform gene expression data with missing values. Brief Bioinform 2021; 22:6272792. [PMID: 33975339 DOI: 10.1093/bib/bbab086] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 01/03/2021] [Revised: 02/14/2021] [Accepted: 02/23/2021] [Indexed: 11/14/2022] Open
Abstract
The mechanisms controlling biological processes, such as the development of disease or cell differentiation, can be investigated by examining changes in the networks of gene dependencies between states in the process. High-throughput experimental methods, like microarray and RNA sequencing, have been widely used to gather gene expression data, which paves the way to infer gene dependencies based on computational methods. However, most differential network analysis methods are designed to deal with fully observed data, but missing values, such as the dropout events in single-cell RNA-sequencing data, are frequent. New methods are needed to take account of these missing values. Moreover, since the changes of gene dependencies may be driven by certain perturbed genes, considering the changes in gene expression levels may promote the identification of gene network rewiring. In this study, a novel weighted differential network estimation (WDNE) model is proposed to handle multi-platform gene expression data with missing values and take account of changes in gene expression levels. Simulation studies demonstrate that WDNE outperforms state-of-the-art differential network estimation methods. When WDNE is applied to infer differential gene networks associated with drug resistance in ovarian tumors, cell differentiation and breast tumor heterogeneity, the hub genes in the estimated differential gene networks can provide important insights into the underlying mechanisms. Furthermore, a Matlab toolbox, differential network analysis toolbox, was developed to implement the WDNE model and visualize the estimated differential networks.
Affiliation(s)
- Le Ou-Yang
  - Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen Key Laboratory of Media Security, and Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), College of Electronics and Information Engineering, Shenzhen University, Shenzhen, 518060, China
- Dehan Cai
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong, 999077, China
- Xiao-Fei Zhang
  - School of Mathematics and Statistics & Hubei Key Laboratory of Mathematical Sciences, Central China Normal University, Wuhan, 430079, China
- Hong Yan
  - Department of Electrical Engineering, City University of Hong Kong, Hong Kong, 999077, China
7
Cho JW, Son J, Ha SJ, Lee I. Systems biology analysis identifies TNFRSF9 as a functional marker of tumor-infiltrating regulatory T-cell enabling clinical outcome prediction in lung cancer. Comput Struct Biotechnol J 2021; 19:860-868. [PMID: 33598101 PMCID: PMC7851794 DOI: 10.1016/j.csbj.2021.01.025] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 12/11/2020] [Revised: 01/17/2021] [Accepted: 01/18/2021] [Indexed: 12/21/2022] Open
Abstract
Regulatory T cells (Tregs) are enriched in the tumor microenvironment and play key roles in immune evasion of cancer cells. Cell surface markers specific for tumor-infiltrating Tregs (TI-Tregs) can be effectively targeted to enhance antitumor immunity and used for stratification of immunotherapy outcomes. Here, we present a systems biology approach to identify functional cell surface markers for TI-Tregs. We selected differentially expressed genes for surface proteins of TI-Tregs and compared these with other CD4+ T cells using bulk RNA-sequencing data from murine lung cancer models. Thereafter, we filtered for human orthologues with conserved expression in TI-Tregs using single-cell transcriptome data from patients with non-small cell lung cancer (NSCLC). To evaluate the functional importance of expression-based markers of TI-Tregs, we utilized a network-based measure of context-associated centrality in a Treg-specific coregulatory network. We identified TNFRSF9 (also known as 4-1BB or CD137), a previously reported target for enhancing antitumor immunity, among the final candidates for TI-Treg markers with a high functional importance score. We found that a low TNFRSF9 expression level in Tregs was associated with enhanced overall survival rate and response to anti-PD-1 immunotherapy in patients with NSCLC, suggesting that TNFRSF9 promotes the immune suppressive activity of Tregs in tumors. Collectively, these results demonstrated that integrative transcriptome and network analysis can facilitate the discovery of functional markers of tumor-specific immune cells to develop novel therapeutic targets and biomarkers for boosting cancer immunotherapy.
Affiliation(s)
- Jae-Won Cho
  - Department of Biotechnology, College of Life Science & Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
- Jimin Son
  - Department of Biochemistry, College of Life Science & Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
- Sang-Jun Ha
  - Department of Biochemistry, College of Life Science & Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
- Insuk Lee
  - Department of Biotechnology, College of Life Science & Biotechnology, Yonsei University, Seoul 03722, Republic of Korea
8
Barinotti A, Radin M, Cecchi I, Foddai SG, Rubini E, Roccatello D, Sciascia S, Menegatti E. Genetic Factors in Antiphospholipid Syndrome: Preliminary Experience with Whole Exome Sequencing. Int J Mol Sci 2020; 21:E9551. [PMID: 33333988 PMCID: PMC7765384 DOI: 10.3390/ijms21249551] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/12/2020] [Revised: 12/11/2020] [Accepted: 12/13/2020] [Indexed: 12/18/2022] Open
Abstract
As in many autoimmune diseases, the pathogenesis of the antiphospholipid syndrome (APS) is the result of a complex interplay between predisposing genes and triggering environmental factors, leading to a loss of self-tolerance and immune-mediated tissue damage. While the first genetic studies in APS focused primarily on the human leukocyte antigen (HLA) region, more recent data highlighted the role of other genes in APS susceptibility, including those involved in the immune response and in the hemostatic process. In order to join this intriguing debate, we analyzed the single-nucleotide polymorphisms (SNPs) derived from the whole exome sequencing (WES) of two siblings affected by APS and compared our findings with the available literature. We identified genes encoding proteins involved in the hemostatic process, the immune response, and the phospholipid metabolism (PLA2G6, HSPG2, BCL3, ZFAT, ATP2B2, CRTC3, and ADCY3) of potential interest when debating the pathogenesis of the syndrome. The study of the selected SNPs in a larger cohort of APS patients and the integration of WES results with network-based approaches will help decipher the genetic risk factors involved in the diverse clinical features of APS.
Affiliation(s)
- Alice Barinotti
  - Center of Research of Immunopathology and Rare Diseases—Coordinating Center of Piemonte and Aosta Valley Network for Rare Diseases, S. Giovanni Bosco Hospital, Department of Clinical and Biological Sciences, University of Turin, 10154 Turin, Italy
  - Department of Clinical and Biological Sciences, School of Specialization of Clinical Pathology, University of Turin, 10125 Turin, Italy
- Massimo Radin
  - Center of Research of Immunopathology and Rare Diseases—Coordinating Center of Piemonte and Aosta Valley Network for Rare Diseases, S. Giovanni Bosco Hospital, Department of Clinical and Biological Sciences, University of Turin, 10154 Turin, Italy
- Irene Cecchi
  - Center of Research of Immunopathology and Rare Diseases—Coordinating Center of Piemonte and Aosta Valley Network for Rare Diseases, S. Giovanni Bosco Hospital, Department of Clinical and Biological Sciences, University of Turin, 10154 Turin, Italy
- Silvia Grazietta Foddai
  - Center of Research of Immunopathology and Rare Diseases—Coordinating Center of Piemonte and Aosta Valley Network for Rare Diseases, S. Giovanni Bosco Hospital, Department of Clinical and Biological Sciences, University of Turin, 10154 Turin, Italy
  - Department of Clinical and Biological Sciences, School of Specialization of Clinical Pathology, University of Turin, 10125 Turin, Italy
- Elena Rubini
  - Center of Research of Immunopathology and Rare Diseases—Coordinating Center of Piemonte and Aosta Valley Network for Rare Diseases, S. Giovanni Bosco Hospital, Department of Clinical and Biological Sciences, University of Turin, 10154 Turin, Italy
- Dario Roccatello
  - Center of Research of Immunopathology and Rare Diseases—Coordinating Center of Piemonte and Aosta Valley Network for Rare Diseases, S. Giovanni Bosco Hospital, Department of Clinical and Biological Sciences, University of Turin, 10154 Turin, Italy
  - Nephrology and Dialysis, Department of Clinical and Biological Sciences, S. Giovanni Bosco Hospital and University of Turin, 10154 Turin, Italy
- Savino Sciascia
  - Center of Research of Immunopathology and Rare Diseases—Coordinating Center of Piemonte and Aosta Valley Network for Rare Diseases, S. Giovanni Bosco Hospital, Department of Clinical and Biological Sciences, University of Turin, 10154 Turin, Italy
  - Nephrology and Dialysis, Department of Clinical and Biological Sciences, S. Giovanni Bosco Hospital and University of Turin, 10154 Turin, Italy
- Elisa Menegatti
  - Center of Research of Immunopathology and Rare Diseases—Coordinating Center of Piemonte and Aosta Valley Network for Rare Diseases, S. Giovanni Bosco Hospital, Department of Clinical and Biological Sciences, University of Turin, 10154 Turin, Italy
  - Department of Clinical and Biological Sciences, School of Specialization of Clinical Pathology, University of Turin, 10125 Turin, Italy
9
García-Rodríguez R, Hiller M, Jiménez-Gracia L, van der Pal Z, Balog J, Adamzek K, Aartsma-Rus A, Spitali P. Premature termination codons in the DMD gene cause reduced local mRNA synthesis. Proc Natl Acad Sci U S A 2020; 117:16456-16464. [PMID: 32616572 PMCID: PMC7368324 DOI: 10.1073/pnas.1910456117] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Indexed: 01/16/2023] Open
Abstract
Duchenne muscular dystrophy (DMD) is caused by mutations in the DMD gene leading to the presence of premature termination codons (PTC). Previous transcriptional studies have shown reduced DMD transcript levels in DMD patient and animal model muscles when PTC are present. Nonsense-mediated decay (NMD) has been suggested to be responsible for the observed reduction, but there is no experimental evidence supporting this claim. In this study, we aimed to investigate the mechanism responsible for the drop in DMD expression levels in the presence of PTC. We observed that the inhibition of NMD does not normalize DMD gene expression in DMD. Additionally, in situ hybridization showed that DMD messenger RNA primarily localizes in the nuclear compartment, confirming that a cytoplasmic mechanism like NMD indeed cannot be responsible for the observed reduction. Sequencing of nascent RNA to explore DMD transcription dynamics revealed a lower rate of DMD transcription in patient-derived myotubes compared to healthy controls, suggesting a transcriptional mechanism involved in reduced DMD transcript levels. Chromatin immunoprecipitation in muscle showed increased levels of the repressive histone mark H3K9me3 in mdx mice compared to wild-type mice, indicating a chromatin conformation less prone to transcription in mdx mice. In line with this finding, treatment with the histone deacetylase inhibitor givinostat caused a significant increase in DMD transcript expression in mdx mice. Overall, our findings show that transcription dynamics across the DMD locus are affected by the presence of PTC, hinting at a possible epigenetic mechanism responsible for this process.
Affiliation(s)
- Raquel García-Rodríguez
  - Department of Human Genetics, Leiden University Medical Center, 2333ZA Leiden, The Netherlands
- Monika Hiller
  - Department of Human Genetics, Leiden University Medical Center, 2333ZA Leiden, The Netherlands
- Laura Jiménez-Gracia
  - Department of Human Genetics, Leiden University Medical Center, 2333ZA Leiden, The Netherlands
- Zarah van der Pal
  - Department of Human Genetics, Leiden University Medical Center, 2333ZA Leiden, The Netherlands
- Judit Balog
  - Department of Human Genetics, Leiden University Medical Center, 2333ZA Leiden, The Netherlands
- Kevin Adamzek
  - Department of Human Genetics, Leiden University Medical Center, 2333ZA Leiden, The Netherlands
- Annemieke Aartsma-Rus
  - Department of Human Genetics, Leiden University Medical Center, 2333ZA Leiden, The Netherlands
- Pietro Spitali
  - Department of Human Genetics, Leiden University Medical Center, 2333ZA Leiden, The Netherlands
10
Hwang S, Kim CY, Yang S, Kim E, Hart T, Marcotte EM, Lee I. HumanNet v2: human gene networks for disease research. Nucleic Acids Res 2019; 47:D573-D580. [PMID: 30418591 PMCID: PMC6323914 DOI: 10.1093/nar/gky1126] [Citation(s) in RCA: 114] [Impact Index Per Article: 28.5] [Received: 08/14/2018] [Accepted: 10/25/2018] [Indexed: 12/15/2022] Open
Abstract
Human gene networks have proven useful in many aspects of disease research, with numerous network-based strategies developed for generating hypotheses about gene-disease-drug associations. The ability to predict and organize genes most relevant to a specific disease has proven especially important. We previously developed a human functional gene network, HumanNet, by integrating diverse types of omics data using a Bayesian statistics framework and demonstrated its ability to retrieve disease genes. Here, we present HumanNet v2 (http://www.inetbio.org/humannet), a database of human gene networks, which was updated by incorporating new data types, extending data sources and improving network inference algorithms. HumanNet now comprises a hierarchy of human gene networks, allowing for more flexible incorporation of network information into studies. HumanNet performs well in ranking disease-linked gene sets with minimal literature-dependent biases. We observe that incorporating model organisms’ protein–protein interactions does not markedly improve disease gene predictions, suggesting that many of the disease gene associations are now captured directly in human-derived datasets. With an improved interactive user interface for disease network analysis, we expect HumanNet will be a useful resource for network medicine.
Affiliation(s)
- Sohyun Hwang
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
  - Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA
  - Department of Biomedical Science, College of Life Science, CHA University, Seongnam-si 13496, Korea
- Chan Yeong Kim
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Sunmo Yang
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Eiru Kim
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX USA
- Traver Hart
  - Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX USA
- Edward M Marcotte
  - Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, University of Texas, Austin, TX 78712, USA
  - Department of Molecular Biosciences, University of Texas at Austin, TX 78712, USA
- Insuk Lee
  - Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
11
Chang HC, Chu CP, Lin SJ, Hsiao CK. Network hub-node prioritization of gene regulation with intra-network association. BMC Bioinformatics 2020; 21:101. [PMID: 32164570 PMCID: PMC7069025 DOI: 10.1186/s12859-020-3444-7] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 08/15/2019] [Accepted: 03/06/2020] [Indexed: 11/10/2022] Open
Abstract
Background To identify and prioritize the influential hub genes in a gene-set or biological pathway, most analyses rely on calculation of marginal effects or tests of statistical significance. These procedures may be inappropriate since hub nodes are common connection points and therefore may interact with other nodes more often than non-hub nodes do. Such dependence among gene nodes can be conjectured based on the topology of the pathway network or the correlation between them. Results Here we develop a pathway activity score incorporating the marginal (local) effects of gene nodes as well as intra-network affinity measures. This score summarizes the expression levels in a gene-set/pathway for each sample, with weights on local and network information, respectively. The score is next used to examine the impact of each node through a leave-one-out evaluation. To illustrate the procedure, two cancer studies, one involving RNA-Seq from breast cancer patients with high-grade ductal carcinoma in situ and one microarray expression data from ovarian cancer patients, are used to assess the performance of the procedure, and to compare with existing methods, both ones that do and do not take into consideration correlation and network information. The hub nodes identified by the proposed procedure in the two cancer studies are known influential genes; some have been included in standard treatments and some are currently considered in clinical trials for target therapy. The results from simulation studies show that when marginal effects are mild or weak, the proposed procedure can still identify causal nodes, whereas methods relying only on marginal effect size cannot. Conclusions The NetworkHub procedure proposed in this research can effectively utilize the network information in combination with local effects derived from marker values, and provide a useful and complementary list of recommendations for prioritizing causal hubs.
Affiliation(s)
- Hung-Ching Chang
- Division of Biostatistics, Institute of Epidemiology and Preventive Medicine, National Taiwan University, No. 17, Xu-Zhou Road, Taipei, 10055, Taiwan
- Chiao-Pei Chu
- Division of Biostatistics, Institute of Epidemiology and Preventive Medicine, National Taiwan University, No. 17, Xu-Zhou Road, Taipei, 10055, Taiwan
- Shu-Ju Lin
- Institute of Statistical Science, Academia Sinica, Taipei, 11529, Taiwan
- Chuhsing Kate Hsiao
- Division of Biostatistics, Institute of Epidemiology and Preventive Medicine, National Taiwan University, No. 17, Xu-Zhou Road, Taipei, 10055, Taiwan; Bioinformatics and Biostatistics Core, Center of Genomic Medicine, National Taiwan University, Taipei, 10055, Taiwan
|
12
|
Xu W, Li S, Zhang Z, Hu J, Zhao Y. Prioritization of differentially expressed genes through integrating public expression data. Anim Genet 2019; 50:726-732. [PMID: 31512747 DOI: 10.1111/age.12855] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Accepted: 08/06/2019] [Indexed: 11/29/2022]
Abstract
Differentially expressed gene (DEG) analysis is a major approach for interpreting phenotype differences and produces a large number of candidate genes. Given that it is burdensome to validate too many genes through benchwork, an urgent need exists for DEG prioritization. Here, a novel method is proposed for prioritizing bona fide DEGs by constructing the normal range of gene expression through integrating public expression data. Prioritization was performed by ranking the differences in cumulative probability for genes in case and control groups. DEGs from a study on pig muscle tissue were used to evaluate the prioritization accuracy. The results showed that the method reached an area under the receiver operating characteristic curve of 96.42% and can effectively shorten the list of candidate genes from a differential expression experiment to find novel causal genes. Our method can be easily extended to other tissues or species to promote functional research in broad applications.
Affiliation(s)
- W Xu
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Biological Sciences, China Agricultural University, Beijing, 100193, China; State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, 100193, China
- S Li
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Biological Sciences, China Agricultural University, Beijing, 100193, China; State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, 100193, China
- Z Zhang
- State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, 100193, China
- J Hu
- State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, 100193, China
- Y Zhao
- Beijing Advanced Innovation Center for Food Nutrition and Human Health, College of Biological Sciences, China Agricultural University, Beijing, 100193, China; State Key Laboratory of Agrobiotechnology, China Agricultural University, Beijing, 100193, China
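The ranking step described in the Xu et al. abstract above (ranking genes by the difference in cumulative probability between case and control groups, relative to a normal range built from public data) can be illustrated with a minimal sketch. This is not the authors' implementation: the use of an empirical CDF, the use of group means, the absolute difference, and all names and toy values below are assumptions made for illustration only.

```python
from bisect import bisect_right


def ecdf(reference):
    """Return a function giving the empirical cumulative probability of x
    within the reference (assumed 'normal range') expression values."""
    ordered = sorted(reference)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def mean(values):
    return sum(values) / len(values)


def rank_genes(reference, case, control):
    """Rank genes by |F(case mean) - F(control mean)|, largest first.
    F is the per-gene empirical CDF of the reference values (an assumption)."""
    scores = {}
    for gene, ref_values in reference.items():
        cdf = ecdf(ref_values)
        scores[gene] = abs(cdf(mean(case[gene])) - cdf(mean(control[gene])))
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy data: two genes sharing the same reference distribution.
reference = {
    "geneA": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "geneB": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
}
case = {"geneA": [9.0, 9.5], "geneB": [5.0, 5.5]}
control = {"geneA": [2.0, 2.5], "geneB": [4.5, 5.0]}

ranking = rank_genes(reference, case, control)
print(ranking)
```

In this toy setup geneA moves from the low end to the high end of its reference distribution and so ranks ahead of geneB, whose case and control means sit close together near the median.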
|
13
|
Han H, Lee S, Lee I. NGSEA: Network-Based Gene Set Enrichment Analysis for Interpreting Gene Expression Phenotypes with Functional Gene Sets. Mol Cells 2019; 42:579-588. [PMID: 31307154 PMCID: PMC6715341 DOI: 10.14348/molcells.2019.0065] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 04/08/2019] [Revised: 06/28/2019] [Accepted: 06/30/2019] [Indexed: 11/27/2022] Open
Abstract
Gene set enrichment analysis (GSEA) is a popular tool to identify underlying biological processes in clinical samples using their gene expression phenotypes. GSEA measures the enrichment of annotated gene sets that represent biological processes for differentially expressed genes (DEGs) in clinical samples. However, GSEA may be suboptimal for functional gene sets because DEGs from the expression dataset may not be functional genes per se but dysregulated genes perturbed by bona fide functional genes. To overcome this shortcoming, we developed network-based GSEA (NGSEA), which measures the enrichment score of functional gene sets using the expression difference of not only individual genes but also their neighbors in the functional network. We found that NGSEA outperformed GSEA in identifying pathway gene sets for matched gene expression phenotypes. We also observed that NGSEA substantially improved the ability to retrieve known anti-cancer drugs from patient-derived gene expression data using drug-target gene sets compared with another method, Connectivity Map. We also repurposed FDA-approved drugs using NGSEA and experimentally validated budesonide as a chemical with anti-cancer effects for colorectal cancer. We, therefore, expect that NGSEA will facilitate both pathway interpretation of gene expression phenotypes and anti-cancer drug repositioning. NGSEA is freely available at www.inetbio.org/ngsea.
Affiliation(s)
- Heonjong Han
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Sangyoung Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Insuk Lee
- Department of Biotechnology, College of Life Science and Biotechnology, Yonsei University, Seoul 03722, Korea
- Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Seoul 03722, Korea
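The NGSEA abstract above describes scoring a gene by its own expression difference together with that of its network neighbors. A rough sketch of that idea follows. The exact combination rule is not stated in the abstract, so the formula used here (absolute own score plus the mean absolute score of neighbors) is an assumption, as are the function name and the toy network.

```python
def neighbor_adjusted_scores(scores, neighbors):
    """Combine each gene's own expression difference with that of its
    network neighbors. Assumed rule: |own| + mean(|neighbor scores|)."""
    adjusted = {}
    for gene, own in scores.items():
        # Only neighbors that have a measured score contribute.
        linked = [abs(scores[n]) for n in neighbors.get(gene, []) if n in scores]
        neighborhood = sum(linked) / len(linked) if linked else 0.0
        adjusted[gene] = abs(own) + neighborhood
    return adjusted


# Hypothetical log-ratio scores and a hypothetical functional network.
scores = {"g1": 0.1, "g2": 2.0, "g3": -1.0}
neighbors = {"g1": ["g2", "g3"], "g2": ["g1"], "g3": ["g1"]}

adjusted = neighbor_adjusted_scores(scores, neighbors)
print(adjusted)
```

Here g1 barely changes on its own but sits between two strongly changed genes, so its adjusted score rises above that of g3. This is the kind of gene a purely marginal score would rank low.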
|
14
|
Dozmorov MG. Disease classification: from phenotypic similarity to integrative genomics and beyond. Brief Bioinform 2019; 20:1769-1780. [DOI: 10.1093/bib/bby049] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 03/21/2018] [Revised: 05/01/2018] [Indexed: 02/06/2023] Open
Abstract
A fundamental challenge of modern biomedical research is understanding how diseases that are similar on the phenotypic level are similar on the molecular level. Integration of various genomic data sets with the traditionally used phenotypic disease similarity revealed novel genetic and molecular mechanisms and blurred the distinction between monogenic (Mendelian) and complex diseases. Network-based medicine has emerged as a complementary approach for identifying disease-causing genes, genetic mediators, disruptions in the underlying cellular functions and for drug repositioning. The recent development of machine and deep learning methods allows for leveraging real-life information about diseases to refine genetic and phenotypic disease relationships. This review describes the historical development and recent methodological advancements for studying disease classification (nosology).
Affiliation(s)
- Mikhail G Dozmorov
- Department of Biostatistics, Virginia Commonwealth University, 830 East Main Street, Richmond, VA, USA
|
15
|
Shah SD, Braun R. GeneSurrounder: network-based identification of disease genes in expression data. BMC Bioinformatics 2019; 20:229. [PMID: 31060502 PMCID: PMC6503437 DOI: 10.1186/s12859-019-2829-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 06/21/2018] [Accepted: 04/17/2019] [Indexed: 11/24/2022] Open
Abstract
Background A key challenge of identifying disease–associated genes is analyzing transcriptomic data in the context of regulatory networks that control cellular processes in order to capture multi-gene interactions and yield mechanistically interpretable results. One existing category of analysis techniques identifies groups of related genes using interaction networks, but these gene sets often comprise tens or hundreds of genes, making experimental follow-up challenging. A more recent category of methods identifies precise gene targets while incorporating systems-level information, but these techniques do not determine whether a gene is a driving source of changes in its network, an important characteristic when looking for potential drug targets. Results We introduce GeneSurrounder, an analysis method that integrates expression data and network information in a novel procedure to detect genes that are sources of dysregulation on the network. The key idea of our method is to score genes based on the evidence that they influence the dysregulation of their neighbors on the network in a manner that impacts cell function. Applying GeneSurrounder to real expression data, we show that our method is able to identify biologically relevant genes, integrate pathway and expression data, and yield more reproducible results across multiple studies of the same phenotype than competing methods. Conclusions Together these findings suggest that GeneSurrounder provides a new avenue for identifying individual genes that can be targeted therapeutically. The key innovation of GeneSurrounder is the combination of pathway network information with gene expression data to determine the degree to which a gene is a source of dysregulation on the network. By prioritizing genes in this way, our method provides insights into disease mechanisms and suggests diagnostic and therapeutic targets. Our method can be used to help biologists select among tens or hundreds of genes for further validation. 
The implementation in R is available at github.com/sahildshah1/gene-surrounder. Electronic supplementary material The online version of this article (10.1186/s12859-019-2829-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Sahil D Shah
- Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, USA
- Rosemary Braun
- Engineering Sciences and Applied Mathematics, Northwestern University, Evanston, USA; Biostatistics, Feinberg School of Medicine, Chicago, USA; Northwestern Institute on Complex Systems, Northwestern University, Evanston, USA
|
16
|
Systems biology approach identifies key regulators and the interplay between miRNAs and transcription factors for pathological cardiac hypertrophy. Gene 2019; 698:157-169. [DOI: 10.1016/j.gene.2019.02.056] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 08/17/2018] [Revised: 01/31/2019] [Accepted: 02/20/2019] [Indexed: 12/16/2022]
|
17
|
Jalili M, Gebhardt T, Wolkenhauer O, Salehzadeh-Yazdi A. Unveiling network-based functional features through integration of gene expression into protein networks. Biochim Biophys Acta Mol Basis Dis 2018; 1864:2349-2359. [PMID: 29466699 DOI: 10.1016/j.bbadis.2018.02.010] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 11/01/2017] [Revised: 01/31/2018] [Accepted: 02/13/2018] [Indexed: 02/02/2023]
Abstract
Decoding health and disease phenotypes is one of the fundamental objectives in biomedicine. Whereas high-throughput omics approaches are available, it is evident that any single omics approach might not be adequate to capture the complexity of phenotypes. Therefore, integrated multi-omics approaches have been used to unravel genotype-phenotype relationships such as global regulatory mechanisms and complex metabolic networks in different eukaryotic organisms. Some of the progress and challenges associated with integrated omics studies have been reviewed previously in comprehensive studies. In this work, we highlight and review the progress, challenges and advantages associated with emerging approaches that integrate gene expression and protein-protein interaction networks to unravel network-based functional features. This includes identifying disease-related genes, prioritizing genes, clustering protein interactions, developing modules, and extracting active subnetworks and static or dynamic/temporal protein complexes. We also discuss how these approaches contribute to our understanding of the biology of complex traits and diseases. This article is part of a Special Issue entitled: Cardiac adaptations to obesity, diabetes and insulin resistance, edited by Professors Jan F.C. Glatz, Jason R.B. Dyck and Christine Des Rosiers.
Affiliation(s)
- Mahdi Jalili
- Hematology, Oncology and SCT Research Center, Tehran University of Medical Sciences, Tehran, Iran; Hematologic Malignancies Research Center, Tehran University of Medical Sciences, Tehran, Iran
- Tom Gebhardt
- Department of Systems Biology and Bioinformatics, University of Rostock, 18051 Rostock, Germany
- Olaf Wolkenhauer
- Department of Systems Biology and Bioinformatics, University of Rostock, 18051 Rostock, Germany
- Ali Salehzadeh-Yazdi
- Department of Systems Biology and Bioinformatics, University of Rostock, 18051 Rostock, Germany
|
18
|
Rough Hypercuboid and Modified Kulczynski Coefficient for Disease Gene Identification. ACTA ACUST UNITED AC 2017. [DOI: 10.1007/978-3-319-54430-4_45] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 04/04/2023]
|
19
|
Differential Regulatory Analysis Based on Coexpression Network in Cancer Research. BIOMED RESEARCH INTERNATIONAL 2016; 2016:4241293. [PMID: 27597964 PMCID: PMC4997028 DOI: 10.1155/2016/4241293] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 04/14/2016] [Revised: 06/09/2016] [Accepted: 06/12/2016] [Indexed: 12/15/2022]
Abstract
With the rapid development of high-throughput techniques and the accumulation of large transcriptomic datasets, many computational methods and algorithms, such as differential analysis and network analysis, have been proposed to explore genome-wide gene expression characteristics. These efforts aim to transform underlying genomic information into valuable knowledge for biological and medical research. Recently, many integrative research methods have been dedicated to interpreting the development and progression of neoplastic diseases, and differential regulatory analysis (DRA) based on gene coexpression networks (GCN) increasingly serves as a robust complement to regular differential expression analysis in revealing regulatory functions of cancer-related genes, such as evading growth suppressors and resisting cell death. DRA based on GCN is promising and plays an essential role in discovering the system properties of carcinogenesis. Here we briefly review the paradigm of DRA based on GCN. We also focus on its applications in cancer research and argue that DRA is necessary and particularly valuable for revealing the underlying molecular mechanisms in large-scale carcinogenesis studies.
|
20
|
|
21
|
Zhang XM, Guo L, Chi MH, Sun HM, Chen XW. Identification of active miRNA and transcription factor regulatory pathways in human obesity-related inflammation. BMC Bioinformatics 2015; 16:76. [PMID: 25887648 PMCID: PMC4355475 DOI: 10.1186/s12859-015-0512-5] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 11/11/2014] [Accepted: 02/24/2015] [Indexed: 12/21/2022] Open
Abstract
Background Obesity-induced chronic inflammation plays a fundamental role in the pathogenesis of metabolic syndrome (MS). Recently, a growing body of evidence indicates that miRNAs are largely dysregulated in obesity and that specific miRNAs regulate obesity-associated inflammation. We applied an approach aiming to identify active miRNA-TF-gene regulatory pathways in obesity. Firstly, we detected differentially expressed genes (DEGs) and differentially expressed miRNAs (DEmiRs) from mRNA and miRNA expression profiles, respectively. Secondly, by mapping the DEGs and DEmiRs to the curated miRNA-TF-gene regulatory network as active seed nodes and connecting them with their immediate neighbors, we obtained the potential active miRNA-TF-gene regulatory subnetwork in obesity. Thirdly, using a Breadth-First-Search (BFS) algorithm, we identified potential active miRNA-TF-gene regulatory pathways in obesity. Finally, through the hypergeometric test, we identified the active miRNA-TF-gene regulatory pathways that were significantly related to obesity. Results The potential active pathways with FDR < 0.0005 were considered to be the active miRNA-TF regulatory pathways in obesity. The union of the active pathways was visualized, with identical nodes of the active pathways merged. Conclusions We identified 23 active miRNA-TF-gene regulatory pathways that were significantly related to obesity-related inflammation. Electronic supplementary material The online version of this article (doi:10.1186/s12859-015-0512-5) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Xi-Mei Zhang
- Department of Histology and Embryology, Harbin Medical University, Harbin, 150081, PR China
- Lin Guo
- Department of Endocrinology and Metabolism, the Second Affiliated Hospital of Harbin Medical University, Harbin, 150081, PR China
- Mei-Hua Chi
- Teaching Experiment Center of Morphology, Harbin Medical University, Harbin, 150081, PR China
- Hong-Mei Sun
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, PR China
- Xiao-Wen Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150081, PR China
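The pipeline outlined in the Zhang et al. abstract above (seed nodes plus immediate neighbors, a breadth-first search for pathways, then a hypergeometric test) can be sketched roughly as follows. This is a simplified illustration, not the authors' code: the node names are hypothetical, the BFS here returns a single shortest directed path rather than enumerating all pathways, and the test is a plain upper-tail hypergeometric probability with no FDR correction.

```python
from collections import deque
from math import comb


def active_subnetwork(edges, seeds):
    """Keep directed edges that touch at least one seed node
    (seeds plus their immediate neighbors)."""
    return [(u, v) for u, v in edges if u in seeds or v in seeds]


def bfs_path(edges, source, target):
    """Shortest directed path from source to target via breadth-first search."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None  # target not reachable


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) when drawing n items from N, of which K are 'successes'."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Hypothetical regulatory edges (regulator -> target) and seed nodes.
edges = [("miR_a", "TF1"), ("TF1", "geneX"), ("TF2", "geneY"), ("geneX", "geneZ")]
seeds = {"miR_a", "geneX"}

sub = active_subnetwork(edges, seeds)
path = bfs_path(sub, "miR_a", "geneX")
# Toy enrichment: 10 genes in total, 3 disease-related, pathway of 3 genes, all 3 related.
p = hypergeom_pvalue(N=10, K=3, n=3, k=3)
print(sub, path, p)
```

A pathway would then be retained only if its p-value, after multiple-testing correction, falls below the chosen threshold; the correction step is left out of this sketch.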
|
22
|
The network organization of cancer-associated protein complexes in human tissues. Sci Rep 2013; 3:1583. [PMID: 23567845 PMCID: PMC3620901 DOI: 10.1038/srep01583] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 10/24/2012] [Accepted: 03/07/2013] [Indexed: 12/24/2022] Open
Abstract
Differential gene expression profiles for detecting disease genes have been studied intensively in systems biology. However, it is known that various biological functions achieved by proteins follow from the ability of the protein to form complexes by physically binding to each other. In other words, the functional units are often protein complexes rather than individual proteins. Thus, we seek to replace the perspective of disease-related genes by disease-related complexes, exemplifying with data on 39 human solid tissue cancers and their original normal tissues. To obtain the differential abundance levels of protein complexes, we apply an optimization algorithm to genome-wide differential expression data. From the differential abundance of complexes, we extract tissue- and cancer-selective complexes, and investigate their relevance to cancer. The method is supported by a clustering tendency of bipartite cancer-complex relationships, as well as a more concrete and realistic approach to disease-related proteomics.
|
23
|
Wang Y, Fang H, Yang T, Wu D, Zhao J. Degree‐adjusted algorithm for prioritisation of candidate disease genes from gene expression and protein interactome. IET Syst Biol 2014; 8:41-6. [DOI: 10.1049/iet-syb.2013.0038] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 12/31/2022] Open
Affiliation(s)
- Yichuan Wang
- Department of Mathematics, Logistical Engineering University, Chongqing, People's Republic of China
- Haiyang Fang
- Department of Mathematics, Logistical Engineering University, Chongqing, People's Republic of China
- Tinghong Yang
- Department of Mathematics, Logistical Engineering University, Chongqing, People's Republic of China
- Duzhi Wu
- Department of Mathematics, Logistical Engineering University, Chongqing, People's Republic of China
- Jing Zhao
- Department of Mathematics, Logistical Engineering University, Chongqing, People's Republic of China
|
24
|
Staunton L, Clancy T, Tonry C, Hernández B, Ademowo S, Dharsee M, Evans K, Parnell AC, Watson RW, Tasken KA, Pennington SR. Protein Quantification by MRM for Biomarker Validation. QUANTITATIVE PROTEOMICS 2014. [DOI: 10.1039/9781782626985-00277] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/30/2023]
Abstract
In this chapter we describe how mass spectrometry-based quantitative protein measurements by multiple reaction monitoring (MRM) have opened up the opportunity for the assembly of large panels of candidate protein biomarkers that can be simultaneously validated in large clinical cohorts to identify diagnostic protein biomarker signatures. We outline a workflow in which candidate protein biomarker panels are initially assembled from multiple diverse sources of discovery data, including proteomics and transcriptomics experiments, as well as from candidates found in the literature. Subsequently, the individual candidates in these large panels may be prioritised by application of a range of bioinformatics tools to generate a refined panel for which MRM assays may be developed. We describe a process for MRM assay design and implementation, and illustrate how the data generated from these multiplexed MRM measurements of prioritised candidates may be subjected to a range of statistical tools to create robust biomarker signatures for further clinical validation in large patient sample cohorts. Through this overall approach MRM has the potential to not only support individual biomarker validation but also facilitate the development of clinically useful protein biomarker signatures.
Affiliation(s)
- L. Staunton
- UCD Conway Institute, School of Medicine and Medical Science, University College Dublin, Dublin 4, Ireland
- T. Clancy
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Norway
- C. Tonry
- UCD Conway Institute, School of Medicine and Medical Science, University College Dublin, Dublin 4, Ireland
- B. Hernández
- UCD Conway Institute, School of Medicine and Medical Science, University College Dublin, Dublin 4, Ireland
- S. Ademowo
- UCD Conway Institute, School of Medicine and Medical Science, University College Dublin, Dublin 4, Ireland
- M. Dharsee
- Ontario Cancer Biomarker Network, Toronto, Ontario M5A 2K3, Canada
- K. Evans
- Ontario Cancer Biomarker Network, Toronto, Ontario M5A 2K3, Canada
- A. C. Parnell
- School of Mathematical Sciences, University College Dublin, Dublin 4, Ireland
- R. W. Watson
- UCD Conway Institute, School of Medicine and Medical Science, University College Dublin, Dublin 4, Ireland
- K. A. Tasken
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Norway
- S. R. Pennington
- UCD Conway Institute, School of Medicine and Medical Science, University College Dublin, Dublin 4, Ireland
|
25
|
Jia P, Zhao Z. Network-assisted analysis to prioritize GWAS results: principles, methods and perspectives. Hum Genet 2014; 133:125-38. [PMID: 24122152 PMCID: PMC3943795 DOI: 10.1007/s00439-013-1377-1] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.0] [Received: 12/08/2012] [Accepted: 10/03/2013] [Indexed: 01/24/2023]
Abstract
Genome-wide association studies (GWAS) have rapidly become a powerful tool in genetic studies of complex diseases and traits. Traditionally, single marker-based tests have been used prevalently in GWAS and have uncovered tens of thousands of disease-associated SNPs. Network-assisted analysis (NAA) of GWAS data is an emerging area in which network-related approaches are developed and utilized to perform advanced analyses of GWAS data in order to study various human diseases or traits. Progress has been made in both methodology development and applications of NAA in GWAS data, and it has already been demonstrated that NAA results may enhance our interpretation and prioritization of candidate genes and markers. Inspired by the strong interest in and high demand for advanced GWAS data analysis, in this review article, we discuss the methodologies and strategies that have been reported for the NAA of GWAS data. Many NAA approaches search for subnetworks and assess the combined effects of multiple genes participating in the resultant subnetworks through a gene set analysis. With no restriction to pre-defined canonical pathways, NAA has the advantage of defining subnetworks with the guidance of the GWAS data under investigation. In addition, some NAA methods prioritize genes from GWAS data based on their interconnections in the reference network. Here, we summarize NAA applications to various diseases and discuss the available options and potential caveats related to their practical usage. Additionally, we provide perspectives regarding this rapidly growing research area.
|
26
|
Kayano M, Shiga M, Mamitsuka H. Detecting Differentially Coexpressed Genes from Labeled Expression Data: A Brief Review. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2014; 11:154-167. [PMID: 26355515 DOI: 10.1109/tcbb.2013.2297921] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.2] [Indexed: 06/05/2023]
Abstract
We review methods for capturing differential coexpression, which can be divided into two cases by the size of gene sets: 1) two paired genes and 2) multiple genes. In the first case, two genes are positively and negatively correlated with each other under one and the other conditions, respectively. In the second case, multiple genes are coexpressed and randomly expressed under one and the other conditions, respectively. We summarize a variety of methods for the first and second cases into four and three approaches, respectively. We describe each of these approaches in technical detail, followed by thorough comparative experiments with both synthetic and real data sets. Our experimental results suggest considerable room for improving the current methods, particularly in the case of multiple genes, because even the best-performing methods, which are relatively simple and intuitive, achieve low performance.
|
27
|
Abstract
MOTIVATION Several types of studies, including genome-wide association studies and RNA interference screens, strive to link genes to diseases. Although these approaches have had some success, genetic variants are often only present in a small subset of the population, and screens are noisy with low overlap between experiments in different labs. Neither provides a mechanistic model explaining how identified genes impact the disease of interest or the dynamics of the pathways those genes regulate. Such mechanistic models could be used to accurately predict downstream effects of knocking down pathway members and allow comprehensive exploration of the effects of targeting pairs or higher-order combinations of genes. RESULTS We developed methods to model the activation of signaling and dynamic regulatory networks involved in disease progression. Our model, SDREM, integrates static and time series data to link proteins and the pathways they regulate in these networks. SDREM uses prior information about proteins' likelihood of involvement in a disease (e.g. from screens) to improve the quality of the predicted signaling pathways. We used our algorithms to study the human immune response to H1N1 influenza infection. The resulting networks correctly identified many of the known pathways and transcriptional regulators of this disease. Furthermore, they accurately predict RNA interference effects and can be used to infer genetic interactions, greatly improving over other methods suggested for this task. Applying our method to the more pathogenic H5N1 influenza allowed us to identify several strain-specific targets of this infection. AVAILABILITY SDREM is available from http://sb.cs.cmu.edu/sdrem. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Anthony Gitter
- Computer Science Department and Lane Center for Computational Biology, School of Computer Science, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, USA
|
28
|
Bao S, Zhou X, Zhang L, Zhou J, To KKW, Wang B, Wang L, Zhang X, Song YQ. Prioritizing genes responsible for host resistance to influenza using network approaches. BMC Genomics 2013; 14:816. [PMID: 24261899 PMCID: PMC4046670 DOI: 10.1186/1471-2164-14-816] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 04/03/2013] [Accepted: 11/06/2013] [Indexed: 01/17/2023] Open
Abstract
Background The genetic make-up of humans and other mammals (such as mice) affects their resistance to influenza virus infection. Considering the complexity and moral issues associated with experiments on human subjects, we have only acquired partial knowledge regarding the underlying molecular mechanisms. Although influenza resistance in inbred mice has been mapped to several quantitative trait loci (QTLs), which have greatly narrowed down the search for host resistance genes, only a few underlying genes have been identified. Results To prioritize a list of promising candidates for future functional investigation, we applied network-based approaches to leverage the information of known resistance genes and the expression profiles contrasting susceptible and resistant mouse strains. The significance of top-ranked genes was supported by different lines of evidence from independent genetic associations, QTL studies, RNA interference (RNAi) screenings, and gene expression analysis. Further data mining on the prioritized genes revealed the functions of two pathways mediated by tumor necrosis factor (TNF): apoptosis and TNF receptor-2 signaling pathways. We suggest that the delicate balance between TNF’s pro-survival and apoptotic effects may affect hosts’ conditions after influenza virus infection. Conclusions This study considerably cuts down the list of candidate genes responsible for host resistance to influenza and proposes novel pathways and mechanisms. Our study also demonstrates the efficacy of network-based methods in prioritizing genes for complex traits. Electronic supplementary material The online version of this article (doi:10.1186/1471-2164-14-816) contains supplementary material, which is available to authorized users.
Affiliation(s)
- You-Qiang Song
- Department of Biochemistry, The University of Hong Kong, Hong Kong, China
|
29
|
Kimmel C, Visweswaran S. An algorithm for network-based gene prioritization that encodes knowledge both in nodes and in links. PLoS One 2013; 8:e79564. [PMID: 24260251 PMCID: PMC3834271 DOI: 10.1371/journal.pone.0079564] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Received: 12/04/2012] [Accepted: 09/25/2013] [Indexed: 11/19/2022] Open
Abstract
BACKGROUND Candidate gene prioritization aims to identify promising new genes associated with a disease or a biological process from a larger set of candidate genes. In recent years, network-based methods - which utilize a knowledge network derived from biological knowledge - have been utilized for gene prioritization. Biological knowledge can be encoded either through the network's links or nodes. Current network-based methods can only encode knowledge through links. This paper describes a new network-based method that can encode knowledge in links as well as in nodes. RESULTS We developed a new network inference algorithm called the Knowledge Network Gene Prioritization (KNGP) algorithm which can incorporate both link and node knowledge. The performance of the KNGP algorithm was evaluated on both synthetic networks and on networks incorporating biological knowledge. The results showed that the combination of link knowledge and node knowledge provided a significant benefit across 19 experimental diseases over using link knowledge alone or node knowledge alone. CONCLUSIONS The KNGP algorithm provides an advance over current network-based algorithms, because the algorithm can encode both link and node knowledge. We hope the algorithm will aid researchers with gene prioritization.
Affiliation(s)
- Chad Kimmel
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
- Shyam Visweswaran
- Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, Pennsylvania, United States of America
30
Jiang W, Zhang Y, Meng F, Lian B, Chen X, Yu X, Dai E, Wang S, Liu X, Li X, Wang L, Li X. Identification of active transcription factor and miRNA regulatory pathways in Alzheimer’s disease. Bioinformatics 2013; 29:2596-602. [DOI: 10.1093/bioinformatics/btt423] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.8] [Indexed: 11/13/2022] Open
31
Li W, Chen L, He W, Li W, Qu X, Liang B, Gao Q, Feng C, Jia X, Lv Y, Zhang S, Li X. Prioritizing disease candidate proteins in cardiomyopathy-specific protein-protein interaction networks based on "guilt by association" analysis. PLoS One 2013; 8:e71191. [PMID: 23940716 PMCID: PMC3733802 DOI: 10.1371/journal.pone.0071191] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.1] [Received: 01/28/2013] [Accepted: 06/28/2013] [Indexed: 01/12/2023] Open
Abstract
The cardiomyopathies are a group of heart muscle diseases which can be inherited (familial). Identifying potential disease-related proteins is important to understand mechanisms of cardiomyopathies. Experimental identification of cardiomyopathies is costly and labour-intensive. In contrast, bioinformatics approaches have a competitive advantage over experimental methods. Based on “guilt by association” analysis, we prioritized candidate proteins involved in human cardiomyopathies. We first built weighted human cardiomyopathy-specific protein-protein interaction networks for three subtypes of cardiomyopathies using the known disease proteins from Online Mendelian Inheritance in Man as seeds. We then developed a method for prioritizing disease candidate proteins to rank candidate proteins in the network based on “guilt by association” analysis. It was found that most candidate proteins with high scores shared disease-related pathways with disease seed proteins. These top ranked candidate proteins were related to the corresponding disease subtypes, and were potential disease-related proteins. Cross-validation and comparison with other methods indicated that our approach could be used for the identification of potentially novel disease proteins, which may provide insights into cardiomyopathy-related mechanisms in a more comprehensive and integrated way.
Affiliation(s)
- Wan Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Lina Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Weiming He
- Institute of Opto-electronics, Harbin Institute of Technology, Harbin, Heilongjiang Province, China
- Weiguo Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xiaoli Qu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Binhua Liang
- National Microbiology Laboratory, Public Health Agency of Canada, Winnipeg, Manitoba, Canada
- Qianping Gao
- Department of Cardiology, The First Affiliated Hospital of Harbin Medical University, Harbin, China
- Chenchen Feng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xu Jia
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Yana Lv
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Siya Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
- Xia Li
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang Province, China
32
Wang PI, Hwang S, Kincaid RP, Sullivan CS, Lee I, Marcotte EM. RIDDLE: reflective diffusion and local extension reveal functional associations for unannotated gene sets via proximity in a gene network. Genome Biol 2012; 13:R125. [PMID: 23268829 PMCID: PMC4056375 DOI: 10.1186/gb-2012-13-12-r125] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Received: 06/27/2012] [Accepted: 12/26/2012] [Indexed: 01/08/2023] Open
Abstract
The growing availability of large-scale functional networks has promoted the development of many successful techniques for predicting functions of genes. Here we extend these network-based principles and techniques to functionally characterize whole sets of genes. We present RIDDLE (Reflective Diffusion and Local Extension), which uses well developed guilt-by-association principles upon a human gene network to identify associations of gene sets. RIDDLE is particularly adept at characterizing sets with no annotations, a major challenge where most traditional set analyses fail. Notably, RIDDLE found microRNA-450a to be strongly implicated in ocular diseases and development. A web application is available at http://www.functionalnet.org/RIDDLE.
33
Lavi O, Dror G, Shamir R. Network-induced classification kernels for gene expression profile analysis. J Comput Biol 2012; 19:694-709. [PMID: 22697242 DOI: 10.1089/cmb.2012.0065] [Citation(s) in RCA: 28] [Impact Index Per Article: 2.3] [Indexed: 11/13/2022] Open
Abstract
Computational classification of gene expression profiles into distinct disease phenotypes has been highly successful to date. Still, robustness, accuracy, and biological interpretation of the results have been limited, and it was suggested that use of protein interaction information jointly with the expression profiles can improve the results. Here, we study three aspects of this problem. First, we show that interactions are indeed relevant by showing that co-expressed genes tend to be closer in the network of interactions. Second, we show that the improved performance of one extant method utilizing expression and interactions is not really due to the biological information in the network, while in another method this is not the case. Finally, we develop a new kernel method--called NICK--that integrates network and expression data for SVM classification, and demonstrate that overall it achieves better results than extant methods while running two orders of magnitude faster.
Affiliation(s)
- Ofer Lavi
- Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel.
34
Antunes-Martins A, Perkins JR, Lees J, Hildebrandt T, Orengo C, Bennett DLH. Systems biology approaches to finding novel pain mediators. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2012; 5:11-35. [PMID: 23059966 DOI: 10.1002/wsbm.1192] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Indexed: 12/18/2022]
Abstract
Chronic pain represents a major health burden; this maladaptive pain state occurs as a consequence of hypersensitivity within the peripheral and central components of the somatosensory system. High throughput technologies (genomics, transcriptomics, lipidomics, and proteomics) are now being applied to tissue derived from pain patients as well as experimental pain models to discover novel pain mediators. The use of clustering, meta-analysis and other techniques can help refine potential candidates. Of particular importance are systems biology methods, such as co-expression network generating algorithms, which infer potential associations/interactions between molecules and build networks based on these interactions. Protein-protein interaction networks allow the lists of potential targets generated by these different platforms to be analyzed in their biological context. Outputs from these different methods must also be related to the clinical pain phenotype. The improved and standardized phenotyping of pain symptoms and sensory signs enables much better subject stratification. Our hope is that, in the future, the use of computational approaches to integrate datasets including sensory phenotype as well as the outputs of high throughput technologies will help define novel pain mediators and provide insights into the pathogenesis of chronic pain.
Affiliation(s)
- Ana Antunes-Martins
- The Wolfson Centre for Age-Related Diseases, King's College London, Guy's Campus, London, UK
35
Gao S, Jia S, Hessner MJ, Wang X. Predicting disease-related subnetworks for type 1 diabetes using a new network activity score. OMICS-A JOURNAL OF INTEGRATIVE BIOLOGY 2012; 16:566-78. [PMID: 22917479 DOI: 10.1089/omi.2012.0029] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Indexed: 01/09/2023]
Abstract
In this study we investigated the advantage of including network information in prioritizing disease genes of type 1 diabetes (T1D). First, a naïve Bayesian network (NBN) model was developed to integrate information from multiple data sources and to define a T1D-involvement probability score (PS) for each individual gene. The algorithm was validated using known functional candidate genes as a benchmark. Genes with higher PS were found to be more likely to appear in T1D-related publications. Next a new network activity metric was proposed to evaluate the T1D relevance of protein-protein interaction (PPI) subnetworks. The metric considered the contribution both from individual genes and from network topological characteristics. The predictions were confirmed by several independent datasets, including a genome wide association study (GWAS), and two large-scale human gene expression studies. We found that novel candidate genes in the T1D subnetworks showed more significant associations with T1D than genes predicted using PS alone. Interestingly, most novel candidates were not encoded within the human leukocyte antigen (HLA) region, and their expression levels showed correlation with disease only in cohorts with low-risk HLA genotypes. The results suggested the importance of mapping disease gene networks in dissecting the genetics of complex diseases, and offered a general approach to network-based disease gene prioritization from multiple data sources.
Affiliation(s)
- Shouguo Gao
- Department of Physics, the University of Alabama at Birmingham, Birmingham, Alabama 35294, USA
36
Wu C, Zhu J, Zhang X. Integrating gene expression and protein-protein interaction network to prioritize cancer-associated genes. BMC Bioinformatics 2012; 13:182. [PMID: 22838965 PMCID: PMC3464615 DOI: 10.1186/1471-2105-13-182] [Citation(s) in RCA: 91] [Impact Index Per Article: 7.6] [Received: 03/07/2012] [Accepted: 07/17/2012] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND To understand the roles they play in complex diseases, genes need to be investigated in the networks they are involved in. Integration of gene expression and network data is a promising approach to prioritize disease-associated genes. Some methods have been developed in this field, but the problem is still far from being solved. RESULTS In this paper, we developed a method, Networked Gene Prioritizer (NGP), to prioritize cancer-associated genes. Applications on several breast cancer and lung cancer datasets demonstrated that NGP performs better than the existing methods. It provides stable top ranking genes between independent datasets. The top-ranked genes by NGP are enriched in the cancer-associated pathways. The top-ranked genes by NGP (PLK1, MCM2, MCM3, MCM7, MCM10 and SKP2) might coordinate to promote cell cycle related processes in cancer but not normal cells. CONCLUSIONS In this paper, we have developed a method named NGP, to prioritize cancer-associated genes. Our results demonstrated that NGP performs better than the existing methods.
Affiliation(s)
- Chao Wu
- MOE Key Laboratory of Bioinformatics and Bioinformatics Division, TNLIST and Department of Automation, Tsinghua University, Beijing 100084, PR China.
37
Zhao J, Chen J, Yang TH, Holme P. Insights into the pathogenesis of axial spondyloarthropathy from network and pathway analysis. BMC SYSTEMS BIOLOGY 2012; 6 Suppl 1:S4. [PMID: 23046677 PMCID: PMC3403611 DOI: 10.1186/1752-0509-6-s1-s4] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 12/25/2022]
Abstract
Background Complex chronic diseases are usually not caused by changes in a single causal gene but by an unbalanced regulating network resulting from the dysfunctions of multiple genes or their products. Therefore, a network-based systems approach can be helpful for the identification of candidate genes related to complex diseases and their relationships. Axial spondyloarthropathy (SpA) is a group of chronic inflammatory joint diseases that mainly affect the spine and the sacroiliac joints. The pathogenesis of SpA remains largely unknown. Results In this paper, we conducted a network study of the pathogenesis of SpA. We integrated data related to SpA, from the OMIM database, proteomics and microarray experiments of SpA, to prioritize SpA candidate disease genes in the context of the human protein interactome. Based on the top ranked SpA related genes, we constructed a SpA specific PPI network, identified potential pathways associated with SpA, and finally sketched an overview of biological processes involved in the development of SpA. Conclusions The protein-protein interaction (PPI) network and pathways reflect the link between the two pathological processes of SpA, i.e., immune mediated inflammation, as well as imbalanced bone modelling that causes new bone formation and bone loss. We found that some known disease causative genes, such as TNF and ILs, play pivotal roles in this interaction.
Affiliation(s)
- Jing Zhao
- Department of Mathematics, Logistical Engineering University, Chongqing, China.
38
Li W, Wang R, Bai L, Yan Z, Sun Z. Cancer core modules identification through genomic and transcriptomic changes correlation detection at network level. BMC SYSTEMS BIOLOGY 2012; 6:64. [PMID: 22691569 PMCID: PMC3443057 DOI: 10.1186/1752-0509-6-64] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 03/06/2012] [Accepted: 06/12/2012] [Indexed: 02/04/2023]
Abstract
BACKGROUND Identification of driver mutations among numerous genomic alterations remains a critical challenge to the elucidation of the underlying mechanisms of cancer. Because driver mutations by definition are associated with a greater number of cancer phenotypes compared to other mutations, we hypothesized that driver mutations could more easily be identified once the genotype-phenotype correlations are detected across tumor samples. RESULTS In this study, we describe a novel network analysis to identify the driver mutation through integrating both cancer genomes and transcriptomes. Our method successfully identified a significant genotype-phenotype change correlation in all six solid tumor types and revealed core modules that contain both significantly enriched somatic mutations and aberrant expression changes specific to tumor development. Moreover, we found that the majority of these core modules contained well known cancer driver mutations, and that their mutated genes tended to occur at hub genes with central regulatory roles. In these mutated genes, the majority were cancer-type specific and exhibited a closer relationship within the same cancer type rather than across cancer types. The remaining mutated genes that exist in multiple cancer types led to two cancer type clusters: one cluster consisted of three neural derived or related cancer types, and the other cluster consisted of two adenoma cancer types. CONCLUSIONS Our approach can successfully identify the candidate drivers from the core modules. Comprehensive network analysis on the core modules potentially provides critical insights into convergent cancer development in different organs.
Affiliation(s)
- Wenting Li
- MOE Key Laboratory of Bioinformatics, State Key Laboratory of Biomembrane and Membrane Biotechnology, Institute of Bioinformatics and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, China
39
Abstract
Sudden cardiac death (SCD), a sudden pulseless condition due to cardiac arrhythmia, remains a major public health problem despite recent progress in the treatment and prevention of overall coronary heart disease. In this review, we examine the evidence for genetic susceptibility to SCD in order to provide biological insight into the pathogenesis of this devastating disease and to explore the potential for genetics to impact clinical management of SCD risk. Both candidate gene approaches and unbiased genome-wide scans have identified novel biological pathways contributing to SCD risk. Although risk stratification in the general population remains an elusive goal, several studies point to the potential utility of these common genetic variants in high-risk individuals. Finally, we highlight novel methodological approaches to deciphering the molecular mechanisms involved in arrhythmogenesis. Although further epidemiological and clinical applications research is needed, it is increasingly clear that genetic approaches are yielding important insights into SCD that may impact the public health burden imposed by SCD and its associated outcomes.
Affiliation(s)
- Dan E Arking
- McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, MD 21209, USA.
40
Doncheva NT, Kacprowski T, Albrecht M. Recent approaches to the prioritization of candidate disease genes. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2012; 4:429-42. [PMID: 22689539 DOI: 10.1002/wsbm.1177] [Citation(s) in RCA: 45] [Impact Index Per Article: 3.8] [Indexed: 11/10/2022]
Abstract
Many efforts are still devoted to the discovery of genes involved with specific phenotypes, in particular, diseases. High-throughput techniques are thus applied frequently to detect dozens or even hundreds of candidate genes. However, the experimental validation of many candidates is often an expensive and time-consuming task. Therefore, a great variety of computational approaches has been developed to support the identification of the most promising candidates for follow-up studies. The biomedical knowledge already available about the disease of interest and related genes is commonly exploited to find new gene-disease associations and to prioritize candidates. In this review, we highlight recent methodological advances in this research field of candidate gene prioritization. We focus on approaches that use network information and integrate heterogeneous data sources. Furthermore, we discuss current benchmarking procedures for evaluating and comparing different prioritization methods.
41
Le DH, Kwon YK. GPEC: a Cytoscape plug-in for random walk-based gene prioritization and biomedical evidence collection. Comput Biol Chem 2012; 37:17-23. [PMID: 22430954 DOI: 10.1016/j.compbiolchem.2012.02.004] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Received: 11/25/2011] [Revised: 01/10/2012] [Accepted: 02/20/2012] [Indexed: 11/18/2022]
Abstract
Finding genes associated with a disease is an important issue in the biomedical area, and many gene prioritization methods have been proposed for this goal. Among these, network-based approaches have recently been proposed and have outperformed functional annotation-based ones. Here, we introduce a novel Cytoscape plug-in, GPEC, to help identify putative genes likely to be associated with specific diseases or pathways. In the plug-in, gene prioritization is performed through a random walk with restart algorithm, a state-of-the-art network-based method, along with a gene/protein relationship network. The plug-in also allows users to efficiently collect biomedical evidence for highly ranked candidate genes. A set of known genes, candidate genes and a gene/protein relationship network can be provided in a flexible way.
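As an illustration of the random walk with restart idea named in this abstract, the following is a minimal sketch of the textbook iteration, not GPEC's actual implementation. The function name, the restart probability of 0.5, and the four-gene toy network are all assumptions made for the example: a walker starts at the known disease genes, moves to a random neighbour at each step, and jumps back to a seed with a fixed probability; candidates are ranked by their steady-state visiting probability.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-12, max_iter=1000):
    """Steady-state visiting probabilities of a walker that restarts at the seeds.

    adjacency: dict mapping each node to the set of its neighbours
               (undirected; every node must have at least one neighbour).
    seeds:     set of known disease genes; restarts are uniform over them.
    restart:   probability of jumping back to a seed (illustrative default).
    """
    nodes = list(adjacency)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {}
        for n in nodes:
            # Each neighbour m spreads its probability evenly over its edges.
            inflow = sum(p[m] / len(adjacency[m]) for m in adjacency[n])
            nxt[n] = (1.0 - restart) * inflow + restart * p0[n]
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy path network A - B - C - D with A as the only known gene.
toy_network = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
scores = random_walk_with_restart(toy_network, seeds={"A"})
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy case the score decays with network distance from the seed, which is the behaviour that makes the method usable for ranking candidates near known disease genes.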
Affiliation(s)
- Duc-Hau Le
- School of Computer Science and Engineering, Water Resources University, 175 Tay Son, Dong Da, Hanoi, Vietnam.
42
Cun Y, Fröhlich H. Biomarker gene signature discovery integrating network knowledge. BIOLOGY 2012; 1:5-17. [PMID: 24832044 PMCID: PMC4011032 DOI: 10.3390/biology1010005] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Received: 01/29/2012] [Revised: 02/18/2012] [Accepted: 02/21/2012] [Indexed: 12/17/2022]
Abstract
Discovery of prognostic and diagnostic biomarker gene signatures for diseases, such as cancer, is seen as a major step towards a better personalized medicine. During the last decade various methods, mainly coming from the machine learning or statistical domain, have been proposed for that purpose. However, one important obstacle for making gene signatures a standard tool in clinical diagnosis is the typically low reproducibility of these signatures combined with the difficulty of achieving a clear biological interpretation. For that reason, in recent years there has been a growing interest in approaches that try to integrate information from molecular interaction networks. Here we review the current state of research in this field by giving an overview of the approaches proposed so far.
Affiliation(s)
- Yupeng Cun
- Bonn-Aachen International Center for IT (B-IT), Dahlmannstr. 2, 53113 Bonn, Germany.
- Holger Fröhlich
- Bonn-Aachen International Center for IT (B-IT), Dahlmannstr. 2, 53113 Bonn, Germany.
43
Piro RM, Di Cunto F. Computational approaches to disease-gene prediction: rationale, classification and successes. FEBS J 2012; 279:678-96. [PMID: 22221742 DOI: 10.1111/j.1742-4658.2012.08471.x] [Citation(s) in RCA: 92] [Impact Index Per Article: 7.7] [Indexed: 11/27/2022]
Abstract
The identification of genes involved in human hereditary diseases often requires the time-consuming and expensive examination of a great number of possible candidate genes, since genome-wide techniques such as linkage analysis and association studies frequently select many hundreds of 'positional' candidates. Even considering the positive impact of next-generation sequencing technologies, the prioritization of candidate genes may be an important step for disease-gene identification. In this paper we develop a basic classification scheme for computational approaches to disease-gene prediction and apply it to exhaustively review bioinformatics tools that have been developed for this purpose, focusing on conceptual aspects rather than technical detail and performance. Finally, we discuss some past successes obtained by computational approaches to illustrate their beneficial contribution to medical research.
Affiliation(s)
- Rosario M Piro
- Department of Theoretical Bioinformatics, German Cancer Research Center, (DKFZ), Heidelberg, Germany.
44
Zhao J, Yang TH, Huang Y, Holme P. Ranking candidate disease genes from gene expression and protein interaction: a Katz-centrality based approach. PLoS One 2011; 6:e24306. [PMID: 21912686 PMCID: PMC3166320 DOI: 10.1371/journal.pone.0024306] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.9] [Received: 07/25/2011] [Accepted: 08/04/2011] [Indexed: 11/29/2022] Open
Abstract
Many diseases have complex genetic causes, where a set of alleles can affect the propensity of getting the disease. The identification of such disease genes is important to understand the mechanistic and evolutionary aspects of pathogenesis, improve diagnosis and treatment of the disease, and aid in drug discovery. Current genetic studies typically identify chromosomal regions associated with specific diseases. But picking out an unknown disease gene from hundreds of candidates located on the same genomic interval is still challenging. In this study, we propose an approach to prioritize candidate genes by integrating data of gene expression level, protein-protein interaction strength and known disease genes. Our method is based on only two simple, biologically motivated assumptions: that a gene is a good disease-gene candidate if it is differentially expressed in cases and controls, or that it is close to other disease-gene candidates in its protein interaction network. We tested our method on 40 diseases in 58 gene expression datasets of the NCBI Gene Expression Omnibus database. On these datasets our method is able to predict unknown disease genes as well as identify pleiotropic genes involved in the physiological cellular processes of many diseases. Our study not only provides an effective algorithm for prioritizing candidate disease genes but is also a way to discover phenotypic interdependency, cooccurrence and shared pathophysiology between different disorders.
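The two assumptions stated in this abstract map naturally onto a Katz-style recursion, in which a gene's score is its own prior evidence plus a damped sum of its neighbours' scores. The sketch below is textbook Katz centrality with a per-node prior, offered only as an illustration; the abstract does not give the authors' exact formula, so the function name, the damping factor `alpha`, the unweighted edges, and the toy differential-expression values are all assumptions.

```python
def katz_scores(adjacency, prior, alpha=0.1, tol=1e-12, max_iter=1000):
    """Iterate x = alpha * A x + prior until the scores stop changing.

    adjacency: dict mapping each gene to the set of its interaction partners.
    prior:     dict of per-gene evidence, e.g. a differential expression score.
    alpha:     damping factor; it must be smaller than 1 / (largest eigenvalue
               of the adjacency matrix) for the iteration to converge.
    """
    x = dict(prior)
    for _ in range(max_iter):
        nxt = {g: alpha * sum(x[m] for m in adjacency[g]) + prior[g]
               for g in adjacency}
        change = max(abs(nxt[g] - x[g]) for g in adjacency)
        x = nxt
        if change < tol:
            break
    return x


# Hypothetical toy network: g2 - g1 - g3 - g4.
# g2 and g3 are differentially expressed; g1 and g4 are not.
toy_network = {"g1": {"g2", "g3"}, "g2": {"g1"}, "g3": {"g1", "g4"}, "g4": {"g3"}}
toy_prior = {"g1": 0.0, "g2": 1.0, "g3": 1.0, "g4": 0.0}
katz = katz_scores(toy_network, toy_prior)
print(sorted(katz, key=katz.get, reverse=True))
```

Here g1 and g4 both have no expression signal of their own, yet g1 scores higher because it neighbours two differentially expressed genes while g4 neighbours one, which is the "close to other candidates" assumption in action.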
Affiliation(s)
- Jing Zhao
- Department of Mathematics, Logistical Engineering University, Chongqing, China.
45
Nitsch D, Tranchevent LC, Gonçalves JP, Vogt JK, Madeira SC, Moreau Y. PINTA: a web server for network-based gene prioritization from expression data. Nucleic Acids Res 2011; 39:W334-8. [PMID: 21602267 PMCID: PMC3125740 DOI: 10.1093/nar/gkr289] [Citation(s) in RCA: 58] [Impact Index Per Article: 4.5] [Indexed: 01/06/2023] Open
Abstract
PINTA (available at http://www.esat.kuleuven.be/pinta/; this web site is free and open to all users and there is no login requirement) is a web resource for the prioritization of candidate genes based on the differential expression of their neighborhood in a genome-wide protein–protein interaction network. Our strategy is meant for biological and medical researchers aiming at identifying novel disease genes using disease specific expression data. PINTA supports both candidate gene prioritization (starting from a user defined set of candidate genes) as well as genome-wide gene prioritization and is available for five species (human, mouse, rat, worm and yeast). As input data, PINTA only requires disease specific expression data, whereas various platforms (e.g. Affymetrix) are supported. As a result, PINTA computes a gene ranking and presents the results as a table that can easily be browsed and downloaded by the user.
Affiliation(s)
- Daniela Nitsch
- Department of Electrical Engineering (ESAT-SCD), Katholieke Universiteit Leuven, 3001 Leuven, Belgium
46
Glaab E, Baudot A, Krasnogor N, Valencia A. Extending pathways and processes using molecular interaction networks to analyse cancer genome data. BMC Bioinformatics 2010; 11:597. [PMID: 21144022 PMCID: PMC3017081 DOI: 10.1186/1471-2105-11-597] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Received: 08/24/2010] [Accepted: 12/13/2010] [Indexed: 12/31/2022] Open
Abstract
Background Cellular processes and pathways, whose deregulation may contribute to the development of cancers, are often represented as cascades of proteins transmitting a signal from the cell surface to the nucleus. However, recent functional genomic experiments have identified thousands of interactions for the signalling canonical proteins, challenging the traditional view of pathways as independent functional entities. Combining information from pathway databases and interaction networks obtained from functional genomic experiments is therefore a promising strategy to obtain more robust pathway and process representations, facilitating the study of cancer-related pathways. Results We present a methodology for extending pre-defined protein sets representing cellular pathways and processes by mapping them onto a protein-protein interaction network, and extending them to include densely interconnected interaction partners. The added proteins display distinctive network topological features and molecular function annotations, and can be proposed as putative new components, and/or as regulators of the communication between the different cellular processes. Finally, these extended pathways and processes are used to analyse their enrichment in pancreatic mutated genes. Significant associations between mutated genes and certain processes are identified, enabling an analysis of the influence of previously non-annotated cancer mutated genes. Conclusions The proposed method for extending cellular pathways helps to explain the functions of cancer mutated genes by exploiting the synergies of canonical knowledge and large-scale interaction data.
47
Nitsch D, Gonçalves JP, Ojeda F, de Moor B, Moreau Y. Candidate gene prioritization by network analysis of differential expression using machine learning approaches. BMC Bioinformatics 2010; 11:460. [PMID: 20840752 PMCID: PMC2945940 DOI: 10.1186/1471-2105-11-460] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.6] [Received: 05/27/2010] [Accepted: 09/14/2010] [Indexed: 02/02/2023] Open
Abstract
Background Discovering novel disease genes is still challenging for diseases for which no prior knowledge - such as known disease genes or disease-related pathways - is available. Performing genetic studies frequently results in large lists of candidate genes of which only few can be followed up for further investigation. We have recently developed a computational method for constitutional genetic disorders that identifies the most promising candidate genes by replacing prior knowledge by experimental data of differential gene expression between affected and healthy individuals. To improve the performance of our prioritization strategy, we have extended our previous work by applying different machine learning approaches that identify promising candidate genes by determining whether a gene is surrounded by highly differentially expressed genes in a functional association or protein-protein interaction network. Results We have proposed three strategies scoring disease candidate genes relying on network-based machine learning approaches, such as kernel ridge regression, heat kernel, and Arnoldi kernel approximation. For comparison purposes, a local measure based on the expression of the direct neighbors is also computed. We have benchmarked these strategies on 40 publicly available knockout experiments in mice, and performance was assessed against results obtained using a standard procedure in genetics that ranks candidate genes based solely on their differential expression levels (Simple Expression Ranking). Our results showed that our four strategies could outperform this standard procedure and that the best results were obtained using the Heat Kernel Diffusion Ranking leading to an average ranking position of 8 out of 100 genes, an AUC value of 92.3% and an error reduction of 52.8% relative to the standard procedure approach which ranked the knockout gene on average at position 17 with an AUC value of 83.7%. 
Conclusion In this study we could identify promising candidate genes using network based machine learning approaches even if no knowledge is available about the disease or phenotype.
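The 52.8% error reduction quoted above appears to follow from the two reported AUC values if "error" is read as 1 − AUC. That reading is an assumption, since the abstract does not define the term. A minimal check under that assumption (the function name is illustrative):

```python
# Assumption: "error" is taken to be 1 - AUC; the abstract does not define it.
AUC_BASELINE = 0.837     # Simple Expression Ranking, as reported in the abstract
AUC_HEAT_KERNEL = 0.923  # Heat Kernel Diffusion Ranking, as reported in the abstract


def relative_error_reduction(auc_base: float, auc_new: float) -> float:
    """Fraction by which (1 - AUC) shrinks when moving from auc_base to auc_new."""
    err_base = 1.0 - auc_base
    err_new = 1.0 - auc_new
    return (err_base - err_new) / err_base


reduction = relative_error_reduction(AUC_BASELINE, AUC_HEAT_KERNEL)
print(f"Relative error reduction: {reduction:.1%}")
```

Under this reading, the error falls from 0.163 to 0.077, which is consistent with the figure given in the abstract.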
Affiliation(s)
- Daniela Nitsch
- Department of Electrical Engineering (ESAT-SCD) Katholieke Universiteit Leuven, 3001 Leuven, Belgium.
|
48
|
Tranchevent LC, Capdevila FB, Nitsch D, De Moor B, De Causmaecker P, Moreau Y. A guide to web tools to prioritize candidate genes. Brief Bioinform 2010; 12:22-32. [PMID: 21278374 DOI: 10.1093/bib/bbq007] [Citation(s) in RCA: 141] [Impact Index Per Article: 10.1] [Indexed: 01/13/2023] Open
|
49
|
Clermont G, Auffray C, Moreau Y, Rocke DM, Dalevi D, Dubhashi D, Marshall DR, Raasch P, Dehne F, Provero P, Tegner J, Aronow BJ, Langston MA, Benson M. Bridging the gap between systems biology and medicine. Genome Med 2009; 1:88. [PMID: 19754960 PMCID: PMC2768995 DOI: 10.1186/gm88] [Citation(s) in RCA: 58] [Impact Index Per Article: 3.9] [Received: 05/15/2009] [Revised: 06/11/2009] [Accepted: 09/15/2009] [Indexed: 11/10/2022] Open
Abstract
Systems biology has matured considerably as a discipline over the last decade, yet some of the key challenges separating current research efforts in systems biology and clinically useful results are only now becoming apparent. As these gaps are better defined, the new discipline of systems medicine is emerging as a translational extension of systems biology. How is systems medicine defined? What are relevant ontologies for systems medicine? What are the key theoretic and methodologic challenges facing computational disease modeling? How are inaccurate and incomplete data, and uncertain biologic knowledge best synthesized in useful computational models? Does network analysis provide clinically useful insight? We discuss the outstanding difficulties in translating a rapidly growing body of data into knowledge usable at the bedside. Although core-specific challenges are best met by specialized groups, it appears fundamental that such efforts should be guided by a roadmap for systems medicine drafted by a coalition of scientists from the clinical, experimental, computational, and theoretic domains.
Affiliation(s)
- Gilles Clermont
- Department of Critical Care Medicine and CRISMA laboratory, University of Pittsburgh School of Medicine, Scaife 602, 3550 Terrace, Pittsburgh, PA 15261, USA.
|