1
Baluszek S, Kober P, Rusetska N, Wągrodzki M, Mandat T, Kunicki J, Bujko M. DNA methylation, combined with RNA sequencing, provide novel insight into molecular classification of chordomas and their microenvironment. Acta Neuropathol Commun 2023; 11:113. [PMID: 37434245 DOI: 10.1186/s40478-023-01610-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 6.0] [Received: 04/25/2023] [Accepted: 06/26/2023] [Indexed: 07/13/2023]
Abstract
Chordomas are rare tumors of notochord remnants, occurring mainly in the sacrum and skull base. Despite their unusually slow growth, chordomas are highly invasive, and the involvement of adjacent critical structures makes treatment challenging. Due to the low incidence, the molecular pathogenesis of this entity remains largely unknown. This study aimed to investigate DNA methylation abnormalities and their impact on gene expression profiles in skull base chordomas. Thirty-two tumor and 4 normal nucleus pulposus samples were subjected to DNA methylation and gene expression profiling with methylation microarrays and RNA sequencing. Genome-wide DNA methylation analysis revealed two distinct clusters of chordomas (termed subtypes C and I) with different patterns of aberrant DNA methylation. C chordomas were characterized by general hypomethylation with hypermethylation of CpG islands, while I chordomas were generally hypermethylated. These differences were reflected in the distinct distributions of differentially methylated probes (DMPs). Differentially methylated regions (DMRs) were identified, indicating aberrant methylation in known tumor-related genes in both chordoma subtypes and in regions encoding small RNAs in subtype C chordomas. Correlation between methylation and expression was observed in a minority of genes. Upregulation of TBXT in chordomas appeared to be related to lower methylation of a tumor-specific DMR in the gene promoter. Gene expression-based clusters of tumor samples did not overlap with DNA methylation-based subtypes. Nevertheless, the subtypes differ in transcriptomic profile, which shows immune infiltration in I chordomas and upregulation of the cell cycle in C chordomas. Immune enrichment in I chordomas was confirmed with 3 independent deconvolution methods and immunohistochemistry. Copy number analysis showed higher chromosomal instability in C chordomas. Eight out of nine of them had deletion of the CDKN2A/B loci and downregulation of genes encoded in the related chromosomal band. No significant difference in patient survival was observed between tumor subtypes; however, shorter survival was observed in patients with a higher number of copy number alterations.
Affiliation(s)
- Szymon Baluszek
- Department of Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, Warsaw, Poland
- Paulina Kober
- Department of Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, Warsaw, Poland
- Natalia Rusetska
- Department of Experimental Immunotherapy, Maria Sklodowska-Curie National Research Institute of Oncology, Warsaw, Poland
- Michał Wągrodzki
- Department of Cancer Pathomorphology, Maria Sklodowska-Curie National Research Institute of Oncology, Warsaw, Poland
- Tomasz Mandat
- Department of Neurosurgery, Maria Sklodowska-Curie National Research Institute of Oncology, Warsaw, Poland
- Jacek Kunicki
- Department of Neurosurgery, Maria Sklodowska-Curie National Research Institute of Oncology, Warsaw, Poland
- Mateusz Bujko
- Department of Molecular and Translational Oncology, Maria Sklodowska-Curie National Research Institute of Oncology, Warsaw, Poland.
2
Franchini M, Pellecchia S, Viscido G, Gambardella G. Single-cell gene set enrichment analysis and transfer learning for functional annotation of scRNA-seq data. NAR Genom Bioinform 2023; 5:lqad024. [PMID: 36879897 PMCID: PMC9985338 DOI: 10.1093/nargab/lqad024] [Citation(s) in RCA: 3] [Impact Index Per Article: 3.0] [Received: 10/19/2022] [Revised: 01/16/2023] [Accepted: 02/20/2023] [Indexed: 03/07/2023]
Abstract
Although an essential step, cell functional annotation often proves particularly challenging from single-cell transcriptional data. Several methods have been developed to accomplish this task. However, in most cases, these rely on techniques initially developed for bulk RNA sequencing or simply make use of marker genes identified from cell clustering followed by supervised annotation. To overcome these limitations and automatize the process, we have developed two novel methods, the single-cell gene set enrichment analysis (scGSEA) and the single-cell mapper (scMAP). scGSEA combines latent data representations and gene set enrichment scores to detect coordinated gene activity at single-cell resolution. scMAP uses transfer learning techniques to re-purpose and contextualize new cells into a reference cell atlas. Using both simulated and real datasets, we show that scGSEA effectively recapitulates recurrent patterns of pathways' activity shared by cells from different experimental conditions. At the same time, we show that scMAP can reliably map and contextualize new single-cell profiles on a breast cancer atlas we recently released. Both tools are provided in an effective and straightforward workflow providing a framework to determine cell function and significantly improve annotation and interpretation of scRNA-seq data.
Affiliation(s)
- Melania Franchini
- Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy; Department of Electrical Engineering and Information Technologies, University of Naples Federico II, 80125 Naples, Italy
- Simona Pellecchia
- Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy
- Gaetano Viscido
- Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy
- Gennaro Gambardella
- Telethon Institute of Genetics and Medicine, Pozzuoli 80078 Naples, Italy; Department of Chemical Materials and Industrial Engineering, University of Naples Federico II, 80125 Naples, Italy
3
Jiang Y, Huang J, Tian K, Yi X, Zheng H, Zhu Y, Guo T, Ji X. Cross-regulome profiling of RNA polymerases highlights the regulatory role of polymerase III on mRNA transcription by maintaining local chromatin architecture. Genome Biol 2022; 23:246. [PMID: 36443871 PMCID: PMC9703767 DOI: 10.1186/s13059-022-02812-w] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 03/19/2022] [Accepted: 11/07/2022] [Indexed: 11/29/2022]
Abstract
BACKGROUND Mammalian cells have three types of RNA polymerases (Pols), Pol I, II, and III. However, the extent to which these polymerases are cross-regulated and the underlying mechanisms remain unclear. RESULTS We employ genome-wide profiling after acute depletion of Pol I, Pol II, or Pol III to assess cross-regulatory effects between these Pols. We find that these enzymes mainly affect the transcription of their own target genes, while certain genes are transcribed by the other polymerases. Importantly, the most active type of crosstalk is exemplified by the fact that Pol III depletion affects Pol II transcription. Pol II genes with transcription changes upon Pol III depletion are enriched in diverse cellular functions, and Pol III binding sites are found near their promoters. However, these Pol III binding sites do not correspond to transfer RNAs. Moreover, we demonstrate that Pol III regulates Pol II transcription and chromatin binding of the facilitates chromatin transcription (FACT) complex to alter local chromatin structures, which in turn affects the Pol II transcription rate. CONCLUSIONS Our results support a model suggesting that RNA polymerases show cross-regulatory effects: Pol III affects local chromatin structures and the FACT-Pol II axis to regulate the Pol II transcription rate at certain gene loci. This study provides a new perspective for understanding the dysregulation of Pol III in various tissues affected by developmental diseases.
Affiliation(s)
- Yongpeng Jiang
- Key Laboratory of Cell Proliferation and Differentiation of the Ministry of Education, School of Life Sciences, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- Jie Huang
- Key Laboratory of Cell Proliferation and Differentiation of the Ministry of Education, School of Life Sciences, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- Kai Tian
- Key Laboratory of Cell Proliferation and Differentiation of the Ministry of Education, School of Life Sciences, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- Xiao Yi
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, China
- Westlake Omics (Hangzhou) Biotechnology Co., Ltd, Hangzhou, 310024, China
- Haonan Zheng
- Key Laboratory of Cell Proliferation and Differentiation of the Ministry of Education, School of Life Sciences, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China
- Yi Zhu
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, China
- Westlake Omics (Hangzhou) Biotechnology Co., Ltd, Hangzhou, 310024, China
- Tiannan Guo
- Westlake Laboratory of Life Sciences and Biomedicine, Key Laboratory of Structural Biology of Zhejiang Province, School of Life Sciences, Westlake University, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, China
- Institute of Basic Medical Sciences, Westlake Institute for Advanced Study, 18 Shilongshan Road, Hangzhou, 310024, Zhejiang Province, China
- Westlake Omics (Hangzhou) Biotechnology Co., Ltd, Hangzhou, 310024, China
- Xiong Ji
- Key Laboratory of Cell Proliferation and Differentiation of the Ministry of Education, School of Life Sciences, Peking-Tsinghua Center for Life Sciences, Peking University, Beijing, 100871, China.
4
de Jong A, Kuipers OP, Kok J. FUNAGE-Pro: comprehensive web server for gene set enrichment analysis of prokaryotes. Nucleic Acids Res 2022; 50:W330-W336. [PMID: 35641095 PMCID: PMC9252808 DOI: 10.1093/nar/gkac441] [Citation(s) in RCA: 22] [Impact Index Per Article: 11.0] [Received: 02/27/2022] [Revised: 04/20/2022] [Accepted: 05/10/2022] [Indexed: 12/11/2022]
Abstract
Recent advances in the field of high-throughput (meta-)transcriptomics and proteomics call for easy and rapid methods to explore not only single genes or proteins but also extended biological systems. Gene set enrichment analysis is commonly used to find relations in a set of genes and helps to uncover the biological meaning in results derived from high-throughput data. The basis for gene set enrichment analysis is a solid functional classification of genes. Here, we describe a comprehensive database containing multiple functional classifications of genes of all (>55 000) publicly available complete bacterial genomes. In addition to the most common functional classes such as COG and GO, KEGG, InterPro, PFAM, eggNOG and operon classes are also supported. As classification data for features is often not available, we offer fast annotation and classification of proteins in any newly sequenced bacterial genome. The web server FUNAGE-Pro enables fast functional analysis on single gene sets, multiple experiments, time series data, clusters, and gene network modules for any prokaryote species or strain. FUNAGE-Pro is freely available at http://funagepro.molgenrug.nl.
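As an illustration of the kind of test that gene set enrichment tools commonly apply to a functional class, the sketch below computes an upper-tail hypergeometric (over-representation) p-value. This is a generic textbook test, not necessarily the statistic FUNAGE-Pro uses; the function name `enrichment_pvalue` and all numbers are hypothetical.

```python
from math import comb


def enrichment_pvalue(universe_size, set_size, hits_size, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe_size: number of genes in the genome (background)
    set_size:      number of genes annotated to the functional class
    hits_size:     number of genes in the experimental gene list
    overlap:       number of listed genes that belong to the class
    """
    max_overlap = min(set_size, hits_size)
    total = comb(universe_size, hits_size)
    tail = sum(
        comb(set_size, k) * comb(universe_size - set_size, hits_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 2000 genes in the genome, 40 in one class,
# 100 differentially expressed genes, 8 of which fall in the class.
p = enrichment_pvalue(2000, 40, 100, 8)
print(f"over-representation p-value: {p:.3g}")
```

In practice such p-values would be computed for every class and then corrected for multiple testing.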
Affiliation(s)
- Anne de Jong
- Department of Molecular Genetics, University of Groningen, Groningen Biomolecular Sciences and Biotechnology Institute, the Netherlands
- Oscar P Kuipers
- Department of Molecular Genetics, University of Groningen, Groningen Biomolecular Sciences and Biotechnology Institute, the Netherlands
- Jan Kok
- Department of Molecular Genetics, University of Groningen, Groningen Biomolecular Sciences and Biotechnology Institute, the Netherlands
5
Van Buren E, Hu M, Cheng L, Wrobel J, Wilhelmsen K, Su L, Li Y, Wu D. TWO-SIGMA-G: a new competitive gene set testing framework for scRNA-seq data accounting for inter-gene and cell-cell correlation. Brief Bioinform 2022; 23:bbac084. [PMID: 35325048 PMCID: PMC9271221 DOI: 10.1093/bib/bbac084] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 11/02/2021] [Revised: 02/09/2022] [Accepted: 02/17/2022] [Indexed: 11/14/2022]
Abstract
We propose TWO-SIGMA-G, a competitive gene set test for scRNA-seq data. TWO-SIGMA-G uses a mixed-effects regression model based on our previously published TWO-SIGMA to test for differential expression at the gene-level. This regression-based model provides flexibility and rigor at the gene-level in (1) handling complex experimental designs, (2) accounting for the correlation between biological replicates and (3) accommodating the distribution of scRNA-seq data to improve statistical inference. Moreover, TWO-SIGMA-G uses a novel approach to adjust for inter-gene-correlation (IGC) at the set-level to control the set-level false positive rate. Simulations demonstrate that TWO-SIGMA-G preserves type-I error and increases power in the presence of IGC compared with other methods. Application to two datasets identified HIV-associated interferon pathways in xenograft mice and pathways associated with Alzheimer's disease progression in humans.
Affiliation(s)
- Eric Van Buren
- Department of Biostatistics, Harvard T.H. Chan School of Public Health
- Ming Hu
- Department of Quantitative Health Sciences, Lerner Research Institute, Cleveland Clinic Foundation
- Liang Cheng
- Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill
- Department of Microbiology and Immunology, The University of North Carolina at Chapel Hill
- Frontier Science Center for Immunology and Metabolism, Medical Research Institute, Wuhan University
- John Wrobel
- Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill
- Kirk Wilhelmsen
- Departments of Genetics and Neurology, Renaissance Computing Institute, University of North Carolina at Chapel Hill
- Lishan Su
- Lineberger Comprehensive Cancer Center, The University of North Carolina at Chapel Hill
- Department of Microbiology and Immunology, The University of North Carolina at Chapel Hill
- Departments of Pharmacology, Microbiology & Immunology, University of Maryland School of Medicine
- Yun Li
- Department of Biostatistics, The University of North Carolina at Chapel Hill
- Department of Genetics, The University of North Carolina at Chapel Hill
- Department of Computer Science, The University of North Carolina at Chapel Hill
- Di Wu
- Department of Biostatistics, The University of North Carolina at Chapel Hill
- Department of Computer Science, The University of North Carolina at Chapel Hill
6
Mukhopadhyay S, Sinha S, Mohapatra SK. Analysis of transcriptomic data sets supports the role of IL-6 in NETosis and immunothrombosis in severe COVID-19. BMC Genom Data 2021; 22:49. [PMID: 34775962 PMCID: PMC8590626 DOI: 10.1186/s12863-021-01001-1] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 04/26/2021] [Accepted: 10/12/2021] [Indexed: 12/14/2022]
Abstract
Background: There is an urgent need to understand the key events driving the pathogenesis of severe COVID-19 disease, so that precise treatment can be instituted. In this respect, NETosis is gaining increased attention in the scientific community as an important pathological process contributing to mortality. We sought to test whether there is robust evidence of NETosis in multiple transcriptomic data sets from human subjects with severe COVID-19 disease. Gene set enrichment analysis was performed to test for up-regulation of a gene set functional in NETosis in the blood of patients with COVID-19 illness. Results: Blood expression of genes functional in NETosis increased with severity of illness, showed negative correlation with blood oxygen saturation, and was validated in the lung of COVID-19 non-survivors. Temporal expression of IL-6 was compared between severe and moderate illness with COVID-19. Unsupervised clustering was performed to reveal co-expression of IL-6 with complement genes. In severe COVID-19 illness, there is transcriptional evidence of activation of NETosis, complement and coagulation cascade, and negative correlation between NETosis and respiratory function (oxygen saturation). An early spike in IL-6 is observed in severe COVID-19 illness that is correlated with complement activation. Conclusions: Based on the transcriptional dynamics of IL-6 expression and its downstream effect on complement activation, we constructed a model that links an early spike in IL-6 level with persistent and self-perpetuating complement activation, NETosis, immunothrombosis and respiratory dysfunction. Our model supports the early initiation of anti-IL6 therapy in severe COVID-19 disease before the life-threatening complications of the disease can perpetuate themselves autonomously. Supplementary Information: The online version contains supplementary material available at 10.1186/s12863-021-01001-1.
Affiliation(s)
- Subrata Sinha
- Department of Biochemistry, All India Institute of Medical Sciences, New Delhi, 110029, India
7
Integrative differential expression and gene set enrichment analysis using summary statistics for scRNA-seq studies. Nat Commun 2020; 11:1585. [PMID: 32221292 PMCID: PMC7101316 DOI: 10.1038/s41467-020-15298-6] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Received: 09/13/2019] [Accepted: 03/02/2020] [Indexed: 01/28/2023]
Abstract
Differential expression (DE) analysis and gene set enrichment (GSE) analysis are commonly applied in single cell RNA sequencing (scRNA-seq) studies. Here, we develop an integrative and scalable computational method, iDEA, to perform joint DE and GSE analysis through a hierarchical Bayesian framework. By integrating DE and GSE analyses, iDEA can improve the power and consistency of DE analysis and the accuracy of GSE analysis. Importantly, iDEA uses only DE summary statistics as input, enabling effective data modeling through complementing and pairing with various existing DE methods. We illustrate the benefits of iDEA with extensive simulations. We also apply iDEA to analyze three scRNA-seq data sets, where iDEA achieves up to five-fold power gain over existing GSE methods and up to 64% power gain over existing DE methods. The power gain brought by iDEA allows us to identify many pathways that would not be identified by existing approaches in these data. Differential expression (DE) and gene set enrichment (GSE) analysis tend to be carried out separately. Here, the authors present iDEA (integrative Differential expression and gene set Enrichment Analysis) for the analysis of scRNA-seq data, which uses a Bayesian approach to jointly model DE and GSE for improved power in both tasks.
8
Meng Y, Cai XH, Wang L. Potential Genes and Pathways of Neonatal Sepsis Based on Functional Gene Set Enrichment Analyses. Comput Math Methods Med 2018; 2018:6708520. [PMID: 30154914 PMCID: PMC6091373 DOI: 10.1155/2018/6708520] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 01/18/2018] [Revised: 06/04/2018] [Accepted: 06/27/2018] [Indexed: 12/16/2022]
Abstract
BACKGROUND Neonatal sepsis (NS) is considered the most common cause of neonatal death. Although numerous studies have focused on gene biomarkers of NS, the predictive value of these biomarkers is low, and NS pathogenesis still needs to be investigated. METHODS After data preprocessing, we used the KEGG enrichment method to identify the differentially expressed pathways between NS and normal controls. Then, functional principal component analysis (FPCA) was adopted to calculate gene values in NS. To further study the key signaling pathways of NS, an elastic-net regression model, the Mann-Whitney U (MWU) test, and a coexpression network were used to estimate the weights of signaling pathways and hub genes. RESULTS A total of 115 differentially expressed pathways between NS and controls were first identified. FPCA made full use of time-series gene expression information and estimated F values of genes in these pathways. The top 1000 genes were considered differential genes and were further analyzed by elastic-net regression and the MWU test. There were 7 key signaling pathways between NS and controls, according to different sources. Among the genes involved in key pathways, 7 hub genes, PIK3CA, TGFBR2, CDKN1B, KRAS, E2F3, TRAF6, and CHUK, were determined based on the coexpression network. Most of them were cancer-related genes. PIK3CA was considered the common marker, which is highly expressed in the lymphocyte group. Little was known about the correlation of PIK3CA with NS, which offers a new direction for NS research. CONCLUSION This research might provide a perspective for exploring potential novel genes and pathways as NS therapy targets.
Affiliation(s)
- YuXiu Meng
- Department of Neonatology, First People's Hospital of Jining, Jining, Shandong 272000, China
- Xue Hong Cai
- Department of Pediatrics, Traditional Chinese Medicine Hospital of Yanzhou, Jining, Shandong 272100, China
- LiPei Wang
- Department of Neonatology, First People's Hospital of Jining, Jining, Shandong 272000, China
9
Zhang Y, Topham DJ, Thakar J, Qiu X. FUNNEL-GSEA: FUNctioNal ELastic-net regression in time-course gene set enrichment analysis. Bioinformatics 2018; 33:1944-1952. [PMID: 28334094 PMCID: PMC5939227 DOI: 10.1093/bioinformatics/btx104] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 09/26/2016] [Accepted: 02/17/2017] [Indexed: 01/26/2023]
Abstract
Motivation: Gene set enrichment analyses (GSEAs) are widely used in genomic research to identify underlying biological mechanisms (defined by the gene sets), such as Gene Ontology terms and molecular pathways. There are two caveats in the currently available methods: (i) they are typically designed for group comparisons or regression analyses, which do not utilize temporal information efficiently in time-series of transcriptomics measurements; and (ii) genes overlapping in multiple molecular pathways are considered multiple times in hypothesis testing. Results: We propose an inferential framework for GSEA based on functional data analysis, which utilizes the temporal information based on functional principal component analysis, and disentangles the effects of overlapping genes by a functional extension of the elastic-net regression. Furthermore, the hypothesis testing for the gene sets is performed by an extension of the Mann-Whitney U test which is based on weighted rank sums computed from correlated observations. By using both simulated datasets and large-scale time-course gene expression data on human influenza infection, we demonstrate that our method has uniformly better receiver operating characteristic curves, and identifies more pathways relevant to immune-response to human influenza infection than the competing approaches. Availability and Implementation: The methods are implemented in the R package FUNNEL, freely and publicly available at: https://github.com/yunzhang813/FUNNEL-GSEA-R-Package. Supplementary information: Supplementary data are available at Bioinformatics online.
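For orientation, the unweighted test that the abstract's rank-sum extension builds on can be sketched as follows. This is the ordinary Mann-Whitney U test with a normal approximation and no tie correction, assuming independent gene scores; it is not the weighted, correlation-aware version the authors propose, and the function name `rank_sum_set_test` is hypothetical.

```python
from math import erfc, sqrt


def rank_sum_set_test(scores, in_set):
    """Plain Mann-Whitney U test of gene-set scores versus all other genes.

    scores: one numeric score per gene
    in_set: one boolean per gene, True if the gene belongs to the set
    Returns (U statistic for the set, two-sided normal-approximation p-value).
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average 1-based rank shared by the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1

    n1 = sum(in_set)
    n2 = len(scores) - n1
    u = sum(r for r, member in zip(ranks, in_set) if member) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # variance without tie correction
    z = (u - mean) / sd
    return u, erfc(abs(z) / sqrt(2))


# Hypothetical scores: the three set genes have the three largest scores.
u, p = rank_sum_set_test([1, 2, 3, 4, 5, 6], [False, False, False, True, True, True])
print(u, p)
```

Because this plain version treats genes as independent, it can overstate significance when genes in a set are correlated, which is one of the issues the weighted extension in the abstract is meant to address.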
Affiliation(s)
- Yun Zhang
- Department of Biostatistics and Computational Biology
- David J Topham
- Department of Microbiology and Immunology, University of Rochester, Rochester, NY 14642, USA
- Juilee Thakar
- Department of Biostatistics and Computational Biology; Department of Microbiology and Immunology, University of Rochester, Rochester, NY 14642, USA
- Xing Qiu
- Department of Biostatistics and Computational Biology
10
de Torrente L, Zimmerman S, Taylor D, Hasegawa Y, Wells CA, Mar JC. pathVar: a new method for pathway-based interpretation of gene expression variability. PeerJ 2017; 5:e3334. [PMID: 28560097 PMCID: PMC5444375 DOI: 10.7717/peerj.3334] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Received: 09/30/2016] [Accepted: 04/18/2017] [Indexed: 12/31/2022]
Abstract
Identifying the pathways that control a cellular phenotype is the first step to building a mechanistic model. Recent examples in developmental biology, cancer genomics, and neurological disease have demonstrated how changes in the variability of gene expression can highlight important genes that are under different degrees of regulatory control. Simple statistical tests exist to identify differentially-variable genes; however, methods for investigating changes in gene expression variability in the context of pathways and gene sets are under-explored. Here we present pathVar, a new method that provides functional interpretation of gene expression variability changes at the level of pathways and gene sets. pathVar is based on a multinomial exact test, or an asymptotic Chi-squared test as a more computationally-efficient alternative. The method can be used for gene expression studies from any technology platform in all biological settings, with either a single phenotypic group or two-group comparisons. To demonstrate its utility, we applied the method to a diverse set of diseases, species and samples. Results from pathVar are benchmarked against analyses based on average expression and two methods of GSEA, and demonstrate that analyses using both statistics are useful for understanding transcriptional regulation. We also provide recommendations for the choice of variability statistic that have been informed through analyses on simulations and real data. Based on the datasets selected, we show how pathVar can be used to gain insight into expression variability of single cell versus bulk samples, different stem cell populations, and cancer versus normal tissue comparisons.
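The asymptotic Chi-squared alternative mentioned in the abstract can be illustrated with a small goodness-of-fit sketch. It assumes, purely for illustration, that genes have already been binned into three variability levels and that a pathway's counts are compared with reference proportions from all genes; the three-level binning, the function name `chi2_three_levels`, and the numbers are assumptions, not details taken from pathVar.

```python
from math import exp


def chi2_three_levels(pathway_counts, reference_props):
    """Chi-squared goodness-of-fit test for exactly three variability levels.

    pathway_counts:  observed gene counts per level within one pathway
    reference_props: expected proportion per level (e.g. from all genes)
    Returns (chi-squared statistic, p-value on 2 degrees of freedom).
    """
    if len(pathway_counts) != 3 or len(reference_props) != 3:
        raise ValueError("this sketch handles exactly three levels")
    n = sum(pathway_counts)
    stat = sum(
        (obs - n * prop) ** 2 / (n * prop)
        for obs, prop in zip(pathway_counts, reference_props)
    )
    # For 2 degrees of freedom the chi-squared survival function is exp(-x / 2).
    return stat, exp(-stat / 2)


# Hypothetical pathway of 40 genes, skewed towards the first variability level.
stat, p = chi2_three_levels([30, 5, 5], [0.5, 0.25, 0.25])
print(stat, p)
```

The closed-form p-value holds only for three categories; a general implementation would need a chi-squared distribution function for other degrees of freedom, and an exact multinomial test is preferable when expected counts are small.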
Affiliation(s)
- Laurence de Torrente
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, United States of America
- Samuel Zimmerman
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, United States of America
- Deanne Taylor
- Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA, United States of America; Department of Pediatrics, University of Pennsylvania, Philadelphia, PA, United States of America
- Yu Hasegawa
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, United States of America
- Christine A Wells
- Department of Anatomy and Neuroscience, University of Melbourne, Melbourne, Victoria, Australia
- Jessica C Mar
- Department of Systems and Computational Biology, Albert Einstein College of Medicine, Bronx, NY, United States of America; Department of Epidemiology and Population Health, Albert Einstein College of Medicine, Bronx, NY, United States of America; University of Queensland, Australian Institute for Bioengineering and Nanotechnology, Brisbane, Queensland, Australia
11
Li Y, Morrow J, Raby B, Tantisira K, Weiss ST, Huang W, Qiu W. Detecting disease-associated genomic outcomes using constrained mixture of Bayesian hierarchical models for paired data. PLoS One 2017; 12:e0174602. [PMID: 28358896 PMCID: PMC5373614 DOI: 10.1371/journal.pone.0174602] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 10/14/2016] [Accepted: 03/10/2017] [Indexed: 12/02/2022]
Abstract
Detecting disease-associated genomic outcomes is one of the key steps in precision medicine research. Cutting-edge high-throughput technologies enable researchers to test without bias whether genomic outcomes are associated with a disease of interest. However, these technologies also bring challenges associated with the analysis of genome-wide data. Two big challenges are (1) how to reduce the effects of technical noise; and (2) how to handle the curse of dimensionality (i.e., the number of variables is far larger than the number of samples). To tackle these challenges, we propose a constrained mixture of Bayesian hierarchical models (MBHM) for detecting disease-associated genomic outcomes for data obtained from paired/matched designs. Paired/matched designs can effectively reduce the effects of confounding factors. MBHM does not involve multiple testing and hence does not suffer from the curse of dimensionality. It can also borrow information across genes, so it can be used for whole-genome data with small sample sizes.
Affiliation(s)
- Yunfeng Li
- School of Mathematical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Jarrett Morrow
- Channing Division of Network Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, MA, United States of America
- Benjamin Raby
- Channing Division of Network Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, MA, United States of America
- Kelan Tantisira
- Channing Division of Network Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, MA, United States of America
- Scott T. Weiss
- Channing Division of Network Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, MA, United States of America
- Wei Huang
- School of Mathematical Sciences, Zhejiang University, Hangzhou, Zhejiang, China
- Weiliang Qiu
- Channing Division of Network Medicine, Brigham and Women’s Hospital/Harvard Medical School, Boston, MA, United States of America
12
Zeng Y, Breheny P. Overlapping Group Logistic Regression with Applications to Genetic Pathway Selection. Cancer Inform 2016; 15:179-87. [PMID: 27679461 PMCID: PMC5026200 DOI: 10.4137/cin.s40043] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Received: 04/27/2016] [Revised: 08/07/2016] [Accepted: 08/13/2016] [Indexed: 11/24/2022]
Abstract
Discovering important genes that account for the phenotype of interest has long been a challenge in genome-wide expression analysis. Analyses such as gene set enrichment analysis (GSEA) that incorporate pathway information have become widespread in hypothesis testing, but pathway-based approaches have been largely absent from regression methods due to the challenges of dealing with overlapping pathways and the resulting lack of available software. The R package grpreg is widely used to fit group lasso and other group-penalized regression models; in this study, we develop an extension, grpregOverlap, to allow for overlapping group structure using a latent variable approach. We compare this approach to the ordinary lasso and to GSEA using both simulated and real data. We find that incorporation of prior pathway information can substantially improve the accuracy of gene expression classifiers, and we shed light on several ways in which hypothesis-testing approaches such as GSEA differ from regression approaches with respect to the analysis of pathway data.
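A latent-variable treatment of overlapping groups is commonly realized by giving each pathway its own copy of every gene it contains, fitting an ordinary (non-overlapping) group penalty on the expanded design, and then summing the per-copy coefficients back onto the original genes. The sketch below illustrates only that bookkeeping step; it is not the grpregOverlap code, and the function names and toy data are hypothetical.

```python
def expand_overlapping_groups(rows, groups):
    """Duplicate columns so that every group owns a private copy of its variables.

    rows   : list of samples, each a list of feature values
    groups : list of groups, each a list of column indices (overlap allowed)
    """
    # Original column that each expanded column was copied from.
    col_index = [j for members in groups for j in members]
    # Group label of each expanded column (groups no longer overlap).
    group_labels = [g for g, members in enumerate(groups) for _ in members]
    expanded = [[row[j] for j in col_index] for row in rows]
    return expanded, group_labels, col_index


def collapse_coefficients(gamma, col_index, n_features):
    """Sum latent (per-copy) coefficients back onto the original features."""
    beta = [0.0] * n_features
    for coef, j in zip(gamma, col_index):
        beta[j] += coef
    return beta


# Toy data (hypothetical): 2 samples, 4 genes; gene 1 belongs to both pathways.
rows = [[1.0, 2.0, 3.0, 4.0],
        [5.0, 6.0, 7.0, 8.0]]
groups = [[0, 1], [1, 2, 3]]

expanded, group_labels, col_index = expand_overlapping_groups(rows, groups)

# Stand-in for coefficients that a group-penalized fit on `expanded` would return.
gamma = [1.0, 2.0, 3.0, 4.0, 5.0]
beta = collapse_coefficients(gamma, col_index, n_features=4)
```

Because the expanded design only repeats existing columns, predictions computed from `expanded` and `gamma` equal those computed from `rows` and `beta`, which is why results can be reported on the original gene scale.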
Affiliation(s)
- Yaohui Zeng
  - Department of Biostatistics, University of Iowa, Iowa City, IA, USA
- Patrick Breheny
  - Department of Biostatistics, University of Iowa, Iowa City, IA, USA
|
13
|
Cloonan SM, Glass K, Laucho-Contreras ME, Bhashyam AR, Cervo M, Pabón MA, Konrad C, Polverino F, Siempos II, Perez E, Mizumura K, Ghosh MC, Parameswaran H, Williams NC, Rooney KT, Chen ZH, Goldklang MP, Yuan GC, Moore SC, Demeo DL, Rouault TA, D’Armiento JM, Schon EA, Manfredi G, Quackenbush J, Mahmood A, Silverman EK, Owen CA, Choi AM. Mitochondrial iron chelation ameliorates cigarette smoke-induced bronchitis and emphysema in mice. Nat Med 2016; 22:163-74. [PMID: 26752519 PMCID: PMC4742374 DOI: 10.1038/nm.4021] [Citation(s) in RCA: 179] [Impact Index Per Article: 22.4] [Received: 10/07/2015] [Accepted: 12/01/2015] [Indexed: 12/20/2022]
Abstract
Chronic obstructive pulmonary disease (COPD) is linked to both cigarette smoking and genetic determinants. We have previously identified iron-responsive element-binding protein 2 (IRP2) as an important COPD susceptibility gene and have shown that IRP2 protein is increased in the lungs of individuals with COPD. Here we demonstrate that mice deficient in Irp2 were protected from cigarette smoke (CS)-induced experimental COPD. By integrating RNA immunoprecipitation followed by sequencing (RIP-seq), RNA sequencing (RNA-seq), and gene expression and functional enrichment clustering analysis, we identified Irp2 as a regulator of mitochondrial function in the lungs of mice. Irp2 increased mitochondrial iron loading and levels of cytochrome c oxidase (COX), which led to mitochondrial dysfunction and subsequent experimental COPD. Frataxin-deficient mice, which had higher mitochondrial iron loading, showed impaired airway mucociliary clearance (MCC) and higher pulmonary inflammation at baseline, whereas mice deficient in the synthesis of cytochrome c oxidase, which have reduced COX, were protected from CS-induced pulmonary inflammation and impairment of MCC. Mice treated with a mitochondrial iron chelator or mice fed a low-iron diet were protected from CS-induced COPD. Mitochondrial iron chelation also alleviated CS-induced impairment of MCC, CS-induced pulmonary inflammation and CS-associated lung injury in mice with established COPD, suggesting a critical functional role and potential therapeutic intervention for the mitochondrial-iron axis in COPD.
MESH Headings
- Aged
- Aged, 80 and over
- Airway Remodeling
- Animals
- Bronchitis/etiology
- Bronchitis/genetics
- Disease Models, Animal
- Electron Transport Complex IV/metabolism
- Electrophoretic Mobility Shift Assay
- Enzyme-Linked Immunosorbent Assay
- Flow Cytometry
- Gene Expression Profiling
- Humans
- Immunoblotting
- Immunohistochemistry
- Immunoprecipitation
- Iron/metabolism
- Iron Chelating Agents/pharmacology
- Iron Regulatory Protein 2/genetics
- Iron Regulatory Protein 2/metabolism
- Iron, Dietary
- Iron-Binding Proteins/genetics
- Lung/drug effects
- Lung/metabolism
- Lung Injury/etiology
- Lung Injury/genetics
- Membrane Potential, Mitochondrial
- Mice
- Mice, Knockout
- Microscopy, Confocal
- Microscopy, Electron, Transmission
- Microscopy, Fluorescence
- Mitochondria/drug effects
- Mitochondria/metabolism
- Mucociliary Clearance/genetics
- Pneumonia/etiology
- Pneumonia/genetics
- Pulmonary Disease, Chronic Obstructive/etiology
- Pulmonary Disease, Chronic Obstructive/genetics
- Pulmonary Disease, Chronic Obstructive/metabolism
- Pulmonary Emphysema/etiology
- Pulmonary Emphysema/genetics
- Real-Time Polymerase Chain Reaction
- Smoke/adverse effects
- Smoking/adverse effects
- Nicotiana
- Frataxin
Affiliation(s)
- Suzanne M. Cloonan
  - Joan and Sanford I. Weill Department of Medicine, New York-Presbyterian Hospital, Weill Cornell Medical College, New York, NY, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Kimberly Glass
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Maria E. Laucho-Contreras
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Abhiram R. Bhashyam
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Morgan Cervo
  - Department of Radiology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Maria A. Pabón
  - Joan and Sanford I. Weill Department of Medicine, New York-Presbyterian Hospital, Weill Cornell Medical College, New York, NY, USA
- Csaba Konrad
  - Brain and Mind Research Institute, Weill Cornell Medical College, New York, NY, USA
- Francesca Polverino
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
  - Lovelace Respiratory Research Institute, Albuquerque, NM, USA
  - Pulmonary Department, University of Parma, Parma, Italy
- Ilias I. Siempos
  - Joan and Sanford I. Weill Department of Medicine, New York-Presbyterian Hospital, Weill Cornell Medical College, New York, NY, USA
  - First Department of Critical Care Medicine and Pulmonary Services, Evangelismos Hospital, University of Athens, Medical School, Athens, Greece
- Elizabeth Perez
  - Joan and Sanford I. Weill Department of Medicine, New York-Presbyterian Hospital, Weill Cornell Medical College, New York, NY, USA
- Kenji Mizumura
  - Joan and Sanford I. Weill Department of Medicine, New York-Presbyterian Hospital, Weill Cornell Medical College, New York, NY, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Manik C. Ghosh
  - Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), Bethesda, MD, USA
- Niamh C. Williams
  - Joan and Sanford I. Weill Department of Medicine, New York-Presbyterian Hospital, Weill Cornell Medical College, New York, NY, USA
- Kristen T. Rooney
  - Joan and Sanford I. Weill Department of Medicine, New York-Presbyterian Hospital, Weill Cornell Medical College, New York, NY, USA
- Zhi-Hua Chen
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
  - Department of Respiratory and Critical Care Medicine, Second Hospital of Zhejiang University School of Medicine, Hangzhou, China
- Monica P. Goldklang
  - Department of Anesthesiology, Columbia University, New York, NY, USA
  - Department of Medicine, Columbia University, New York, NY, USA
- Guo-Cheng Yuan
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Stephen C. Moore
  - Department of Radiology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Dawn L. Demeo
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Tracey A. Rouault
  - Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), Bethesda, MD, USA
- Jeanine M. D’Armiento
  - Department of Anesthesiology, Columbia University, New York, NY, USA
  - Department of Medicine, Columbia University, New York, NY, USA
  - Department of Physiology & Cellular Biophysics, Columbia University, New York, NY, USA
- Eric A. Schon
  - Department of Neurology, Columbia University Medical Center, New York, NY, USA
  - Department of Genetics and Development, Columbia University Medical Center, New York, NY, USA
- Giovanni Manfredi
  - Brain and Mind Research Institute, Weill Cornell Medical College, New York, NY, USA
- John Quackenbush
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Ashfaq Mahmood
  - Department of Radiology, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Edwin K. Silverman
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
- Caroline A. Owen
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
  - Lovelace Respiratory Research Institute, Albuquerque, NM, USA
- Augustine M.K. Choi
  - Joan and Sanford I. Weill Department of Medicine, New York-Presbyterian Hospital, Weill Cornell Medical College, New York, NY, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA, USA
|
14
|
Matone A, O'Grada CM, Dillon ET, Morris C, Ryan MF, Walsh M, Gibney ER, Brennan L, Gibney MJ, Morine MJ, Roche HM. Body mass index mediates inflammatory response to acute dietary challenges. Mol Nutr Food Res 2015; 59:2279-92. [DOI: 10.1002/mnfr.201500184] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 03/09/2015] [Revised: 07/27/2015] [Accepted: 08/06/2015] [Indexed: 01/08/2023]
Affiliation(s)
- Alice Matone
  - The Microsoft Research; University of Trento Centre for Computational Systems Biology (COSBI); Rovereto Italy
- Colm M. O'Grada
  - Nutrigenomics Research Group; UCD Conway Institute of Biomolecular and Biomedical Research; School of Public Health and Population Science; University College Dublin; Belfield Dublin Ireland
  - Institute of Food and Health; University College Dublin; Belfield Dublin Ireland
- Eugene T. Dillon
  - Nutrigenomics Research Group; UCD Conway Institute of Biomolecular and Biomedical Research; School of Public Health and Population Science; University College Dublin; Belfield Dublin Ireland
  - Institute of Food and Health; University College Dublin; Belfield Dublin Ireland
- Ciara Morris
  - Institute of Food and Health; University College Dublin; Belfield Dublin Ireland
- Miriam F. Ryan
  - Institute of Food and Health; University College Dublin; Belfield Dublin Ireland
- Marianne Walsh
  - Institute of Food and Health; University College Dublin; Belfield Dublin Ireland
- Eileen R. Gibney
  - Institute of Food and Health; University College Dublin; Belfield Dublin Ireland
- Lorraine Brennan
  - Institute of Food and Health; University College Dublin; Belfield Dublin Ireland
  - UCD Conway Institute of Biomolecular and Biomedical Research; University College Dublin; Belfield Dublin Ireland
- Michael J. Gibney
  - Institute of Food and Health; University College Dublin; Belfield Dublin Ireland
- Melissa J. Morine
  - The Microsoft Research; University of Trento Centre for Computational Systems Biology (COSBI); Rovereto Italy
  - Department of Mathematics; University of Trento; Trento Italy
- Helen M. Roche
  - Nutrigenomics Research Group; UCD Conway Institute of Biomolecular and Biomedical Research; School of Public Health and Population Science; University College Dublin; Belfield Dublin Ireland
  - Institute of Food and Health; University College Dublin; Belfield Dublin Ireland
  - UCD Conway Institute of Biomolecular and Biomedical Research; University College Dublin; Belfield Dublin Ireland
|
15
|
Glaab E. Using prior knowledge from cellular pathways and molecular networks for diagnostic specimen classification. Brief Bioinform 2015; 17:440-52. [PMID: 26141830 PMCID: PMC4870394 DOI: 10.1093/bib/bbv044] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Received: 03/31/2015] [Indexed: 12/27/2022] Open
Abstract
For many complex diseases, an earlier and more reliable diagnosis is considered a key prerequisite for developing more effective therapies to prevent or delay disease progression. Classical statistical learning approaches for specimen classification using omics data, however, often cannot provide diagnostic models with sufficient accuracy and robustness for heterogeneous diseases like cancers or neurodegenerative disorders. In recent years, new approaches for building multivariate biomarker models on omics data have been proposed, which exploit prior biological knowledge from molecular networks and cellular pathways to address these limitations. This survey provides an overview of these recent developments and compares pathway- and network-based specimen classification approaches in terms of their utility for improving model robustness, accuracy and biological interpretability. Different routes to translate omics-based multifactorial biomarker models into clinical diagnostic tests are discussed, and a previous study is presented as example.
|
16
|
Glass K, Quackenbush J, Spentzos D, Haibe-Kains B, Yuan GC. A network model for angiogenesis in ovarian cancer. BMC Bioinformatics 2015; 16:115. [PMID: 25888305 PMCID: PMC4408593 DOI: 10.1186/s12859-015-0551-y] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 10/27/2014] [Accepted: 03/25/2015] [Indexed: 12/31/2022] Open
Abstract
Background: We recently identified two robust ovarian cancer subtypes, defined by the expression of genes involved in angiogenesis, with significant differences in clinical outcome. To identify potential regulatory mechanisms that distinguish the subtypes we applied PANDA, a method that uses an integrative approach to model information flow in gene regulatory networks. Results: We find distinct differences between networks that are active in the angiogenic and non-angiogenic subtypes, largely defined by a set of key transcription factors that, although previously reported to play a role in angiogenesis, are not strongly differentially-expressed between the subtypes. Our network analysis indicates that these factors are involved in the activation (or repression) of different genes in the two subtypes, resulting in differential expression of their network targets. Mechanisms mediating differences between subtypes include a previously unrecognized pro-angiogenic role for increased genome-wide DNA methylation and complex patterns of combinatorial regulation. Conclusions: The models we develop require a shift in our interpretation of the driving factors in biological networks away from the genes themselves and toward their interactions. The observed regulatory changes between subtypes suggest therapeutic interventions that may help in the treatment of ovarian cancer.
Affiliation(s)
- Kimberly Glass
  - Dana-Farber Cancer Institute, Boston, MA, USA
  - Harvard School of Public Health, Boston, MA, USA
  - Brigham and Women's Hospital, Boston, MA, USA
- John Quackenbush
  - Dana-Farber Cancer Institute, Boston, MA, USA
  - Harvard School of Public Health, Boston, MA, USA
- Dimitrios Spentzos
  - Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA
- Benjamin Haibe-Kains
  - Princess Margaret Cancer Center, University Health Network, Toronto, ON, M5G 2M9, Canada
- Guo-Cheng Yuan
  - Dana-Farber Cancer Institute, Boston, MA, USA
  - Harvard School of Public Health, Boston, MA, USA
|
17
|
Cai B, Jiang X. Revealing Biological Pathways Implicated in Lung Cancer from TCGA Gene Expression Data Using Gene Set Enrichment Analysis. Cancer Inform 2014; 13:113-21. [PMID: 25520551 PMCID: PMC4251186 DOI: 10.4137/cin.s13882] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Received: 04/16/2014] [Revised: 09/05/2014] [Accepted: 09/09/2014] [Indexed: 12/11/2022] Open
Abstract
Analyzing biological system abnormalities in cancer patients based on measures of biological entities, such as gene expression levels, is an important and challenging problem. This paper applies existing methods, Gene Set Enrichment Analysis and Signaling Pathway Impact Analysis, to pathway abnormality analysis in lung cancer using microarray gene expression data. Gene expression data from studies of Lung Squamous Cell Carcinoma (LUSC) in The Cancer Genome Atlas project, and pathway gene set data from the Kyoto Encyclopedia of Genes and Genomes were used to analyze the relationship between pathways and phenotypes. Results, in the form of pathway rankings, indicate that some pathways may behave abnormally in LUSC. For example, both the cell cycle and viral carcinogenesis pathways ranked very high in LUSC. Furthermore, some pathways that are known to be associated with cancer, such as the p53 and the PI3K-Akt signal transduction pathways, were found to rank high in LUSC. Other pathways, such as bladder cancer and thyroid cancer pathways, were also ranked high in LUSC.
Affiliation(s)
- Binghuang Cai
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
- Xia Jiang
  - Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA
|
18
|
Dørum G, Snipen L, Solheim M, Saebø S. Rotation gene set testing for longitudinal expression data. Biom J 2014; 56:1055-75. [DOI: 10.1002/bimj.201100178] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 08/25/2011] [Revised: 11/11/2013] [Accepted: 07/01/2014] [Indexed: 12/17/2022]
Affiliation(s)
- Guro Dørum
  - Department of Chemistry, Biotechnology and Food Science; Norwegian University of Life Sciences; N-1432 Aas Norway
- Lars Snipen
  - Department of Chemistry, Biotechnology and Food Science; Norwegian University of Life Sciences; N-1432 Aas Norway
- Margrete Solheim
  - Department of Chemistry, Biotechnology and Food Science; Norwegian University of Life Sciences; N-1432 Aas Norway
- Solve Saebø
  - Department of Chemistry, Biotechnology and Food Science; Norwegian University of Life Sciences; N-1432 Aas Norway
|
19
|
Dennis G, Holweg CTJ, Kummerfeld SK, Choy DF, Setiadi AF, Hackney JA, Haverty PM, Gilbert H, Lin WY, Diehl L, Fischer S, Song A, Musselman D, Klearman M, Gabay C, Kavanaugh A, Endres J, Fox DA, Martin F, Townsend MJ. Synovial phenotypes in rheumatoid arthritis correlate with response to biologic therapeutics. Arthritis Res Ther 2014; 16:R90. [PMID: 25167216 PMCID: PMC4060385 DOI: 10.1186/ar4555] [Citation(s) in RCA: 251] [Impact Index Per Article: 25.1] [Received: 07/29/2013] [Accepted: 02/25/2014] [Indexed: 12/19/2022] Open
Abstract
INTRODUCTION: Rheumatoid arthritis (RA) is a complex and clinically heterogeneous autoimmune disease. Currently, the relationship between pathogenic molecular drivers of disease in RA and therapeutic response is poorly understood. METHODS: We analyzed synovial tissue samples from two RA cohorts of 49 and 20 patients using a combination of global gene expression, histologic and cellular analyses, and analysis of gene expression data from two further publicly available RA cohorts. To identify candidate serum biomarkers that correspond to differential synovial biology and clinical response to targeted therapies, we performed pre-treatment biomarker analysis compared with therapeutic outcome at week 24 in serum samples from 198 patients from the ADACTA (ADalimumab ACTemrA) phase 4 trial of tocilizumab (anti-IL-6R) monotherapy versus adalimumab (anti-TNFα) monotherapy. RESULTS: We documented evidence for four major phenotypes of RA synovium - lymphoid, myeloid, low inflammatory, and fibroid - each with distinct underlying gene expression signatures. We observed that baseline synovial myeloid, but not lymphoid, gene signature expression was higher in patients with good compared with poor European League Against Rheumatism (EULAR) clinical response to anti-TNFα therapy at week 16 (P = 0.011). We observed that high baseline serum soluble intercellular adhesion molecule 1 (sICAM1), associated with the myeloid phenotype, and high serum C-X-C motif chemokine 13 (CXCL13), associated with the lymphoid phenotype, had differential relationships with clinical response to anti-TNFα compared with anti-IL-6R treatment. sICAM1-high/CXCL13-low patients showed the highest week 24 American College of Rheumatology (ACR) 50 response rate to anti-TNFα treatment as compared with sICAM1-low/CXCL13-high patients (42% versus 13%, respectively, P = 0.05) while anti-IL-6R patients showed the opposite relationship with these biomarker subgroups (ACR50 20% versus 69%, P = 0.004).
CONCLUSIONS: These data demonstrate that underlying molecular and cellular heterogeneity in RA impacts clinical outcome to therapies targeting different biological pathways, with patients with the myeloid phenotype exhibiting the most robust response to anti-TNFα. These data suggest a path to identify and validate serum biomarkers that predict response to targeted therapies in rheumatoid arthritis and possibly other autoimmune diseases. TRIAL REGISTRATION: ClinicalTrials.gov NCT01119859
|
20
|
Abstract
Motivation: Systems biology demands the use of several points of view to get a more comprehensive understanding of biological problems. This usually means taking into account different data regarding the problem at hand, but it also involves using different perspectives on the same data. This multifaceted aspect of systems biology often requires the use of several tools, and it is often hard to integrate them seamlessly in a way that lets the analyst have an interactive discourse with the data. Results: Focusing on expression profiling, BicOverlapper 2.0 visualizes the most relevant aspects of the analysis, including expression data, profiling analysis results and functional annotation. It also integrates several state-of-the-art numerical methods, such as differential expression analysis, gene set enrichment or biclustering. Availability and implementation: BicOverlapper 2.0 is available at: http://vis.usal.es/bicoverlapper2 Contact: rodri@usal.es
Affiliation(s)
- Rodrigo Santamaría
  - Department of Computer Science, University of Salamanca, 37008 Salamanca, Spain and Instituto de Biología Funcional y Genómica, CSIC/USAL, 37007 Salamanca, Spain
- Roberto Therón
  - Department of Computer Science, University of Salamanca, 37008 Salamanca, Spain and Instituto de Biología Funcional y Genómica, CSIC/USAL, 37007 Salamanca, Spain
- Luis Quintales
  - Department of Computer Science, University of Salamanca, 37008 Salamanca, Spain and Instituto de Biología Funcional y Genómica, CSIC/USAL, 37007 Salamanca, Spain
|
21
|
McBride WJ, Kimpel MW, McClintick JN, Ding ZM, Hyytia P, Colombo G, Liang T, Edenberg HJ, Lumeng L, Bell RL. Gene expression within the extended amygdala of 5 pairs of rat lines selectively bred for high or low ethanol consumption. Alcohol 2013; 47:517-29. [PMID: 24157127 DOI: 10.1016/j.alcohol.2013.08.004] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.8] [Received: 07/15/2013] [Revised: 08/30/2013] [Accepted: 08/30/2013] [Indexed: 11/25/2022]
Abstract
The objectives of this study were to determine innate differences in gene expression in 2 regions of the extended amygdala between 5 different pairs of lines of male rats selectively bred for high or low ethanol consumption: a) alcohol-preferring (P) vs. alcohol-non-preferring (NP) rats, b) high-alcohol-drinking (HAD) vs. low-alcohol-drinking (LAD) rats (replicate line-pairs 1 and 2), c) ALKO alcohol (AA) vs. nonalcohol (ANA) rats, and d) Sardinian alcohol-preferring (sP) vs. Sardinian alcohol-nonpreferring (sNP) rats, and then to determine if these differences are common across the line-pairs. Microarray analysis revealed up to 1772 unique named genes in the nucleus accumbens shell (AcbSh) and 494 unique named genes in the central nucleus of the amygdala (CeA) that significantly differed [False Discovery Rate (FDR) = 0.10; fold-change at least 1.2] in expression between the individual line-pairs. Analysis using Gene Ontology (GO) and Ingenuity Pathways information indicated significant categories and networks in common for up to 3 or 4 line-pairs, but not for all 5 line-pairs. However, there were almost no individual genes in common within these categories and networks. ANOVAs of the combined data for the 5 line-pairs indicated 1014 and 731 significant (p < 0.01) differences in expression of named genes in the AcbSh and CeA, respectively. There were 4-6 individual named genes that significantly differed across up to 3 line-pairs in both regions; only 1 gene (Gsta4 in the CeA) differed in as many as 4 line-pairs. Overall, the findings suggest that a) some biological categories or networks (e.g., cell-to-cell signaling, cellular stress response, cellular organization, etc.) may be in common for subsets of line-pairs within either the AcbSh or CeA, and b) regulation of different genes and/or combinations of multiple biological systems may be contributing to the disparate alcohol drinking behaviors of these line-pairs.
|
22
|
Huntley MA, Larson JL, Chaivorapol C, Becker G, Lawrence M, Hackney JA, Kaminker JS. ReportingTools: an automated result processing and presentation toolkit for high-throughput genomic analyses. Bioinformatics 2013; 29:3220-1. [PMID: 24078713 DOI: 10.1093/bioinformatics/btt551] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/12/2022]
Abstract
It is common for computational analyses to generate large amounts of complex data that are difficult to process and share with collaborators. Standard methods are needed to transform such data into a more useful and intuitive format. We present ReportingTools, a Bioconductor package that automatically recognizes and transforms the output of many common Bioconductor packages into rich, interactive, HTML-based reports. Reports are not generic, but have been individually designed to reflect content specific to the result type detected. Tabular output included in reports is sortable, filterable and searchable and contains context-relevant hyperlinks to external databases. Additionally, in-line graphics have been developed for specific analysis types and are embedded by default within table rows, providing a useful visual summary of underlying raw data. ReportingTools is highly flexible and reports can be easily customized for specific applications using the well-defined API. AVAILABILITY: The ReportingTools package is implemented in R and available from Bioconductor (version ≥ 2.11) at the URL: http://bioconductor.org/packages/release/bioc/html/ReportingTools.html. Installation instructions and usage documentation can also be found at the above URL.
Affiliation(s)
- Melanie A Huntley
  - Department of Bioinformatics and Computational Biology, Genentech Inc., 1 DNA Way, South San Francisco, CA 94080, USA and Department of Statistics, University of California at Davis, Davis, CA 95616, USA
|
23
|
Zhang L, Zhang J, Yang G, Wu D, Jiang L, Wen Z, Li M. Investigating the concordance of Gene Ontology terms reveals the intra- and inter-platform reproducibility of enrichment analysis. BMC Bioinformatics 2013; 14:143. [PMID: 23627640 PMCID: PMC3644270 DOI: 10.1186/1471-2105-14-143] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Received: 02/12/2013] [Accepted: 04/26/2013] [Indexed: 03/28/2023] Open
Abstract
BACKGROUND: Reliability and reproducibility of differentially expressed genes (DEGs) are essential for the biological interpretation of microarray data. The MicroArray Quality Control (MAQC) project launched by the US Food and Drug Administration (FDA) showed that the lists of DEGs generated by intra- and inter-platform comparisons can reach a high level of concordance, which mainly depends on the statistical criteria used for ranking and selecting DEGs. Generally, combining fold change ranking with a non-stringent p-value cutoff produces reproducible lists of DEGs. For further interpretation of gene expression data, statistical methods of gene enrichment analysis provide powerful tools for associating the DEGs with prior biological knowledge, e.g. Gene Ontology (GO) terms and pathways, and are widely used in genome-wide research. Although the DEG lists generated from the same compared conditions proved to be reliable, reproducible enrichment results are still crucial to the discovery of the underlying molecular mechanism differentiating the two conditions. Therefore, it is important to know whether the enrichment results are still reproducible when using lists of DEGs generated by different statistical criteria from inter-laboratory and cross-platform comparisons. In our study, we used the MAQC data sets to systematically assess the intra- and inter-platform concordance of GO terms enriched by Gene Set Enrichment Analysis (GSEA) and LRpath.
RESULTS: In intra-platform comparisons, the percentage of overlapping enriched GO terms was as high as ~80% when the input lists of DEGs were generated by fold change ranking and Significance Analysis of Microarrays (SAM), whereas the percentage decreased by about 20% when the lists of DEGs were generated by fold change ranking and t-test, or by SAM and t-test. Similar results were found in inter-platform comparisons.
CONCLUSIONS: Our results demonstrated that lists of DEGs with a high level of concordance can ensure high concordance of enrichment results. Importantly, based on the lists of DEGs generated by a straightforward method of combining fold change ranking with a non-stringent p-value cutoff, enrichment analysis will produce reproducible enriched GO terms for biological interpretation.
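The concordance measure discussed in this abstract can be illustrated with a minimal sketch. The abstract does not state how the overlap percentage was computed, so the definition below (shared terms relative to the smaller list) is an assumption, and the function name and GO term lists are hypothetical.

```python
def percent_overlap(terms_a, terms_b):
    """Percentage of shared terms relative to the smaller list.

    Assumed definition for illustration; the cited study may use another.
    """
    set_a, set_b = set(terms_a), set(terms_b)
    if not set_a or not set_b:
        return 0.0
    shared = set_a & set_b
    return 100.0 * len(shared) / min(len(set_a), len(set_b))


# Hypothetical enriched GO terms obtained from two DEG-selection criteria.
fold_change_sam = ["GO:0006955", "GO:0006954", "GO:0008283", "GO:0007049", "GO:0006915"]
fold_change_ttest = ["GO:0006955", "GO:0006954", "GO:0008283", "GO:0016477", "GO:0030154"]

print(percent_overlap(fold_change_sam, fold_change_ttest))  # 60.0
```

Normalizing by the smaller list keeps the score at 100% when one result is a subset of the other; a Jaccard index would be a stricter alternative.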
Affiliation(s)
- Lifang Zhang
  - College of Chemistry, Sichuan University, Chengdu, 610064, People's Republic of China
|
24
|
Divergent effects of a CLA-enriched beef diet on metabolic health in ApoE−/− and ob/ob mice. J Nutr Biochem 2013; 24:401-11. [DOI: 10.1016/j.jnutbio.2011.12.006] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Received: 08/12/2011] [Revised: 12/01/2011] [Accepted: 12/21/2011] [Indexed: 11/18/2022]
|
25
|
Chou SH, Ko BS, Chiou JS, Hsu YC, Tsai MH, Chiu YC, Yu IS, Lin SW, Hou HA, Kuo YY, Lin HM, Wu MF, Chou WC, Tien HF. A knock-in Npm1 mutation in mice results in myeloproliferation and implies a perturbation in hematopoietic microenvironment. PLoS One 2012; 7:e49769. [PMID: 23226219 PMCID: PMC3511491 DOI: 10.1371/journal.pone.0049769] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.7] [Received: 08/14/2012] [Accepted: 10/12/2012] [Indexed: 01/03/2023] Open
Abstract
Somatic nucleophosmin (NPM1) mutation frequently occurs in acute myeloid leukemia (AML), but its role in leukemogenesis remains unclear. This study reports the first "conventional" knock-in mouse model of Npm1 mutation, which was achieved by inserting TCTG after nucleotide c.857 (c.854_857dupTCTG) to mimic the human mutation without any "humanized" sequence. The resultant mutant peptide differed slightly from that in humans but exhibited cytoplasmic pulling force. Homozygous (Npm1(c+/c+)) mice showed embryonic lethality before day E8.5, whereas heterozygous (Npm1(wt/c+)) mice appeared healthy at birth and were fertile. Approximately 36% of Npm1(wt/c+) mice developed myeloproliferative disease (MPD) with extramedullary hematopoiesis. Those Npm1(wt/c+) mice that did not develop MPD nevertheless gradually developed monocytosis and showed increased numbers of marrow myeloid precursors. This second group of Npm1(wt/c+) mice also showed compromised cobblestone area formation, suggesting pathology in the hematopoietic niche. Microarray experiments and bioinformatic analysis on mouse myeloid precursor cells and 227 human samples revealed that the expression of CXCR4/CXCL12-related genes was significantly suppressed in mutant cells from both mice and humans. Thus, our mouse model demonstrated that Npm1 mutation can result in MPD, but is insufficient for leukemogenesis. Perturbation of the hematopoietic niche in mutant hematopoietic stem cells (implied by underrepresentation of CXCR4/CXCL12-related genes) may be important in the pathogenesis of NPM1 mutations.
MESH Headings
- Animals
- Cell Proliferation
- Chemokine CXCL12/genetics
- Chemokine CXCL12/metabolism
- Disease Models, Animal
- Founder Effect
- Gene Expression
- Gene Expression Profiling
- Gene Knock-In Techniques
- Heterozygote
- Humans
- Leukemia, Myeloid, Acute/genetics
- Leukemia, Myeloid, Acute/metabolism
- Leukemia, Myeloid, Acute/pathology
- Mice
- Mice, Transgenic
- Mutation
- Myeloid Cells/metabolism
- Myeloid Cells/pathology
- Myelopoiesis/genetics
- Nuclear Proteins/genetics
- Nuclear Proteins/metabolism
- Nucleophosmin
- Oligonucleotide Array Sequence Analysis
- Receptors, CXCR4/genetics
- Receptors, CXCR4/metabolism
Affiliation(s)
- Shiu-Huey Chou
  - Department of Life Science, Fu-Jen University, Taipei, Taiwan
- Bor-Sheng Ko
  - Division of Hematology, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
  - Institute of Cellular and Systemic Medicine, National Health Research Institute, MiaoLi County, Taiwan
- Ji-Shain Chiou
  - Department of Life Science, Fu-Jen University, Taipei, Taiwan
- Yueh-Chwen Hsu
  - Graduate Institute of Clinical Medicine, National Taiwan University College of Medicine, Taipei, Taiwan
- Mong-Hsun Tsai
  - Institute of Biotechnology, National Taiwan University, Taipei, Taiwan
- Yu-Chiao Chiu
  - Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan
- I-Shing Yu
  - Transgenic Mouse Models Core, National Taiwan University College of Medicine, Taipei, Taiwan
- Shu-Wha Lin
  - Transgenic Mouse Models Core, National Taiwan University College of Medicine, Taipei, Taiwan
- Hsin-An Hou
  - Division of Hematology, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Yi-Yi Kuo
  - Department of Laboratory Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Hsiu-Mei Lin
  - Department of Laboratory Medicine, National Taiwan University Hospital, Taipei, Taiwan
- Ming-Fang Wu
  - Animal Medicine Center, National Taiwan University College of Medicine, Taipei, Taiwan
- Wen-Chien Chou
  - Division of Hematology, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
  - Department of Laboratory Medicine, National Taiwan University Hospital, Taipei, Taiwan
  - * E-mail: (WCC); (HFT)
- Hwei-Fang Tien
  - Division of Hematology, Department of Internal Medicine, National Taiwan University Hospital, Taipei, Taiwan
  - * E-mail: (WCC); (HFT)
|
26
|
Yu T, Bai Y. Analyzing LC/MS metabolic profiling data in the context of existing metabolic networks. Curr Metabolomics 2012; 1:83-91. [PMID: 24010053 DOI: 10.2174/2213235x11301010084] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/22/2022]
Abstract
Metabolic profiling is the unbiased detection and quantification of low molecular-weight metabolites in a living system. It is rapidly developing in biological and translational research, contributing to disease mechanism elucidation, environmental chemical surveillance, biomarker detection, and health outcome prediction. Recent developments in experimental and computational technology allow more and more known metabolites to be detected and quantified from complex samples. As the coverage of the metabolic network improves, it has become feasible to examine metabolic profiling data from a systems perspective, i.e. interpreting the data and performing statistical inference in the context of pathways and genome-scale metabolic networks. A number of methods have recently been developed in this area, but much improvement in algorithms and databases is still needed. In this review, we survey some methods for the analysis of metabolic profiling data based on metabolic networks.
Affiliation(s)
- Tianwei Yu
  - Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, GA
|
30
|
McBride WJ, Kimpel MW, McClintick JN, Ding ZM, Hyytia P, Colombo G, Edenberg HJ, Lumeng L, Bell RL. Gene expression in the ventral tegmental area of 5 pairs of rat lines selectively bred for high or low ethanol consumption. Pharmacol Biochem Behav 2012; 102:275-85. [PMID: 22579914 PMCID: PMC3383357 DOI: 10.1016/j.pbb.2012.04.016] [Citation(s) in RCA: 41] [Impact Index Per Article: 3.4] [Received: 11/17/2011] [Revised: 04/20/2012] [Accepted: 04/30/2012] [Indexed: 12/28/2022]
Abstract
The objective of this study was to determine if there are common innate differences in gene expression or gene pathways in the ventral tegmental area (VTA) among 5 different pairs of rat lines selectively bred for high (HEC) or low (LEC) ethanol consumption: (a) alcohol-preferring (P) vs. alcohol-non-preferring (NP) rats; (b) high-alcohol-drinking (HAD) vs. low-alcohol-drinking (LAD) rats (replicate line pairs 1 and 2); (c) ALKO alcohol (AA) vs. nonalcohol (ANA) rats; and (d) Sardinian alcohol-preferring (sP) vs. alcohol-nonpreferring (sNP) rats. Microarray analysis revealed between 370 and 1340 unique named genes that significantly differed in expression between the individual line-pairs. Analysis using Gene Ontology (GO) and Ingenuity Pathways information indicated significant categories and networks in common for up to 3 line-pairs, but not for all 5 line-pairs; moreover, there were few genes in common in these categories and networks. ANOVA of the combined data for the 5 line-pairs indicated 1295 significant (p<0.01) differences in expression of named genes. Although no individual named gene was significant across all 5 line-pairs, there were 22 genes that overlapped in the same direction in 3 or 4 of the line-pairs. Overall, the findings suggest that (a) some biological categories or networks may be in common for subsets of line-pairs; and (b) regulation of different genes and/or combinations of multiple biological systems (e.g., transcription, synaptic function, intracellular signaling and protection against oxidative stress) within the VTA (possibly involving dopamine and glutamate) may be contributing to the disparate alcohol drinking behaviors of these line-pairs.
Affiliation(s)
- William J McBride
  - Institute of Psychiatric Research, Department of Psychiatry, Indiana University School of Medicine, Indianapolis, IN 46202-4887, USA.
|
31
|
Gusenleitner D, Howe EA, Bentink S, Quackenbush J, Culhane AC. iBBiG: iterative binary bi-clustering of gene sets. Bioinformatics 2012; 28:2484-92. [PMID: 22789589 PMCID: PMC3463116 DOI: 10.1093/bioinformatics/bts438] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.8] [Indexed: 01/17/2023]
Abstract
Motivation: Meta-analysis of genomics data seeks to identify genes associated with a biological phenotype across multiple datasets; however, merging data from different platforms by their features (genes) is challenging. Meta-analysis using functionally or biologically characterized gene sets simplifies data integration, is biologically intuitive and is seen as having great potential, but is an emerging field with few established statistical methods. Results: We transform gene expression profiles into binary gene set profiles by discretizing results of gene set enrichment analyses and apply a new iterative bi-clustering algorithm (iBBiG) to identify groups of gene sets that are coordinately associated with groups of phenotypes across multiple studies. iBBiG is optimized for meta-analysis of large numbers of diverse genomics data that may have unmatched samples. It does not require prior knowledge of the number or size of clusters. When applied to simulated data, it outperforms commonly used clustering methods, discovers overlapping clusters of diverse sizes and is robust in the presence of noise. We apply it to meta-analysis of breast cancer studies, where iBBiG extracted novel gene set–phenotype associations that predicted tumor metastases within tumor subtypes. Availability: Implemented in the Bioconductor package iBBiG. Contact: aedin@jimmy.harvard.edu
Affiliation(s)
- Daniel Gusenleitner
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
|
32
|
Wu D, Smyth GK. Camera: a competitive gene set test accounting for inter-gene correlation. Nucleic Acids Res 2012; 40:e133. [PMID: 22638577 PMCID: PMC3458527 DOI: 10.1093/nar/gks461] [Citation(s) in RCA: 549] [Impact Index Per Article: 45.8] [Indexed: 12/02/2022] Open
Abstract
Competitive gene set tests are commonly used in molecular pathway analysis to test for enrichment of a particular gene annotation category amongst the differential expression results from a microarray experiment. Existing gene set tests that rely on gene permutation are shown here to be extremely sensitive to inter-gene correlation. Several data sets are analyzed to show that inter-gene correlation is non-ignorable even for experiments on homogeneous cell populations using genetically identical model organisms. A new gene set test procedure (CAMERA) is proposed based on the idea of estimating the inter-gene correlation from the data, and using it to adjust the gene set test statistic. An efficient procedure is developed for estimating the inter-gene correlation and characterizing its precision. CAMERA is shown to control the type I error rate correctly regardless of inter-gene correlations, yet retains excellent power for detecting genuine differential expression. Analysis of breast cancer data shows that CAMERA recovers known relationships between tumor subtypes in very convincing terms. CAMERA can be used to analyze specified sets or as a pathway analysis tool using a database of molecular signatures.
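The adjustment described in this abstract can be sketched with the variance inflation factor, VIF = 1 + (m − 1)ρ̄, where m is the number of genes in the set and ρ̄ is the mean pairwise inter-gene correlation. The code below is a minimal illustration, not the limma implementation: the function names are hypothetical, and it assumes that per-gene residual expression values are already available.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (must not be constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mean_pairwise_correlation(residual_rows):
    """Average correlation over all gene pairs; one row of residuals per gene."""
    pairs = list(combinations(residual_rows, 2))
    return sum(pearson(x, y) for x, y in pairs) / len(pairs)


def variance_inflation_factor(set_size, mean_correlation):
    """VIF = 1 + (m - 1) * rho_bar for a set of m genes."""
    return 1.0 + (set_size - 1) * mean_correlation


# A set of 9 genes with mean correlation 0.125 doubles the variance of the
# set-mean statistic relative to the independence assumption.
vif = variance_inflation_factor(9, 0.125)
print(vif)  # 2.0

# Standard error of the set mean, before and after the adjustment
# (sigma = 1.0 is an arbitrary per-gene standard deviation).
naive_se = 1.0 / sqrt(9)
adjusted_se = naive_se * sqrt(vif)
```

Even a small positive mean correlation inflates the variance substantially for large sets, which is why tests that assume independent genes can be anti-conservative.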
Affiliation(s)
- Di Wu
  - The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, VIC 3052, Australia.
|
33
|
Yang H, Cheng C, Zhang W. Average rank-based score to measure deregulation of molecular pathway gene sets. PLoS One 2011; 6:e27579. [PMID: 22096597 PMCID: PMC3212578 DOI: 10.1371/journal.pone.0027579] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Received: 07/18/2011] [Accepted: 10/19/2011] [Indexed: 12/04/2022] Open
Abstract
Background: Deregulation of biological pathways has been shown to be involved in the tumorigenesis of a variety of cancers. The co-regulation of pathways in tumor and normal tissues has not been studied in a systematic manner. Results: In this study we propose a novel statistic named AR-score (average rank based score) to measure pathway activities based on microarray gene expression profiles. We calculate and compare the AR-scores of pathways in microarray datasets containing expression profiles for a wide range of cancer types as well as the corresponding normal tissues. We find that many pathways undergo significant activity changes in tumors with respect to normal tissues. AR-scores for a small subset of pathways are capable of distinguishing tumor from normal tissues or classifying tumor subtypes. In normal tissues many pathways are highly correlated in their activities, whereas their correlations are significantly reduced in tumors and cancer cell lines. The co-expression of genes in the same pathways was also significantly perturbed in tumors. Conclusions: The co-regulation of genes in the same pathways and co-regulation of different pathways are significantly perturbed in tumors versus normal tissues. Our method provides a useful tool for better understanding the mechanistic changes in tumors, which can also be used for exploring other biological problems.
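The abstract does not give the formula for the AR-score, so the following is only a sketch of one plausible rank-based pathway score: genes are ranked by expression within a sample, and the mean rank of the pathway members is scaled by the number of genes. The function name, the scaling, and the example data are assumptions made for illustration.

```python
def average_rank_score(expression, pathway_genes):
    """Mean within-sample rank of pathway genes, scaled to (0, 1].

    expression: dict mapping gene name to expression value in one sample.
    Ties are broken by sort order (an assumption; ties are not averaged).
    """
    ordered = sorted(expression, key=expression.get)  # lowest expression first
    rank = {gene: position + 1 for position, gene in enumerate(ordered)}
    members = [gene for gene in pathway_genes if gene in rank]
    if not members:
        raise ValueError("no pathway genes measured in this sample")
    return sum(rank[gene] for gene in members) / len(members) / len(rank)


# Hypothetical sample with four genes; ranks are B=1, C=2, A=3, D=4.
expression = {"A": 5.0, "B": 1.0, "C": 3.0, "D": 9.0}
print(average_rank_score(expression, {"A", "D"}))  # 0.875
```

Because the score depends only on ranks within a sample, it is insensitive to monotone rescaling of the expression values, which is a common motivation for rank-based pathway statistics.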
Affiliation(s)
- Huan Yang
  - Department of Reproductive Endocrinology, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
- Chao Cheng
  - Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut, United States of America
  - Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut, United States of America
  - * E-mail: (WZ); (CC)
- Wei Zhang
  - Department of Reproductive Endocrinology, Obstetrics and Gynecology Hospital, Fudan University, Shanghai, China
  - * E-mail: (WZ); (CC)
|
34
|
Mar JC, Matigian NA, Quackenbush J, Wells CA. attract: A method for identifying core pathways that define cellular phenotypes. PLoS One 2011; 6:e25445. [PMID: 22022396 PMCID: PMC3194807 DOI: 10.1371/journal.pone.0025445] [Citation(s) in RCA: 34] [Impact Index Per Article: 2.6] [Received: 06/16/2011] [Accepted: 09/05/2011] [Indexed: 11/23/2022] Open
Abstract
attract is a knowledge-driven analytical approach for identifying and annotating the gene-sets that best discriminate between cell phenotypes. attract finds distinguishing patterns within pathways, decomposes pathways into meta-genes representative of these patterns, and then generates synexpression groups of highly correlated genes from the entire transcriptome dataset. attract can be applied to a wide range of biological systems, is freely available as a Bioconductor package, and has been incorporated into the MeV software system.
Affiliation(s)
- Jessica C. Mar
  - Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
  - * E-mail: (CAW); (JQ); (JCM)
- Nicholas A. Matigian
  - National Centre for Adult Stem Cell Research, Eskitis Institute for Cell and Molecular Therapies, Griffith University, Brisbane, Queensland, Australia
- John Quackenbush
  - Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
  - National Centre for Adult Stem Cell Research, Eskitis Institute for Cell and Molecular Therapies, Griffith University, Brisbane, Queensland, Australia
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
  - * E-mail: (CAW); (JQ); (JCM)
- Christine A. Wells
  - National Centre for Adult Stem Cell Research, Eskitis Institute for Cell and Molecular Therapies, Griffith University, Brisbane, Queensland, Australia
  - Australian Institute of Bioengineering and Nanotechnology, University of Queensland, Brisbane, Queensland, Australia
  - * E-mail: (CAW); (JQ); (JCM)
|
35
|
Capturing changes in gene expression dynamics by gene set differential coordination analysis. Genomics 2011; 98:469-77. [PMID: 21971296 DOI: 10.1016/j.ygeno.2011.09.001] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.8] [Received: 04/10/2011] [Revised: 09/01/2011] [Accepted: 09/16/2011] [Indexed: 12/31/2022]
Abstract
Analyzing gene expression data at the gene set level greatly improves feature extraction and data interpretation. Currently most efforts in gene set analysis are focused on differential expression analysis, that is, finding gene sets whose genes show a first-order relationship with the clinical outcome. However, the regulation of the biological system is complex, and much of the change in gene expression dynamics does not manifest in the form of differential expression. At the gene set level, capturing the change in expression dynamics is difficult due to the complexity and heterogeneity of the gene sets. Here we report a systematic approach to detect gene sets that show differential coordination patterns with the rest of the transcriptome, as well as pairs of gene sets that are differentially coordinated with each other. We demonstrate that the method can identify biologically relevant gene sets, many of which do not show a first-order relationship with the clinical outcome.
|
36
|
Xu L, Furlotte N, Lin Y, Heinrich K, Berry MW, George EO, Homayouni R. Functional cohesion of gene sets determined by latent semantic indexing of PubMed abstracts. PLoS One 2011; 6:e18851. [PMID: 21533142 PMCID: PMC3077411 DOI: 10.1371/journal.pone.0018851] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.7] [Received: 11/10/2010] [Accepted: 03/21/2011] [Indexed: 12/31/2022] Open
Abstract
High-throughput genomic technologies enable researchers to identify genes that are co-regulated with respect to specific experimental conditions. Numerous statistical approaches have been developed to identify differentially expressed genes. Because each approach can produce distinct gene sets, it is difficult for biologists to determine which statistical approach yields biologically relevant gene sets and is appropriate for their study. To address this issue, we implemented Latent Semantic Indexing (LSI) to determine the functional coherence of gene sets. An LSI model was built using over 1 million Medline abstracts for over 20,000 mouse and human genes annotated in Entrez Gene. The gene-to-gene LSI-derived similarities were used to calculate a literature cohesion p-value (LPv) for a given gene set using a Fisher's exact test. We tested this method against genes in more than 6,000 functional pathways annotated in Gene Ontology (GO) and found that approximately 75% of gene sets in GO biological process category and 90% of the gene sets in GO molecular function and cellular component categories were functionally cohesive (LPv<0.05). These results indicate that the LPv methodology is both robust and accurate. Application of this method to previously published microarray datasets demonstrated that LPv can be helpful in selecting the appropriate feature extraction methods. To enable real-time calculation of LPv for mouse or human gene sets, we developed a web tool called Gene-set Cohesion Analysis Tool (GCAT). GCAT can complement other gene set enrichment approaches by determining the overall functional cohesion of data sets, taking into account both explicit and implicit gene interactions reported in the biomedical literature.
Affiliation(s)
- Lijing Xu
  - Bioinformatics Program, University of Memphis, Memphis, Tennessee, United States of America
  - Department of Mathematical Sciences, University of Memphis, Memphis, Tennessee, United States of America
- Nicholas Furlotte
  - Bioinformatics Program, University of Memphis, Memphis, Tennessee, United States of America
- Yunyue Lin
  - Department of Computer Science, University of Memphis, Memphis, Tennessee, United States of America
- Kevin Heinrich
  - Computable Genomix, Memphis, Tennessee, United States of America
- Michael W. Berry
  - Department of Electrical and Computer Engineering, University of Tennessee, Knoxville, Tennessee, United States of America
- Ebenezer O. George
  - Bioinformatics Program, University of Memphis, Memphis, Tennessee, United States of America
  - Department of Mathematical Sciences, University of Memphis, Memphis, Tennessee, United States of America
- Ramin Homayouni
  - Bioinformatics Program, University of Memphis, Memphis, Tennessee, United States of America
  - Department of Biological Sciences, University of Memphis, Memphis, Tennessee, United States of America
  - * E-mail:
|
37
|
Ulitsky I, Krishnamurthy A, Karp RM, Shamir R. DEGAS: de novo discovery of dysregulated pathways in human diseases. PLoS One 2010; 5:e13367. [PMID: 20976054 PMCID: PMC2957424 DOI: 10.1371/journal.pone.0013367] [Citation(s) in RCA: 101] [Impact Index Per Article: 7.2] [Received: 05/09/2010] [Accepted: 09/08/2010] [Indexed: 11/18/2022] Open
Abstract
Background: Molecular studies of the human disease transcriptome typically involve a search for genes whose expression is significantly dysregulated in sick individuals compared to healthy controls. Recent studies have found that only a small number of the genes in human disease-related pathways show consistent dysregulation in sick individuals. However, those studies found that some pathway genes are affected in most sick individuals, but the affected genes can differ among individuals. While a pathway is usually defined as a set of genes known to share a specific function, pathway boundaries are frequently difficult to assign, and methods that rely on such definitions cannot discover novel pathways. Protein interaction networks can potentially be used to overcome these problems. Methodology/Principal Findings: We present DEGAS (DysrEgulated Gene set Analysis via Subnetworks), a method for identifying connected gene subnetworks significantly enriched for genes that are dysregulated in specimens of a disease. We applied DEGAS to seven human diseases and obtained statistically significant results that appear to home in on compact pathways enriched with hallmarks of the diseases. In Parkinson's disease, we provide novel evidence for involvement of mRNA splicing, cell proliferation, and the 14-3-3 complex in the disease progression. DEGAS is available as part of the MATISSE software package (http://acgt.cs.tau.ac.il/matisse). Conclusions/Significance: The subnetworks identified by DEGAS can provide a signature of the disease potentially useful for diagnosis, pinpoint possible pathways affected by the disease, and suggest targets for drug intervention.
Affiliation(s)
- Igor Ulitsky
  - Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel.
|
38
|
Wu D, Lim E, Vaillant F, Asselin-Labat ML, Visvader JE, Smyth GK. ROAST: rotation gene set tests for complex microarray experiments. Bioinformatics 2010; 26:2176-82. [PMID: 20610611 PMCID: PMC2922896 DOI: 10.1093/bioinformatics/btq401] [Citation(s) in RCA: 385] [Impact Index Per Article: 27.5] [Indexed: 11/14/2022]
Abstract
Motivation: A gene set test is a differential expression analysis in which a P-value is assigned to a set of genes as a unit. Gene set tests are valuable for increasing statistical power, organizing and interpreting results and for relating expression patterns across different experiments. Existing methods are based on permutation. Methods that rely on permutation of probes unrealistically assume independence of genes, while those that rely on permutation of samples are suitable only for two-group comparisons with a good number of replicates in each group. Results: We present ROAST, a statistically rigorous gene set test that allows for gene-wise correlation while being applicable to almost any experimental design. Instead of permutation, ROAST uses rotation, a Monte Carlo technology for multivariate regression. Since the number of rotations does not depend on sample size, ROAST gives useful results even for experiments with minimal replication. ROAST allows for any experimental design that can be expressed as a linear model, and can also incorporate array weights and correlated samples. ROAST can be tuned for situations in which only a subset of the genes in the set are actively involved in the molecular pathway. ROAST can test for uni- or bi-directional regulation. Probes can also be weighted to allow for prior importance. The power and size of the ROAST procedure are demonstrated in a simulation study, and compared to those of a representative permutation method. Finally, ROAST is used to test the degree of transcriptional conservation between human and mouse mammary stems. Availability: ROAST is implemented as a function in the Bioconductor package limma available from www.bioconductor.org. Contact: smyth@wehi.edu.au. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Di Wu
- The Walter and Eliza Hall Institute of Medical Research, 1G Royal Parade, Parkville, Victoria, Australia
|
39
|
Bauer S, Gagneur J, Robinson PN. GOing Bayesian: model-based gene set analysis of genome-scale data. Nucleic Acids Res 2010; 38:3523-32. [PMID: 20172960 PMCID: PMC2887944 DOI: 10.1093/nar/gkq045] [Citation(s) in RCA: 123] [Impact Index Per Article: 8.8] [Indexed: 11/21/2022] Open
Abstract
The interpretation of data-driven experiments in genomics often involves a search for biological categories that are enriched for the responder genes identified by the experiments. However, knowledge bases such as the Gene Ontology (GO) contain hundreds or thousands of categories with very high overlap between categories. Thus, enrichment analysis performed on one category at a time frequently returns large numbers of correlated categories, leaving the choice of the most relevant ones to the user's interpretation. Here we present model-based gene set analysis (MGSA) that analyzes all categories at once by embedding them in a Bayesian network, in which gene response is modeled as a function of the activation of biological categories. Probabilistic inference is used to identify the active categories. The Bayesian modeling approach naturally takes category overlap into account and avoids the need for multiple testing corrections met in single-category enrichment analysis. On simulated data, MGSA identifies active categories with up to 95% precision at a recall of 20% for moderate settings of noise, leading to a 10-fold precision improvement over single-category statistical enrichment analysis. Application to a gene expression data set in yeast demonstrates that the method provides high-level, summarized views of core biological processes and correctly eliminates confounding associations.
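The problem that motivates MGSA, single-category enrichment flagging several overlapping categories at once, can be shown with a small sketch. It uses a one-sided hypergeometric test per category, a common choice for single-category enrichment and an assumption here rather than the specific baseline used in the paper. The gene names, category names, and helper functions are hypothetical, and the sketch does not implement MGSA itself.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K category members, n draws)."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def single_category_enrichment(categories, responders, population):
    """Test each category on its own, ignoring overlap with other categories."""
    N, n = len(population), len(responders)
    result = {}
    for name, members in categories.items():
        k = len(members & responders)
        result[name] = hypergeom_sf(k, N, len(members), n)
    return result

# Hypothetical toy data: 20 genes, two heavily overlapping categories
# and one unrelated category.
population = {f"g{i}" for i in range(20)}
categories = {
    "cat_A": {f"g{i}" for i in range(0, 5)},
    "cat_B": {f"g{i}" for i in range(0, 6)},   # contains all of cat_A
    "cat_C": {f"g{i}" for i in range(10, 15)},
}
responders = {"g0", "g1", "g2", "g3"}

pvals = single_category_enrichment(categories, responders, population)
```

In this toy case both cat_A and cat_B come out with small P-values because they share the responder genes; the one-at-a-time test has no way to say which of the two is the better explanation, which is the gap a joint model over all categories is meant to close.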
Affiliation(s)
- Sebastian Bauer
- Institute for Medical Genetics, Charité-Universitätsmedizin Berlin, Augustenburger Platz 1, 13353 Berlin, Germany
|
40
|
Kauffmann A, Huber W. Microarray data quality control improves the detection of differentially expressed genes. Genomics 2010; 95:138-42. [PMID: 20079422 DOI: 10.1016/j.ygeno.2010.01.003] [Citation(s) in RCA: 64] [Impact Index Per Article: 4.6] [Received: 11/12/2009] [Accepted: 01/08/2010] [Indexed: 10/20/2022]
Abstract
Microarrays have become a routine tool for biomedical research. Data quality assessment is an essential part of the analysis, but it is still not easy to perform objectively or in an automated manner, and as a result it is often neglected. Here, we compared two strategies of array-level quality control using five publicly available microarray experiments: outlier removal and array weights. We also compared them against no outlier removal and random array removal. We find that removing outlier arrays can improve the signal-to-noise ratio and thus strengthen the power of detecting differentially expressed genes. Using array weights is similarly effective, but its applicability is more limited. The quality metrics presented here are implemented in the Bioconductor package arrayQualityMetrics.
|
41
|
Culhane AC, Schwarzl T, Sultana R, Picard KC, Picard SC, Lu TH, Franklin KR, French SJ, Papenhausen G, Correll M, Quackenbush J. GeneSigDB--a curated database of gene expression signatures. Nucleic Acids Res 2009; 38:D716-25. [PMID: 19934259 PMCID: PMC2808880 DOI: 10.1093/nar/gkp1015] [Citation(s) in RCA: 81] [Impact Index Per Article: 5.4] [Indexed: 11/12/2022] Open
Abstract
The primary objective of most gene expression studies is the identification of one or more gene signatures: lists of genes whose transcriptional levels are uniquely associated with a specific biological phenotype. Whilst thousands of experimentally derived gene signatures are published, their potential value to the community is limited by their computational inaccessibility. Gene signatures are embedded in published article figures, tables or in supplementary materials, and are frequently presented using non-standard gene or probeset nomenclature. We present GeneSigDB (http://compbio.dfci.harvard.edu/genesigdb), a manually curated database of gene expression signatures. GeneSigDB release 1.0 focuses on cancer and stem cell gene signatures and was constructed from more than 850 publications from which we manually transcribed 575 gene signatures. Most gene signatures (n = 560) were successfully mapped to the genome to extract standardized lists of EnsEMBL gene identifiers. GeneSigDB provides the original gene signature, the standardized gene list and a fully traceable gene mapping history for each gene from the original transcribed data table through to the standardized list of genes. The GeneSigDB web portal is easy to search, allows users to compare their own gene list to those in the database, and lets them download gene signatures in most common gene identifier formats.
Affiliation(s)
- Aedín C Culhane
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute and Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.
|
42
|
Caldas J, Gehlenborg N, Faisal A, Brazma A, Kaski S. Probabilistic retrieval and visualization of biologically relevant microarray experiments. Bioinformatics 2009; 25:i145-53. [PMID: 19477980 PMCID: PMC2687969 DOI: 10.1093/bioinformatics/btp215] [Citation(s) in RCA: 39] [Impact Index Per Article: 2.6] [Indexed: 12/30/2022] Open
Abstract
Motivation: As ArrayExpress and other repositories of genome-wide experiments are reaching a mature size, it is becoming more meaningful to search for related experiments, given a particular study. We introduce methods that allow for the search to be based upon measurement data, instead of the more customary annotation data. The goal is to retrieve experiments in which the same biological processes are activated. This can be due either to experiments targeting the same biological question, or to as yet unknown relationships. Results: We use a combination of existing and new probabilistic machine learning techniques to extract information about the biological processes differentially activated in each experiment, to retrieve earlier experiments where the same processes are activated and to visualize and interpret the retrieval results. Case studies on a subset of ArrayExpress show that, with a sufficient amount of data, our method indeed finds experiments relevant to particular biological questions. Results can be interpreted in terms of biological processes using the visualization techniques. Availability: The code is available from http://www.cis.hut.fi/projects/mi/software/ismb09.
Affiliation(s)
- José Caldas
- Department of Information and Computer Science, Helsinki Institute for Information Technology, Helsinki University of Technology, Finland.
|
43
|
Wu Z, Zhao X, Chen L. Identifying responsive functional modules from protein-protein interaction network. Mol Cells 2009; 27:271-7. [PMID: 19326072 DOI: 10.1007/s10059-009-0035-x] [Citation(s) in RCA: 31] [Impact Index Per Article: 2.1] [Received: 01/23/2009] [Accepted: 01/26/2009] [Indexed: 10/21/2022] Open
Abstract
Proteins interact with each other within a cell, and those interactions give rise to the biological function and dynamical behavior of cellular systems. Generally, protein interactions are temporal, spatial, or condition dependent in a specific cell, where only a small part of the interactions usually takes place under certain conditions. Although a large amount of protein interaction data has recently been collected by high-throughput technologies, the interactions are recorded or summarized under various conditions and therefore cannot be directly used to identify signaling pathways or active networks, which are believed to work in specific cells under specific conditions. However, protein interactions activated under specific conditions may give hints to the biological process underlying corresponding phenotypes. In particular, responsive functional modules consisting of protein interactions activated under specific conditions can provide insight into the mechanism underlying biological systems, e.g. protein interaction subnetworks found for certain diseases rather than normal conditions may help to discover potential biomarkers. From a computational viewpoint, identifying responsive functional modules can be formulated as an optimization problem. Efficient computational methods for extracting responsive functional modules are therefore in strong demand, given the NP-hard nature of such a combinatorial problem. In this review, we first report recent advances in the development of computational methods for extracting responsive functional modules or active pathways from protein interaction networks and microarray data. Then, from a computational aspect, we discuss remaining obstacles and perspectives for this attractive and challenging topic in the area of systems biology.
Affiliation(s)
- Zikai Wu
- Institute of Systems Biology, Shanghai University, Shanghai 200444, China
|