1
Chu SH, Huang YT. Integrated genomic analysis of biological gene sets with applications in lung cancer prognosis. BMC Bioinformatics 2017; 18:336. [PMID: 28697753 PMCID: PMC5505153 DOI: 10.1186/s12859-017-1737-2] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 02/07/2017] [Accepted: 06/22/2017] [Indexed: 01/22/2023] [Open Access]
Abstract
Background: Burgeoning interest in integrative analyses has produced a rise in studies that incorporate data from multiple genomic platforms. Literature on formal hypothesis testing at the integrative gene set level is sparse. This paper is biologically motivated by our interest in the joint effects of epigenetic methylation loci and their associated mRNA gene expressions on lung cancer survival status. Results: We provide an efficient screening approach across multiplatform genomic data at the level of biologically related sets of genes, and our methods are applicable to various disease models regardless of whether the underlying true model is known (iTEGS) or unknown (iNOTE). Our proposed testing procedure outperformed two competing methods. Using our methods, we identified a total of 28 gene sets with significant joint epigenomic and transcriptomic effects on one-year lung cancer survival. Conclusions: We propose efficient variance component-based testing procedures to facilitate the joint testing of multiplatform genomic data across an entire gene set. The testing procedure for the gene set is self-contained and can easily be extended to include more or different genetic platforms. iTEGS and iNOTE, implemented in R, are freely available through the inote package at https://cran.r-project.org//. Electronic supplementary material: The online version of this article (doi:10.1186/s12859-017-1737-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Su Hee Chu
- Department of Epidemiology, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA; Channing Division of Network Medicine, Brigham and Women's Hospital, Harvard Medical School, 181 Longwood Ave, Boston, MA, USA
- Yen-Tsung Huang
- Department of Epidemiology, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA; Department of Biostatistics, School of Public Health, Brown University, 121 S Main St, Providence, RI, USA; Institute of Statistical Science, Academia Sinica, No. 128, Section 2, Academia Rd, Taipei City, Taiwan.
4
Klein HU, Schäfer M, Porse BT, Hasemann MS, Ickstadt K, Dugas M. Integrative analysis of histone ChIP-seq and transcription data using Bayesian mixture models. Bioinformatics 2014; 30:1154-1162. [PMID: 24403540 DOI: 10.1093/bioinformatics/btu003] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 08/20/2013] [Accepted: 12/30/2013] [Indexed: 01/08/2023]
Abstract
MOTIVATION: Histone modifications are a key epigenetic mechanism to activate or repress the transcription of genes. Datasets of matched transcription data and histone modification data obtained by ChIP-seq exist, but methods for integrative analysis of both data types are still rare. Here, we present a novel bioinformatics approach to detect genes that show different transcript abundances between two conditions putatively caused by alterations in histone modification. RESULTS: We introduce a correlation measure for integrative analysis of ChIP-seq and gene transcription data measured by RNA sequencing or microarrays and demonstrate that a proper normalization of ChIP-seq data is crucial. We suggest applying Bayesian mixture models of different types of distributions to further study the distribution of the correlation measure. The implicit classification of the mixture models is used to detect genes with differences between two conditions in both gene transcription and histone modification. The method is applied to different datasets, and its superiority to a naive separate analysis of both data types is demonstrated. AVAILABILITY AND IMPLEMENTATION: R/Bioconductor package epigenomix. CONTACT: h.klein@uni-muenster.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hans-Ulrich Klein
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
- Martin Schäfer
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
- Bo T Porse
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
- Marie S Hasemann
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
- Katja Ickstadt
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
- Martin Dugas
- Institute of Medical Informatics, University of Münster, D-48149 Münster, Mathematical Institute, Heinrich Heine University, D-40225 Düsseldorf, Germany, The Finsen Laboratory, Rigshospitalet, Faculty of Health Sciences, Biotech Research and Innovation Center (BRIC), Danish Stem Cell Centre (DanStem), Faculty of Health Sciences, University of Copenhagen, DK-2200 Copenhagen, Denmark and Faculty of Statistics, TU Dortmund University, D-44221 Dortmund, Germany
5
Shin H, Liu T, Duan X, Zhang Y, Liu XS. Computational methodology for ChIP-seq analysis. Quantitative Biology 2013; 1:54-70. [PMID: 25741452 DOI: 10.1007/s40484-013-0006-2] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Indexed: 12/15/2022]
Abstract
Chromatin immunoprecipitation coupled with massively parallel sequencing (ChIP-seq) is a powerful technology to identify the genome-wide locations of DNA binding proteins such as transcription factors or modified histones. As more and more experimental laboratories adopt ChIP-seq to unravel transcriptional and epigenetic regulatory mechanisms, computational analyses of ChIP-seq have also become increasingly comprehensive and sophisticated. In this article, we review current computational methodology for ChIP-seq analysis, recommend useful algorithms and workflows, and introduce quality control measures at different analytical steps. We also discuss how ChIP-seq could be integrated with other types of genomic assays, such as gene expression profiling and genome-wide association studies, to provide a more comprehensive view of gene regulatory mechanisms in important physiological and pathological processes.
Affiliation(s)
- Hyunjin Shin
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute/Harvard School of Public Health, Boston, MA 02115, USA
- Tao Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute/Harvard School of Public Health, Boston, MA 02115, USA
- Xikun Duan
- Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai 200092, China
- Yong Zhang
- Department of Bioinformatics, School of Life Science and Technology, Tongji University, Shanghai 200092, China
- X Shirley Liu
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute/Harvard School of Public Health, Boston, MA 02115, USA
7
Chen X, Jiang W, Wang Q, Huang T, Wang P, Li Y, Chen X, Lv Y, Li X. Systematically characterizing and prioritizing chemosensitivity related gene based on Gene Ontology and protein interaction network. BMC Med Genomics 2012; 5:43. [PMID: 23031817 PMCID: PMC3532125 DOI: 10.1186/1755-8794-5-43] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Received: 11/18/2011] [Accepted: 08/27/2012] [Indexed: 11/30/2022] [Open Access]
Abstract
Background: The identification of genes that predict in vitro cellular chemosensitivity of cancer cells is of great importance. Chemosensitivity related genes (CRGs) have been widely utilized to guide clinical and cancer chemotherapy decisions. In addition, CRGs potentially share functional characteristics and network features in protein interaction networks (PPIN). Methods: In this study, we proposed a method to identify CRGs based on Gene Ontology (GO) and PPIN. Firstly, we documented 150 pairs of drug-CCRG (curated chemosensitivity related gene) from 492 published papers. Secondly, we characterized CCRGs from the perspective of GO and PPIN. Thirdly, we prioritized CRGs based on CCRGs’ GO and network characteristics. Lastly, we evaluated the performance of the proposed method. Results: We found that CCRG enriched GO terms were most often related to chemosensitivity and exhibited higher similarity scores compared to randomly selected genes. Moreover, CCRGs played key roles in maintaining the connectivity and controlling the information flow of PPINs. We then prioritized CRGs using CCRG enriched GO terms and CCRG network characteristics in order to obtain a database of predicted drug-CRGs that included 53 CRGs, 32 of which have been reported to affect susceptibility to drugs. Our proposed method identifies a greater number of drug-CCRGs, and drug-CCRGs are much more significantly enriched in predicted drug-CRGs, compared to a method based on the correlation of gene expression and drug activity. The mean area under ROC curve (AUC) for our method is 65.2%, whereas that for the traditional method is 55.2%. Conclusions: Our method not only identifies CRGs with expression patterns strongly correlated with drug activity, but also identifies CRGs in which expression is weakly correlated with drug activity. This study provides the framework for the identification of signatures that predict in vitro cellular chemosensitivity and offers a valuable database for pharmacogenomics research.
Affiliation(s)
- Xin Chen
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China