1
Gafurov A, Vinar T, Medvedev P, Brejova B. Efficient Analysis of Annotation Colocalization Accounting for Genomic Contexts. bioRxiv 2024:2023.11.22.568259. [PMID: 38045397 PMCID: PMC10690252 DOI: 10.1101/2023.11.22.568259] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/05/2023]
Abstract
An annotation is a set of genomic intervals sharing a particular function or property. Examples include genes or their exons, evolutionarily conserved elements, and regions with a particular epigenetic state. A common task is to compare two annotations to determine if one is enriched or depleted in the regions covered by the other. We study the problem of assigning statistical significance to such a comparison based on a null model representing two random unrelated annotations. To incorporate more background information into such analyses, we propose a new null model based on a Markov chain which differentiates among several genomic contexts. These contexts can capture various confounding factors, such as GC content or assembly gaps. We then develop a new algorithm for estimating p-values by computing the exact expectation and variance of the test statistics and then estimating the p-value using a normal approximation. Compared to the previous algorithm by Gafurov et al., the new algorithm provides three advances: (1) the running time is improved from quadratic to linear or quasi-linear, (2) the algorithm can handle two different test statistics, and (3) the algorithm can handle both simple and context-dependent Markov chain null models. We demonstrate the efficiency and accuracy of our algorithm on synthetic and real data sets, including the recent human telomere-to-telomere assembly. In particular, our algorithm computed p-values for 450 pairs of human genome annotations using 24 threads in under three hours. Moreover, the use of genomic contexts to correct for GC bias resulted in the reversal of some previously published findings.
2
Wang Y, Wei Z, Su J, Coenen F, Meng J. RgnTX: Colocalization analysis of transcriptome elements in the presence of isoform heterogeneity and ambiguity. Comput Struct Biotechnol J 2023; 21:4110-4117. [PMID: 37671241 PMCID: PMC10475473 DOI: 10.1016/j.csbj.2023.08.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 08/13/2023] [Accepted: 08/23/2023] [Indexed: 09/07/2023] Open
Abstract
Colocalization analysis of genomic region sets has been widely adopted to unveil potential functional interactions between corresponding biological attributes, which often serves as the basis for further investigation. A number of methods have been developed for colocalization analysis of genomic elements. However, none of them explicitly considers transcriptome heterogeneity and isoform ambiguity, making them less appropriate for analyzing transcriptome elements. Here, we developed RgnTX, an R/Bioconductor tool for the colocalization analysis of transcriptome elements with permutation tests. Unlike existing approaches, RgnTX directly takes advantage of transcriptome annotation and offers high flexibility in the null model to simulate a realistic transcriptome-wide background, such as complex alternative splicing patterns. Importantly, it supports the testing of transcriptome elements without clear isoform association, which is often the real scenario due to technical limitations. The package offers a wide selection of predefined functions that users can easily apply to visualize permutation results, calculate shifted z-scores, and conduct multiple hypothesis testing under Benjamini-Hochberg correction. Moreover, with synthetic and real datasets, we show that the novel testing modes of RgnTX return distinct and more significant results compared to existing genome-based methods. We believe RgnTX should be a useful tool for characterizing the randomness of the transcriptome and for conducting statistical association analysis for genomic region sets within the heterogeneous transcriptome. The package has now been accepted by Bioconductor and is freely available at: https://bioconductor.org/packages/RgnTX.
Affiliation(s)
- Yue Wang
  - Department of Mathematical Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Department of Computer Science, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Zhen Wei
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Jionglong Su
  - School of AI and Advanced Computing, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Frans Coenen
  - Department of Computer Science, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Jia Meng
  - Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
  - Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
3
Zhang H, Shi Z, Banigan EJ, Kim Y, Yu H, Bai XC, Finkelstein IJ. CTCF and R-loops are boundaries of cohesin-mediated DNA looping. Mol Cell 2023; 83:2856-2871.e8. [PMID: 37536339 DOI: 10.1016/j.molcel.2023.07.006] [Citation(s) in RCA: 18] [Impact Index Per Article: 18.0] [Received: 10/21/2022] [Revised: 05/10/2023] [Accepted: 07/06/2023] [Indexed: 08/05/2023]
Abstract
Cohesin and CCCTC-binding factor (CTCF) are key regulatory proteins of three-dimensional (3D) genome organization. Cohesin extrudes DNA loops that are anchored by CTCF in a polar orientation. Here, we present direct evidence that CTCF binding polarity controls cohesin-mediated DNA looping. Using single-molecule imaging, we demonstrate that a critical N-terminal motif of CTCF blocks cohesin translocation and DNA looping. The cryo-EM structure of the cohesin-CTCF complex reveals that this CTCF motif ahead of zinc fingers can only reach its binding site on the STAG1 cohesin subunit when the N terminus of CTCF faces cohesin. Remarkably, a C-terminally oriented CTCF accelerates DNA compaction by cohesin. DNA-bound Cas9 and Cas12a ribonucleoproteins are also polar cohesin barriers, indicating that stalling may be intrinsic to cohesin itself. Finally, we show that RNA-DNA hybrids (R-loops) block cohesin-mediated DNA compaction in vitro and are enriched with cohesin subunits in vivo, likely forming TAD boundaries.
Affiliation(s)
- Hongshan Zhang
  - Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
- Zhubing Shi
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou 310024, Zhejiang, China
  - School of Life Sciences, Westlake University, Hangzhou 310024, Zhejiang, China
  - Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Edward J Banigan
  - Department of Physics, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Yoori Kim
  - Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
  - Department of New Biology, Daegu Gyeongbuk Institute of Science and Technology, Daegu 42988, Republic of Korea
- Hongtao Yu
  - Westlake Laboratory of Life Sciences and Biomedicine, Hangzhou 310024, Zhejiang, China
  - School of Life Sciences, Westlake University, Hangzhou 310024, Zhejiang, China
  - Department of Pharmacology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Xiao-Chen Bai
  - Department of Biophysics, Department of Cell Biology, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
- Ilya J Finkelstein
  - Center for Systems and Synthetic Biology, Institute for Cellular and Molecular Biology, Department of Molecular Biosciences, University of Texas at Austin, Austin, TX 78712, USA
4
Bürger A, Dugas M. Cogito: automated and generic comparison of annotated genomic intervals. BMC Bioinformatics 2022; 23:315. [PMID: 35927614 PMCID: PMC9351259 DOI: 10.1186/s12859-022-04853-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 07/23/2022] [Indexed: 11/27/2022] Open
Abstract
Background: Genetic and epigenetic biological studies often combine different types of experiments and multiple conditions. While the corresponding raw and processed data are made available through specialized public databases, the processed files are usually limited to a specific research question. Hence, they are unsuitable for an unbiased, systematic overview of a complex dataset. However, possible combinations of different sample types and conditions grow exponentially with the number of sample types and conditions. Therefore, the risk of missing a correlation or overrating an identified correlation should be mitigated in a complex dataset. Since reanalysis of a full study is rarely a viable option, new methods are needed to address these issues systematically, reliably, reproducibly and efficiently. Results: Cogito (“COmpare annotated Genomic Intervals TOol”) provides a workflow for an unbiased, structured overview and systematic analysis of complex genomic datasets consisting of different data types (e.g. RNA-seq, ChIP-seq) and conditions. Cogito is able to visualize valuable key information of genomic or epigenomic interval-based data, thereby providing a straightforward analysis approach for comparing different conditions. It supports getting an unbiased impression of a dataset and developing an appropriate analysis strategy for it. In addition to a text-based report, Cogito offers a fully customizable report as a starting point for further in-depth investigation. Conclusions: Cogito implements a novel approach to facilitate high-level overview analyses of complex datasets, and offers additional insights into the data without the need for a full, time-consuming reanalysis. The R/Bioconductor package is freely available at https://bioconductor.org/packages/release/bioc/html/Cogito.html; comprehensive documentation with detailed descriptions and reproducible examples is included.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04853-1.
Affiliation(s)
- Annika Bürger
  - Institute of Medical Informatics, Westfälische Wilhelms-Universität Münster, Albert-Schweitzer-Campus 1, 48149, Münster, Germany
- Martin Dugas
  - Institute of Medical Informatics, Heidelberg University Hospital, Seminarstr. 2, 69117, Heidelberg, Germany
5
Ferré Q, Capponi C, Puthier D. OLOGRAM-MODL: mining enriched n-wise combinations of genomic features with Monte Carlo and dictionary learning. NAR Genom Bioinform 2022; 3:lqab114. [PMID: 34988437 PMCID: PMC8693575 DOI: 10.1093/nargab/lqab114] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 06/10/2021] [Revised: 11/08/2021] [Accepted: 11/23/2021] [Indexed: 02/06/2023] Open
Abstract
Most epigenetic marks, such as Transcriptional Regulators or histone marks, are biological objects known to work together in n-wise complexes. A suitable way to infer such functional associations between them is to study the overlaps of the corresponding genomic regions. However, the problem of the statistical significance of n-wise overlaps of genomic features is seldom tackled, which prevents rigorous studies of n-wise interactions. We introduce OLOGRAM-MODL, which considers overlaps between n ≥ 2 sets of genomic regions, and computes their statistical mutual enrichment by Monte Carlo fitting of a Negative Binomial distribution, resulting in more resolutive P-values. An optional machine learning method is proposed to find complexes of interest, using a new itemset mining algorithm based on dictionary learning which is resistant to noise inherent to biological assays. The overall approach is implemented through an easy-to-use CLI for workflow integration, and a visual tree-based representation of the results suited for explicability. The viability of the method is experimentally studied using both artificial and biological data. This approach is accessible through the command line interface of the pygtftk toolkit, available on Bioconda and from https://github.com/dputhier/pygtftk
Affiliation(s)
- Quentin Ferré
  - Aix Marseille Univ, INSERM, UMR U1090, TAGC, Marseille, France
- Cécile Capponi
  - Aix Marseille Univ, CNRS, UMR 7020, LIS, Qarma, Marseille, France
- Denis Puthier
  - Aix Marseille Univ, INSERM, UMR U1090, TAGC, Marseille, France
6
Gundersen S, Boddu S, Capella-Gutierrez S, Drabløs F, Fernández JM, Kompova R, Taylor K, Titov D, Zerbino D, Hovig E. Recommendations for the FAIRification of genomic track metadata. F1000Res 2021; 10. [PMID: 34249331 PMCID: PMC8226415 DOI: 10.12688/f1000research.28449.1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Accepted: 03/17/2021] [Indexed: 01/25/2023] Open
Abstract
Background: Many types of data from genomic analyses can be represented as genomic tracks, i.e. features linked to the genomic coordinates of a reference genome. Examples of such data are epigenetic DNA methylation data, ChIP-seq peaks, germline or somatic DNA variants, as well as RNA-seq expression levels. Researchers often face difficulties in locating, accessing and combining relevant tracks from external sources, as well as locating the raw data, reducing the value of the generated information. Description of work: We propose to advance the application of FAIR data principles (Findable, Accessible, Interoperable, and Reusable) to produce searchable metadata for genomic tracks. Findability and Accessibility of metadata can then be ensured by a track search service that integrates globally identifiable metadata from various track hubs in the Track Hub Registry and other relevant repositories. Interoperability and Reusability need to be ensured by the specification and implementation of a basic set of recommendations for metadata. We have tested this concept by developing such a specification in a JSON Schema, called FAIRtracks, and have integrated it into a novel track search service, called TrackFind. We demonstrate practical usage by importing datasets through TrackFind into existing examples of relevant analytical tools for genomic tracks: EPICO and the GSuite HyperBrowser. Conclusion: We here provide a first iteration of a draft standard for genomic track metadata, as well as the accompanying software ecosystem. It can easily be adapted or extended to future needs of the research community regarding data, methods and tools, balancing the requirements of both data submitters and analytical end-users.
Affiliation(s)
- Sanjay Boddu
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Finn Drabløs
  - Department of Clinical and Molecular Medicine, NTNU - Norwegian University of Science and Technology, Trondheim, Norway
- José M Fernández
  - Life Sciences Department, Barcelona Supercomputing Center (BSC), Barcelona, Spain
- Radmila Kompova
  - Center for Bioinformatics, University of Oslo (UiO), Oslo, Norway
- Kieron Taylor
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Dmytro Titov
  - Center for Bioinformatics, University of Oslo (UiO), Oslo, Norway
- Daniel Zerbino
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, UK
- Eivind Hovig
  - Center for Bioinformatics, University of Oslo (UiO), Oslo, Norway
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital (OUH), Oslo, Norway
7
Artificial intelligence predicts the immunogenic landscape of SARS-CoV-2 leading to universal blueprints for vaccine designs. Sci Rep 2020; 10:22375. [PMID: 33361777 PMCID: PMC7758335 DOI: 10.1038/s41598-020-78758-5] [Citation(s) in RCA: 48] [Impact Index Per Article: 12.0] [Received: 06/21/2020] [Accepted: 11/30/2020] [Indexed: 02/08/2023] Open
Abstract
The global population is at present suffering from a pandemic of Coronavirus disease 2019 (COVID-19), caused by the novel coronavirus Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). The goal of this study was to use artificial intelligence (AI) to predict blueprints for designing universal vaccines against SARS-CoV-2, that contain a sufficiently broad repertoire of T-cell epitopes capable of providing coverage and protection across the global population. To help achieve these aims, we profiled the entire SARS-CoV-2 proteome across the most frequent 100 HLA-A, HLA-B and HLA-DR alleles in the human population, using host-infected cell surface antigen presentation and immunogenicity predictors from the NEC Immune Profiler suite of tools, and generated comprehensive epitope maps. We then used these epitope maps as input for a Monte Carlo simulation designed to identify statistically significant “epitope hotspot” regions in the virus that are most likely to be immunogenic across a broad spectrum of HLA types. We then removed epitope hotspots that shared significant homology with proteins in the human proteome to reduce the chance of inducing off-target autoimmune responses. We also analyzed the antigen presentation and immunogenic landscape of all the nonsynonymous mutations across 3,400 different sequences of the virus, to identify a trend whereby SARS-COV-2 mutations are predicted to have reduced potential to be presented by host-infected cells, and consequently detected by the host immune system. A sequence conservation analysis then removed epitope hotspots that occurred in less-conserved regions of the viral proteome. 
Finally, we used a database of the HLA haplotypes of approximately 22,000 individuals to develop a “digital twin” type simulation to model how effective different combinations of hotspots would work in a diverse human population; the approach identified an optimal constellation of epitope hotspots that could provide maximum coverage in the global population. By combining the antigen presentation to the infected-host cell surface and immunogenicity predictions of the NEC Immune Profiler with a robust Monte Carlo and digital twin simulation, we have profiled the entire SARS-CoV-2 proteome and identified a subset of epitope hotspots that could be harnessed in a vaccine formulation to provide a broad coverage across the global population.
8
Malone B, Simovski B, Moliné C, Cheng J, Gheorghe M, Fontenelle H, Vardaxis I, Tennøe S, Malmberg JA, Stratford R, Clancy T. Artificial intelligence predicts the immunogenic landscape of SARS-CoV-2 leading to universal blueprints for vaccine designs. Sci Rep 2020; 10:22375. [PMID: 33361777 DOI: 10.1101/2020.04.21.052084] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 06/21/2020] [Accepted: 11/30/2020] [Indexed: 05/23/2023] Open
Affiliation(s)
- Brandon Malone
  - NEC Laboratories Europe GmbH, Kurfuersten-Anlage 36, 69115, Heidelberg, Germany
- Boris Simovski
  - NEC OncoImmunity AS, Ullernchausseen 64/66, 0379, Oslo, Norway
- Clément Moliné
  - NEC OncoImmunity AS, Ullernchausseen 64/66, 0379, Oslo, Norway
- Jun Cheng
  - NEC Laboratories Europe GmbH, Kurfuersten-Anlage 36, 69115, Heidelberg, Germany
- Marius Gheorghe
  - NEC OncoImmunity AS, Ullernchausseen 64/66, 0379, Oslo, Norway
- Simen Tennøe
  - NEC OncoImmunity AS, Ullernchausseen 64/66, 0379, Oslo, Norway
- Trevor Clancy
  - NEC OncoImmunity AS, Ullernchausseen 64/66, 0379, Oslo, Norway
9
Fan Q, Nørgaard RC, Grytten I, Ness CM, Lucas C, Vekterud K, Soedling H, Matthews J, Lemma RB, Gabrielsen OS, Bindesbøll C, Ulven SM, Nebb HI, Grønning-Wang LM, Sæther T. LXRα Regulates ChREBPα Transactivity in a Target Gene-Specific Manner through an Agonist-Modulated LBD-LID Interaction. Cells 2020; 9:cells9051214. [PMID: 32414201 PMCID: PMC7290792 DOI: 10.3390/cells9051214] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 03/26/2020] [Revised: 04/19/2020] [Accepted: 05/07/2020] [Indexed: 01/02/2023] Open
Abstract
The cholesterol-sensing nuclear receptor liver X receptor (LXR) and the glucose-sensing transcription factor carbohydrate responsive element-binding protein (ChREBP) are central players in regulating glucose and lipid metabolism in the liver. More knowledge of their mechanistic interplay is needed to understand their role in pathological conditions like fatty liver disease and insulin resistance. In the current study, LXR and ChREBP co-occupancy was examined by analyzing ChIP-seq datasets from mouse livers. LXR and ChREBP interaction was determined by co-immunoprecipitation (CoIP), and their transactivity was assessed by real-time quantitative polymerase chain reaction (qPCR) of target genes and gene reporter assays. Chromatin binding capacity was determined by ChIP-qPCR assays. Our data show that LXRα and ChREBPα interact physically and show a high co-occupancy at regulatory regions in the mouse genome. LXRα co-activates ChREBPα and regulates ChREBP-specific target genes in vitro and in vivo. This co-activation is dependent on functional recognition elements for ChREBP but not for LXR, indicating that ChREBPα recruits LXRα to chromatin in trans. The two factors interact via their key activation domains: the low glucose inhibitory domain (LID) of ChREBPα and the ligand-binding domain (LBD) of LXRα. While unliganded LXRα co-activates ChREBPα, ligand-bound LXRα surprisingly represses ChREBPα activity on ChREBP-specific target genes. Mechanistically, this is due to a destabilized LXRα:ChREBPα interaction, leading to reduced ChREBP binding to chromatin and restricted activation of glycolytic and lipogenic target genes. This ligand-driven molecular switch highlights an unappreciated role of LXRα in responding to nutritional cues that was overlooked due to the lipogenesis-promoting function of LXR.
Affiliation(s)
- Qiong Fan
  - Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, N-0317 Oslo, Norway; (Q.F.); (K.V.); (C.B.)
- Rikke Christine Nørgaard
  - Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, N-0317 Oslo, Norway; (R.C.N.); (C.M.N.); (C.L.); (H.S.); (J.M.); (S.M.U.); (H.I.N.); (L.M.G.-W.)
- Ivar Grytten
  - Department of Informatics, Faculty of Mathematics and Natural Sciences, University of Oslo, N-0317 Oslo, Norway
- Cecilie Maria Ness
  - Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, N-0317 Oslo, Norway; (R.C.N.); (C.M.N.); (C.L.); (H.S.); (J.M.); (S.M.U.); (H.I.N.); (L.M.G.-W.)
- Christin Lucas
  - Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, N-0317 Oslo, Norway; (R.C.N.); (C.M.N.); (C.L.); (H.S.); (J.M.); (S.M.U.); (H.I.N.); (L.M.G.-W.)
- Kristin Vekterud
  - Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, N-0317 Oslo, Norway; (Q.F.); (K.V.); (C.B.)
- Helen Soedling
  - Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, N-0317 Oslo, Norway; (R.C.N.); (C.M.N.); (C.L.); (H.S.); (J.M.); (S.M.U.); (H.I.N.); (L.M.G.-W.)
- Jason Matthews
  - Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, N-0317 Oslo, Norway; (R.C.N.); (C.M.N.); (C.L.); (H.S.); (J.M.); (S.M.U.); (H.I.N.); (L.M.G.-W.)
- Roza Berhanu Lemma
  - Department of Biosciences, Faculty of Mathematics and Natural Sciences, University of Oslo, N-0317 Oslo, Norway; (R.B.L.); (O.S.G.)
- Odd Stokke Gabrielsen
  - Department of Biosciences, Faculty of Mathematics and Natural Sciences, University of Oslo, N-0317 Oslo, Norway; (R.B.L.); (O.S.G.)
- Christian Bindesbøll
  - Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, N-0317 Oslo, Norway; (Q.F.); (K.V.); (C.B.)
- Stine Marie Ulven
  - Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, N-0317 Oslo, Norway; (R.C.N.); (C.M.N.); (C.L.); (H.S.); (J.M.); (S.M.U.); (H.I.N.); (L.M.G.-W.)
- Hilde Irene Nebb
  - Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, N-0317 Oslo, Norway; (R.C.N.); (C.M.N.); (C.L.); (H.S.); (J.M.); (S.M.U.); (H.I.N.); (L.M.G.-W.)
- Line Mariann Grønning-Wang
  - Department of Nutrition, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, N-0317 Oslo, Norway; (R.C.N.); (C.M.N.); (C.L.); (H.S.); (J.M.); (S.M.U.); (H.I.N.); (L.M.G.-W.)
- Thomas Sæther
  - Department of Molecular Medicine, Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, N-0317 Oslo, Norway; (Q.F.); (K.V.); (C.B.)
  - Correspondence: Tel.: +47-22-851510
10
Cui Z, Kancherla J, Chang KW, Elmqvist N, Corrada Bravo H. Proactive visual and statistical analysis of genomic data in Epiviz. Bioinformatics 2020; 36:2195-2201. [PMID: 31782758 DOI: 10.1093/bioinformatics/btz883] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 04/12/2019] [Revised: 11/04/2019] [Accepted: 11/27/2019] [Indexed: 11/12/2022] Open
Abstract
MOTIVATION Integrative analysis of genomic data that includes statistical methods in combination with visual exploration has gained widespread adoption. Many existing methods involve a combination of tools and resources: user interfaces that provide visualization of large genomic datasets, and computational environments that focus on data analyses over various subsets of a given dataset. Over the last few years, we have developed Epiviz as an integrative and interactive genomic data analysis tool that tightly incorporates visualization with a state-of-the-art statistical analysis framework. RESULTS In this article, we present Epiviz Feed, a proactive and automatic visual analytics system integrated with Epiviz that alleviates the burden of manually executing data analysis required to test biologically meaningful hypotheses. Results of interest that are proactively identified by server-side computations are listed as notifications in a feed. The feed turns genomic data analysis into a collaborative work between the analyst and the computational environment, which shortens the analysis time and allows the analyst to explore results efficiently. We discuss three ways in which the proposed system advances the field of genomic data analysis: (i) it takes the first step of proactive data analysis by utilizing available CPU power from the server to automate the analysis process; (ii) it summarizes hypothesis test results in a way that analysts can easily understand and investigate; (iii) it enables filtering and grouping of analysis results for quick search. This effort provides initial work on systems that substantially expand how computational and visualization frameworks can be tightly integrated to facilitate interactive genomic data analysis. AVAILABILITY AND IMPLEMENTATION The source code for the Epiviz Feed application is available at http://github.com/epiviz/epiviz_feed_polymer. The Epiviz Computational Server is available at http://github.com/epiviz/epiviz-feed-computation.
Please refer to Epiviz documentation site for details: http://epiviz.github.io/.
Affiliation(s)
- Zhe Cui
  - Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA
  - Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
  - Human-Computer Interaction Laboratory, University of Maryland, College Park, MD 20742, USA
- Jayaram Kancherla
  - Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Kyle W Chang
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
- Niklas Elmqvist
  - Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
  - Human-Computer Interaction Laboratory, University of Maryland, College Park, MD 20742, USA
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
  - College of Information Studies, University of Maryland, College Park, MD 20742, USA
- Héctor Corrada Bravo
  - Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
  - Department of Computer Science, University of Maryland, College Park, MD 20742, USA
|
11
|
Bonnefont J, Tiberi L, van den Ameele J, Potier D, Gaber ZB, Lin X, Bilheu A, Herpoel A, Velez Bravo FD, Guillemot F, Aerts S, Vanderhaeghen P. Cortical Neurogenesis Requires Bcl6-Mediated Transcriptional Repression of Multiple Self-Renewal-Promoting Extrinsic Pathways. Neuron 2019; 103:1096-1108.e4. [PMID: 31353074 PMCID: PMC6859502 DOI: 10.1016/j.neuron.2019.06.027] [Citation(s) in RCA: 28] [Impact Index Per Article: 5.6] [Received: 07/17/2018] [Revised: 05/08/2019] [Accepted: 06/26/2019] [Indexed: 12/14/2022]
Abstract
During neurogenesis, progenitors switch from self-renewal to differentiation through the interplay of intrinsic and extrinsic cues, but how these are integrated remains poorly understood. Here, we combine whole-genome transcriptional and epigenetic analyses with in vivo functional studies to demonstrate that Bcl6, a transcriptional repressor previously reported to promote cortical neurogenesis, acts as a driver of the neurogenic transition through direct silencing of a selective repertoire of genes belonging to multiple extrinsic pathways promoting self-renewal, most strikingly the Wnt pathway. At the molecular level, Bcl6 represses its targets through Sirt1 recruitment followed by histone deacetylation. Our data identify a molecular logic by which a single cell-intrinsic factor represses multiple extrinsic pathways that favor self-renewal, thereby ensuring robustness of neuronal fate transition.
Affiliation(s)
- Jerome Bonnefont
  - Université Libre de Bruxelles (ULB), Institut de Recherches en Biologie Humaine et Moléculaire (IRIBHM), and ULB Neuroscience Institute (UNI), 1070 Brussels, Belgium; VIB-KU Leuven Center for Brain & Disease Research, 3000 Leuven, Belgium
- Luca Tiberi
  - Université Libre de Bruxelles (ULB), Institut de Recherches en Biologie Humaine et Moléculaire (IRIBHM), and ULB Neuroscience Institute (UNI), 1070 Brussels, Belgium
- Jelle van den Ameele
  - Université Libre de Bruxelles (ULB), Institut de Recherches en Biologie Humaine et Moléculaire (IRIBHM), and ULB Neuroscience Institute (UNI), 1070 Brussels, Belgium
- Delphine Potier
  - VIB-KU Leuven Center for Brain & Disease Research, 3000 Leuven, Belgium
- Xionghui Lin
  - Université Libre de Bruxelles (ULB), Institut de Recherches en Biologie Humaine et Moléculaire (IRIBHM), and ULB Neuroscience Institute (UNI), 1070 Brussels, Belgium
- Angéline Bilheu
  - Université Libre de Bruxelles (ULB), Institut de Recherches en Biologie Humaine et Moléculaire (IRIBHM), and ULB Neuroscience Institute (UNI), 1070 Brussels, Belgium
- Adèle Herpoel
  - Université Libre de Bruxelles (ULB), Institut de Recherches en Biologie Humaine et Moléculaire (IRIBHM), and ULB Neuroscience Institute (UNI), 1070 Brussels, Belgium
- Fausto D Velez Bravo
  - Université Libre de Bruxelles (ULB), Institut de Recherches en Biologie Humaine et Moléculaire (IRIBHM), and ULB Neuroscience Institute (UNI), 1070 Brussels, Belgium; VIB-KU Leuven Center for Brain & Disease Research, 3000 Leuven, Belgium
- Stein Aerts
  - VIB-KU Leuven Center for Brain & Disease Research, 3000 Leuven, Belgium
- Pierre Vanderhaeghen
  - Université Libre de Bruxelles (ULB), Institut de Recherches en Biologie Humaine et Moléculaire (IRIBHM), and ULB Neuroscience Institute (UNI), 1070 Brussels, Belgium; VIB-KU Leuven Center for Brain & Disease Research, 3000 Leuven, Belgium; Department of Neurosciences, Leuven Brain Institute, KU Leuven, 3000 Leuven, Belgium; Welbio, Université Libre de Bruxelles (ULB), 1070 Brussels, Belgium
|
12
|
Kanduri C, Bock C, Gundersen S, Hovig E, Sandve GK. Colocalization analyses of genomic elements: approaches, recommendations and challenges. Bioinformatics 2019; 35:1615-1624. [PMID: 30307532 PMCID: PMC6499241 DOI: 10.1093/bioinformatics/bty835] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 05/22/2018] [Revised: 09/03/2018] [Accepted: 10/10/2018] [Indexed: 12/23/2022]
Abstract
MOTIVATION: Many high-throughput methods produce sets of genomic regions as one of their main outputs. Scientists often use genomic colocalization analysis to interpret such region sets, for example to identify interesting enrichments and to understand the interplay between the underlying biological processes. Although widely used, there is little standardization in how these analyses are performed. Different practices can substantially affect the conclusions of colocalization analyses. RESULTS: Here, we describe the different approaches and provide recommendations for performing genomic colocalization analysis, while also discussing common methodological challenges that may influence the conclusions. As illustrated by concrete example cases, careful attention to analysis details is needed in order to meet these challenges and to obtain a robust and biologically meaningful interpretation of genomic region set data. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chakravarthi Kanduri
  - Department of Informatics, University of Oslo, Oslo, Norway
  - K. G. Jebsen Coeliac Disease Research Centre, Oslo, Norway
- Christoph Bock
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
  - Department of Laboratory Medicine, Medical University of Vienna, Vienna, Austria
  - Max Planck Institute for Informatics, Saarbrücken, Germany
- Sveinung Gundersen
  - Department of Informatics, University of Oslo, Oslo, Norway
  - Elixir Norway, Oslo Node, University of Oslo, Oslo, Norway
- Eivind Hovig
  - Department of Informatics, University of Oslo, Oslo, Norway
  - Elixir Norway, Oslo Node, University of Oslo, Oslo, Norway
  - Department of Tumor Biology, Institute for Cancer Research, Oslo, Norway
  - Institute for Cancer Genetics and Informatics, The Norwegian Radium Hospital, Oslo, Norway
- Geir Kjetil Sandve
  - Department of Informatics, University of Oslo, Oslo, Norway
  - K. G. Jebsen Coeliac Disease Research Centre, Oslo, Norway
|
13
|
Domanska D, Kanduri C, Simovski B, Sandve GK. Mind the gaps: overlooking inaccessible regions confounds statistical testing in genome analysis. BMC Bioinformatics 2018; 19:481. [PMID: 30547739 PMCID: PMC6293655 DOI: 10.1186/s12859-018-2438-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/21/2018] [Accepted: 10/15/2018] [Indexed: 01/21/2023]
Abstract
Background: The current versions of reference genome assemblies still contain gaps represented by stretches of Ns. Since high-throughput sequencing reads cannot be mapped to those gap regions, the regions are depleted of experimental data. Moreover, several technology platforms assay a targeted portion of the genomic sequence, meaning that regions from the unassayed portion of the genomic sequence cannot be detected in those experiments. We here refer to all such regions as inaccessible regions, and hypothesize that ignoring these regions in the null model may increase false findings in statistical testing of colocalization of genomic features. Results: Our explorative analyses confirm that the genomic regions in public genomic tracks intersect very little with assembly gaps of human reference genomes (hg19 and hg38). What little intersection there was occurred only at the beginning and end portions of the gap regions. Further, we simulated a set of synthetic tracks by matching the properties of real genomic tracks in a way that nullified any true association between them. This allowed us to test our hypothesis that not avoiding inaccessible regions (as represented by assembly gaps) in the null model would result in spurious inflation of statistical significance. We contrasted the distributions of test statistics and p-values of Monte Carlo-based permutation tests that either avoided or did not avoid assembly gaps in the null model when testing colocalization between a pair of tracks. We observed that the statistical tests that did not account for assembly gaps in the null model resulted in a distribution of the test statistic that is shifted to the right and a distribution of p-values that is shifted to the left (indicating inflated significance). We observed a similar level of inflated significance in hg19 and hg38, despite assembly gaps covering a smaller proportion of the latter reference genome. Conclusion: We provide empirical evidence demonstrating that inaccessible regions, even when covering only a few percent of the genome, can lead to a substantial number of false findings if not accounted for in statistical colocalization analysis. Electronic supplementary material: The online version of this article (10.1186/s12859-018-2438-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Diana Domanska
  - Department of Informatics, University of Oslo, Oslo, Norway
- Chakravarthi Kanduri
  - Department of Informatics, University of Oslo, Oslo, Norway; K. G. Jebsen Coeliac Disease Research Centre, Oslo, Norway
- Boris Simovski
  - Department of Informatics, University of Oslo, Oslo, Norway
- Geir Kjetil Sandve
  - Department of Informatics, University of Oslo, Oslo, Norway; K. G. Jebsen Coeliac Disease Research Centre, Oslo, Norway
|
14
|
Transcription-associated histone pruning demarcates macroH2A chromatin domains. Nat Struct Mol Biol 2018; 25:958-970. [PMID: 30291361 PMCID: PMC6178985 DOI: 10.1038/s41594-018-0134-5] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 03/10/2018] [Accepted: 08/17/2018] [Indexed: 02/01/2023]
Abstract
The histone variant macroH2A occupies large repressive domains throughout the genome; however, the mechanisms underlying its precise deposition remain poorly understood. Here, we characterized de novo chromatin deposition of macroH2A2 using temporal genomic profiling in murine-derived fibroblasts devoid of all macroH2A isoforms. We find that macroH2A2 is first pervasively deposited genome-wide at both steady-state domains and adjacent transcribed regions, the latter of which are subsequently pruned, establishing mature macroH2A2 domains. Pruning of macroH2A2 can be counteracted by chemical inhibition of transcription. Further, CRISPR/Cas9-based locus-specific transcriptional manipulation reveals that gene activation depletes pre-existing macroH2A2, while silencing triggers ectopic macroH2A2 accumulation. We demonstrate that the FACT (facilitates chromatin transcription) complex is required for macroH2A2 pruning within transcribed chromatin. Taken together, we have identified active chromatin as a boundary for macroH2A domains through a transcription-associated ‘pruning’ mechanism that establishes and maintains the faithful genomic localization of macroH2A variants.
|
15
|
Rivera-Mulia JC, Schwerer H, Besnard E, Desprat R, Trevilla-Garcia C, Sima J, Bensadoun P, Zouaoui A, Gilbert DM, Lemaitre JM. Cellular senescence induces replication stress with almost no affect on DNA replication timing. Cell Cycle 2018; 17:1667-1681. [PMID: 29963964 DOI: 10.1080/15384101.2018.1491235] [Citation(s) in RCA: 13] [Impact Index Per Article: 2.2] [Indexed: 01/27/2023]
Abstract
Organismal aging entails a gradual decline of normal physiological functions, and a major contributor to this decline is withdrawal from the cell cycle, known as senescence. Senescence can result from telomere diminution leading to a finite number of population doublings, known as replicative senescence (RS), or from oncogene overexpression, as a protective mechanism against cancer. Senescence is associated with large-scale chromatin re-organization and changes in gene expression. Replication stress is a complex phenomenon, defined as the slowing or stalling of replication fork progression and/or DNA synthesis, which has serious implications for genome stability, and consequently for human diseases. Aberrant replication fork structures activate the replication stress response, leading to the activation of dormant origins, which is thought to be a safeguard mechanism to complete DNA replication on time. However, the relationship between replicative stress and the changes in the spatiotemporal program of DNA replication in senescence progression remains unclear. Here, we studied the DNA replication program during senescence progression in proliferative and pre-senescent cells from donors of various ages by single DNA fiber combing of replicated DNA, origin mapping by sequencing short nascent strands, and genome-wide profiling of replication timing (RT). We demonstrate that progression into RS leads to reduced replication fork rates and activation of dormant origins, which are the hallmarks of replication stress. However, with the exception of a delay in RT of the CREB5 gene in all pre-senescent cells, RT was globally unaffected by replication stress during entry into either oncogene-induced senescence or RS. Consequently, we conclude that RT alterations associated with physiological and accelerated aging do not result from senescence progression. Our results clarify the interplay between senescence, aging and replication programs and demonstrate that RT is largely resistant to replication stress.
Affiliation(s)
- Hélène Schwerer
  - Laboratory of Genome and Stem Cell Plasticity in Development and Aging, Institute of Regenerative Medicine, U1183, Université de Montpellier, Montpellier Cedex, France
- Emilie Besnard
  - Laboratory of Genome and Stem Cell Plasticity in Development and Aging, Institute of Regenerative Medicine, U1183, Université de Montpellier, Montpellier Cedex, France
- Romain Desprat
  - Stem cell Core Facility SAFE-iPS INGESTEM, CHU Montpellier, Saint Eloi Hospital, Montpellier Cedex, France
- Jiao Sima
  - Department of Biological Science, Florida State University, Tallahassee, FL, USA
- Paul Bensadoun
  - Laboratory of Genome and Stem Cell Plasticity in Development and Aging, Institute of Regenerative Medicine, U1183, Université de Montpellier, Montpellier Cedex, France
- Anissa Zouaoui
  - Stem cell Core Facility SAFE-iPS INGESTEM, CHU Montpellier, Saint Eloi Hospital, Montpellier Cedex, France
- David M Gilbert
  - Department of Biological Science, Florida State University, Tallahassee, FL, USA; Center for Genomics and Personalized Medicine, Florida State University, Tallahassee, FL, USA
- Jean-Marc Lemaitre
  - Laboratory of Genome and Stem Cell Plasticity in Development and Aging, Institute of Regenerative Medicine, U1183, Université de Montpellier, Montpellier Cedex, France; Stem cell Core Facility SAFE-iPS INGESTEM, CHU Montpellier, Saint Eloi Hospital, Montpellier Cedex, France
|
16
|
Epigenetic dysregulation of naive CD4+ T-cell activation genes in childhood food allergy. Nat Commun 2018; 9:3308. [PMID: 30120223 PMCID: PMC6098117 DOI: 10.1038/s41467-018-05608-4] [Citation(s) in RCA: 62] [Impact Index Per Article: 10.3] [Received: 11/30/2017] [Accepted: 06/05/2018] [Indexed: 12/31/2022]
Abstract
Food allergy poses a significant clinical and public health burden affecting 2–10% of infants. Using integrated DNA methylation and transcriptomic profiling, we found that polyclonal activation of naive CD4+ T cells through the T cell receptor results in poorer lymphoproliferative responses in children with immunoglobulin E (IgE)-mediated food allergy. Reduced expression of cell cycle-related targets of the E2F and MYC transcription factor networks, and remodeling of DNA methylation at metabolic (RPTOR, PIK3D, MAPK1, FOXO1) and inflammatory genes (IL1R, IL18RAP, CD82) underpin this suboptimal response. Infants who fail to resolve food allergy in later childhood exhibit cumulative increases in epigenetic disruption at T cell activation genes and poorer lymphoproliferative responses compared to children who resolved food allergy. Our data indicate epigenetic dysregulation in the early stages of signal transduction through the T cell receptor complex, which likely reflects pathways modified by gene–environment interactions in food allergy. Immunoglobulin E (IgE)-mediated food allergy is a major issue that affects 2–10% of infants. Here the authors study the epigenetic regulation of the naive CD4+ T cell activation response among children with IgE-mediated food allergy, finding epigenetic dysregulation in the early stages of signal transduction through the T cell receptor complex.
|
17
|
Stavrovskaya ED, Niranjan T, Fertig EJ, Wheelan SJ, Favorov AV, Mironov AA. StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data. Bioinformatics 2018; 33:3158-3165. [PMID: 29028265 DOI: 10.1093/bioinformatics/btx379] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 09/22/2016] [Accepted: 06/12/2017] [Indexed: 12/13/2022]
Abstract
Motivation: Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. Results: Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding data binarization and the subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples of the broad applicability of StereoGene for regulatory genomics. Availability and implementation: The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. Contact: favorov@sensi.org. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Elena D Stavrovskaya
  - Department of Bioengineering and Bioinformatics, Moscow State University, Moscow 119992, Russia; Institute for Information Transmission Problems, RAS, Moscow 127994, Russia
- Tejasvi Niranjan
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Elana J Fertig
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Sarah J Wheelan
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Alexander V Favorov
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA; Laboratory of Systems Biology and Computational Genetics, Vavilov Institute of General Genetics, RAS, Moscow 119333, Russia; Laboratory of Bioinformatics, Research Institute of Genetics and Selection of Industrial Microorganisms, Moscow 117545, Russia
- Andrey A Mironov
  - Department of Bioengineering and Bioinformatics, Moscow State University, Moscow 119992, Russia; Institute for Information Transmission Problems, RAS, Moscow 127994, Russia
|
18
|
Dozmorov MG. Epigenomic annotation-based interpretation of genomic data: from enrichment analysis to machine learning. Bioinformatics 2018; 33:3323-3330. [PMID: 29028263 DOI: 10.1093/bioinformatics/btx414] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 01/05/2017] [Accepted: 06/22/2017] [Indexed: 12/12/2022]
Abstract
Motivation: One of the goals of functional genomics is to understand the regulatory implications of experimentally obtained genomic regions of interest (ROIs). Most sequencing technologies now generate ROIs distributed across the whole genome. The interpretation of these genome-wide ROIs represents a challenge as the majority of them lie outside of functionally well-defined protein coding regions. Recent efforts by the members of the International Human Epigenome Consortium have generated volumes of functional/regulatory data (reference epigenomic datasets), effectively annotating the genome with epigenomic properties. Consequently, a wide variety of computational tools has been developed utilizing these epigenomic datasets for the interpretation of genomic data. Results: The purpose of this review is to provide a structured overview of practical solutions for the interpretation of ROIs with the help of epigenomic data. Starting with epigenomic enrichment analysis, we discuss leading tools and machine learning methods utilizing epigenomic and 3D genome structure data. The hierarchy of tools and methods reviewed here presents a practical guide for the interpretation of genome-wide ROIs within an epigenomic context. Contact: mikhail.dozmorov@vcuhealth.org. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mikhail G Dozmorov
  - Department of Biostatistics, Virginia Commonwealth University, Richmond, VA 23298, USA
|
19
|
Simovski B, Kanduri C, Gundersen S, Titov D, Domanska D, Bock C, Bossini-Castillo L, Chikina M, Favorov A, Layer RM, Mironov AA, Quinlan AR, Sheffield NC, Trynka G, Sandve GK. Coloc-stats: a unified web interface to perform colocalization analysis of genomic features. Nucleic Acids Res 2018; 46:W186-W193. [PMID: 29873782 PMCID: PMC6030976 DOI: 10.1093/nar/gky474] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/05/2018] [Revised: 05/05/2018] [Accepted: 05/15/2018] [Indexed: 12/16/2022]
Abstract
Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.
Affiliation(s)
- Boris Simovski
  - Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
- Chakravarthi Kanduri
  - Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
  - K. G. Jebsen Centre for Coeliac Disease Research, Oslo University Hospital, Sognsvannsveien 20, 0372 Oslo, Norway
- Sveinung Gundersen
  - Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
  - Elixir Norway - Oslo node, Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
- Dmytro Titov
  - Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
  - Elixir Norway - Oslo node, Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
- Diana Domanska
  - Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
- Christoph Bock
  - CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
  - Department of Laboratory Medicine, Medical University of Vienna, 1090 Vienna, Austria
  - Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
- Maria Chikina
  - University of Pittsburgh School of Medicine, 3550 Terrace Street, Pittsburgh, PA 15213, USA
- Alexander Favorov
  - Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, 550 N Broadway, Baltimore, MD 21205, USA
  - Laboratory of Systems Biology and Computational Genetics, Vavilov Institute of General Genetics, Gubkina Street 3, Moscow 119333, Russia
- Ryan M Layer
  - Department of Human Genetics, University of Utah, 15 N 2030 E, Salt Lake City, UT 84112, USA
  - USTAR Center for Genetic Discovery, University of Utah, 15 N 2030 E, Salt Lake City, UT 84112, USA
- Andrey A Mironov
  - Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Lab. Bldg B, Vorobiovy Gory 1-73, Moscow 119992, Russia
  - Skolkovo Institute of Science and Technology, Nobelya ul. 3, Moscow 121205, Russia
  - Institute for Information Transmission Problems, Russian Academy of Sciences, Bolshoi Karenty per. 19, Moscow 127994, Russia
- Aaron R Quinlan
  - Department of Human Genetics, University of Utah, 15 N 2030 E, Salt Lake City, UT 84112, USA
  - USTAR Center for Genetic Discovery, University of Utah, 15 N 2030 E, Salt Lake City, UT 84112, USA
  - Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, UT 84108, USA
- Nathan C Sheffield
  - Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903, USA
- Gosia Trynka
  - Cellular Genetics Programme, Wellcome Sanger Institute, CB10 1SA Hinxton, UK
- Geir K Sandve
  - Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
  - K. G. Jebsen Centre for Coeliac Disease Research, Oslo University Hospital, Sognsvannsveien 20, 0372 Oslo, Norway
|
20
|
Tekle KM, Gundersen S, Klepper K, Bongo LA, Raknes IA, Li X, Zhang W, Andreetta C, Mulugeta TD, Kalaš M, Rye MB, Hjerde E, Antony Samy JK, Fornous G, Azab A, Våge DI, Hovig E, Willassen NP, Drabløs F, Nygård S, Petersen K, Jonassen I. Norwegian e-Infrastructure for Life Sciences (NeLS). F1000Res 2018; 7:ELIXIR-968. [PMID: 30271575 PMCID: PMC6137412 DOI: 10.12688/f1000research.15119.1] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Accepted: 06/13/2018] [Indexed: 12/26/2022]
Abstract
The Norwegian e-Infrastructure for Life Sciences (NeLS) has been developed by ELIXIR Norway to provide its users with a system enabling data storage, sharing, and analysis in a project-oriented fashion. The system is available through easy-to-use web interfaces, including the Galaxy workbench for data analysis and workflow execution. Users confident with a command-line interface and programming may also access it through Secure Shell (SSH) and application programming interfaces (APIs). NeLS has been in production since 2015, with training and support provided by the help desk of ELIXIR Norway. Through collaboration with NorSeq, the national consortium for high-throughput sequencing, an integrated service is offered so that sequencing data generated in a research project is provided to the involved researchers through NeLS. Sensitive data, such as individual genomic sequencing data, are handled using the TSD (Services for Sensitive Data) platform provided by Sigma2 and the University of Oslo. NeLS integrates national e-infrastructure storage and computing resources, and is also integrated with the SEEK platform in order to store large data files produced by experiments described in SEEK. In this article, we outline the architecture of NeLS and discuss possible directions for further development.
Affiliation(s)
- Kidane M. Tekle
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Kjetil Klepper
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Lars Ailo Bongo
  - University of Tromsø - The Arctic University of Norway, Tromsø, Norway
- Xiaxi Li
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Wei Zhang
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Christian Andreetta
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Teshome Dagne Mulugeta
  - Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Matúš Kalaš
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Morten B. Rye
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Erik Hjerde
  - University of Tromsø - The Arctic University of Norway, Tromsø, Norway
- Jeevan Karloss Antony Samy
  - Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Dag Inge Våge
  - Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Norwegian University of Life Sciences, Ås, Norway
- Finn Drabløs
  - Department of Clinical and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
- Kjell Petersen
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
- Inge Jonassen
  - Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
|
21
|
Rivera-Mulia JC, Dimond A, Vera D, Trevilla-Garcia C, Sasaki T, Zimmerman J, Dupont C, Gribnau J, Fraser P, Gilbert DM. Allele-specific control of replication timing and genome organization during development. Genome Res 2018; 28:800-811. [PMID: 29735606 PMCID: PMC5991511 DOI: 10.1101/gr.232561.117] [Citation(s) in RCA: 40] [Impact Index Per Article: 6.7] [Received: 11/17/2017] [Accepted: 04/26/2018] [Indexed: 12/14/2022]
Abstract
DNA replication occurs in a defined temporal order known as the replication-timing (RT) program. RT is regulated during development in discrete chromosomal units, coordinated with transcriptional activity and 3D genome organization. Here, we derived distinct cell types from F1 hybrid musculus × castaneus mouse crosses and exploited the high single-nucleotide polymorphism (SNP) density to characterize allelic differences in RT (Repli-seq), genome organization (Hi-C and promoter-capture Hi-C), gene expression (total nuclear RNA-seq), and chromatin accessibility (ATAC-seq). We also present HARP, a new computational tool for sorting SNPs in phased genomes to efficiently measure allele-specific genome-wide data. Analysis of six different hybrid mESC clones with different genomes (C57BL/6, 129/sv, and CAST/Ei), parental configurations, and gender revealed significant RT asynchrony between alleles across ∼12% of the autosomal genome linked to subspecies genomes but not to parental origin, growth conditions, or gender. RT asynchrony in mESCs strongly correlated with changes in Hi-C compartments between alleles but not as strongly with SNP density, gene expression, imprinting, or chromatin accessibility. We then tracked mESC RT asynchronous regions during development by analyzing differentiated cell types, including extraembryonic endoderm stem (XEN) cells, four male and female primary mouse embryonic fibroblasts (MEFs), and neural precursor cells (NPCs) differentiated in vitro from mESCs with opposite parental configurations. We found that RT asynchrony and allelic discordance in Hi-C compartments seen in mESCs were largely lost in all differentiated cell types, accompanied by novel sites of allelic asynchrony at a considerably smaller proportion of the genome, suggesting that genome organization of homologs converges to similar folding patterns during cell fate commitment.
Affiliation(s)
- Juan Carlos Rivera-Mulia
  - Department of Biological Science, Florida State University, Tallahassee, Florida 32306-4295, USA
- Andrew Dimond
  - The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, United Kingdom
- Daniel Vera
  - Center for Genomics and Personalized Medicine, Florida State University, Tallahassee, Florida 32306, USA
- Claudia Trevilla-Garcia
  - Department of Biological Science, Florida State University, Tallahassee, Florida 32306-4295, USA
- Takayo Sasaki
  - Department of Biological Science, Florida State University, Tallahassee, Florida 32306-4295, USA
- Jared Zimmerman
  - Department of Biological Science, Florida State University, Tallahassee, Florida 32306-4295, USA
- Catherine Dupont
  - Department of Reproduction and Development, Erasmus MC, University Medical Center, 3015GE Rotterdam, The Netherlands
- Joost Gribnau
  - Department of Reproduction and Development, Erasmus MC, University Medical Center, 3015GE Rotterdam, The Netherlands
- Peter Fraser
  - Department of Biological Science, Florida State University, Tallahassee, Florida 32306-4295, USA
  - The Babraham Institute, Babraham Research Campus, Cambridge CB22 3AT, United Kingdom
  - Center for Genomics and Personalized Medicine, Florida State University, Tallahassee, Florida 32306, USA
- David M Gilbert
  - Department of Biological Science, Florida State University, Tallahassee, Florida 32306-4295, USA
  - Center for Genomics and Personalized Medicine, Florida State University, Tallahassee, Florida 32306, USA
|
22
|
Li R, Liu Y, Hou Y, Gan J, Wu P, Li C. 3D genome and its disorganization in diseases. Cell Biol Toxicol 2018; 34:351-365. [DOI: 10.1007/s10565-018-9430-4] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Received: 12/27/2017] [Accepted: 03/26/2018] [Indexed: 01/25/2023]
|
23
|
Kanduri C, Domanska D, Hovig E, Sandve GK. Genome build information is an essential part of genomic track files. Genome Biol 2017; 18:175. [PMID: 28911336 PMCID: PMC5599886 DOI: 10.1186/s13059-017-1312-1] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.9] [Received: 04/12/2017] [Accepted: 08/29/2017] [Indexed: 11/10/2022] Open
Abstract
Genomic locations are represented as coordinates on a specific genome build version, but the build information is frequently missing when coordinates are provided. We show that this information is essential to correctly interpret and analyse the genomic intervals contained in genomic track files. Although not a substitute for best practices, we also provide a tool to predict the genome build version of genomic track files.
Affiliation(s)
- Chakravarthi Kanduri
  - Department of Informatics, University of Oslo, 0316, Oslo, Norway
  - K.G. Jebsen Coeliac Disease Research Centre, University of Oslo, 0318, Oslo, Norway
- Diana Domanska
  - Department of Informatics, University of Oslo, 0316, Oslo, Norway
- Eivind Hovig
  - Department of Informatics, University of Oslo, 0316, Oslo, Norway
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, 0424, Oslo, Norway
  - Institute for Cancer Genetics and Informatics, The Norwegian Radium Hospital, Oslo University Hospital, 0424, Oslo, Norway
- Geir Kjetil Sandve
  - Department of Informatics, University of Oslo, 0316, Oslo, Norway
  - K.G. Jebsen Coeliac Disease Research Centre, University of Oslo, 0318, Oslo, Norway
|
24
|
Simovski B, Vodák D, Gundersen S, Domanska D, Azab A, Holden L, Holden M, Grytten I, Rand K, Drabløs F, Johansen M, Mora A, Lund-Andersen C, Fromm B, Eskeland R, Gabrielsen OS, Ferkingstad E, Nakken S, Bengtsen M, Nederbragt AJ, Thorarensen HS, Akse JA, Glad I, Hovig E, Sandve GK. GSuite HyperBrowser: integrative analysis of dataset collections across the genome and epigenome. Gigascience 2017; 6:1-12. [PMID: 28459977 PMCID: PMC5493745 DOI: 10.1093/gigascience/gix032] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.9] [Received: 09/20/2016] [Revised: 01/17/2017] [Accepted: 04/24/2017] [Indexed: 12/01/2022] Open
Abstract
Background: Recent large-scale undertakings such as ENCODE and Roadmap Epigenomics have generated experimental data mapped to the human reference genome (as genomic tracks) representing a variety of functional elements across a large number of cell types. Despite the high potential value of these publicly available data for a broad variety of investigations, little attention has been given to the analytical methodology necessary for their widespread utilisation. Findings: We here present a first principled treatment of the analysis of collections of genomic tracks. We have developed novel computational and statistical methodology to permit comparative and confirmatory analyses across multiple and disparate data sources. We delineate a set of generic questions that are useful across a broad range of investigations and discuss the implications of choosing different statistical measures and null models. Examples include contrasting analyses across different tissues or diseases. The methodology has been implemented in a comprehensive open-source software system, the GSuite HyperBrowser. To make the functionality accessible to biologists, and to facilitate reproducible analysis, we have also developed a web-based interface providing an expertly guided and customizable way of utilizing the methodology. With this system, many novel biological questions can flexibly be posed and rapidly answered. Conclusions: Through a combination of streamlined data acquisition, interoperable representation of dataset collections, and customizable statistical analysis with guided setup and interpretation, the GSuite HyperBrowser represents a first comprehensive solution for integrative analysis of track collections across the genome and epigenome. The software is available at: https://hyperbrowser.uio.no.
Affiliation(s)
- Boris Simovski
  - Department of Informatics, University of Oslo, Oslo, Norway
- Daniel Vodák
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Diana Domanska
  - Department of Informatics, University of Oslo, Oslo, Norway
- Abdulrahman Azab
  - Department of Informatics, University of Oslo, Oslo, Norway
  - Research Support Services Group, University Center for Information Technology, Oslo, Norway
- Lars Holden
  - Statistics For Innovation, Norwegian Computing Center, Oslo, Norway
- Marit Holden
  - Statistics For Innovation, Norwegian Computing Center, Oslo, Norway
- Ivar Grytten
  - Department of Informatics, University of Oslo, Oslo, Norway
- Knut Rand
  - Department of Mathematics, University of Oslo, Oslo, Norway
- Finn Drabløs
  - Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology (NTNU), Trondheim, Norway
- Morten Johansen
  - Institute for Medical Informatics, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
- Antonio Mora
  - Department of Informatics, University of Oslo, Oslo, Norway
  - Department of Biosciences, University of Oslo, Oslo, Norway
- Christin Lund-Andersen
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Bastian Fromm
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Ragnhild Eskeland
  - Department of Biosciences, University of Oslo, Oslo, Norway
  - Norwegian Center for Stem Cell Research, Department of Immunology, Oslo University Hospital, Oslo, Norway
- Sigve Nakken
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Mads Bengtsen
  - Department of Biosciences, University of Oslo, Oslo, Norway
- Alexander Johan Nederbragt
  - Department of Informatics, University of Oslo, Oslo, Norway
  - Centre for Ecological and Evolutionary Synthesis (CEES), Department of Biosciences, University of Oslo, Oslo, Norway
- Ingrid Glad
  - Department of Mathematics, University of Oslo, Oslo, Norway
- Eivind Hovig
  - Department of Informatics, University of Oslo, Oslo, Norway
  - Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
  - Statistics For Innovation, Norwegian Computing Center, Oslo, Norway
  - Institute for Medical Informatics, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
|
25
|
Rand KD, Grytten I, Nederbragt AJ, Storvik GO, Glad IK, Sandve GK. Coordinates and intervals in graph-based reference genomes. BMC Bioinformatics 2017; 18:263. [PMID: 28521770 PMCID: PMC5437615 DOI: 10.1186/s12859-017-1678-9] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 04/05/2017] [Accepted: 05/08/2017] [Indexed: 11/24/2022] Open
Abstract
BACKGROUND It has been proposed that future reference genomes should be graph structures in order to better represent the sequence diversity present in a species. However, there is currently no standard method to represent genomic intervals, such as the positions of genes or transcription factor binding sites, on graph-based reference genomes. RESULTS We formalize offset-based coordinate systems on graph-based reference genomes and introduce methods for representing intervals on these reference structures. We show the advantage of our methods by representing genes on a graph-based representation of the newest assembly of the human genome (GRCh38) and its alternative loci for regions that are highly variable. CONCLUSION More complex reference genomes, containing alternative loci, require methods to represent genomic data on these structures. Our proposed notation for genomic intervals makes it possible to fully utilize the alternative loci of the GRCh38 assembly and potential future graph-based reference genomes. We have made a Python package for representing such intervals on offset-based coordinate systems, available at https://github.com/uio-cels/offsetbasedgraph . An interactive web-tool using this Python package to visualize genes on a graph created from GRCh38 is available at https://github.com/uio-cels/genomicgraphcoords .
Affiliation(s)
- Knut D. Rand
  - Department of Mathematics, University of Oslo, Moltke Moes vei 35, Oslo, 0851 Norway
- Ivar Grytten
  - Department of Informatics, University of Oslo, Gaustadalleen 23 B, Oslo, 0371 Norway
- Alexander J. Nederbragt
  - Department of Informatics, University of Oslo, Gaustadalleen 23 B, Oslo, 0371 Norway
  - Department of Biosciences, University of Oslo, Blindernvn. 31, Oslo, 0371 Norway
- Geir O. Storvik
  - Department of Mathematics, University of Oslo, Moltke Moes vei 35, Oslo, 0851 Norway
- Ingrid K. Glad
  - Department of Mathematics, University of Oslo, Moltke Moes vei 35, Oslo, 0851 Norway
- Geir K. Sandve
  - Department of Informatics, University of Oslo, Gaustadalleen 23 B, Oslo, 0371 Norway
|
26
|
Male-Specific Transcription Factor Occupancy Alone Does Not Account for Differential Methylation at Imprinted Genes in the mouse Germ Cell Lineage. G3-GENES GENOMES GENETICS 2016; 6:3975-3983. [PMID: 27694116 PMCID: PMC5144967 DOI: 10.1534/g3.116.033613] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]
Abstract
Genomic imprinting is an epigenetic mechanism that affects a subset of mammalian genes, resulting in monoallelic expression depending on the parental origin of the alleles. Imprinted regions contain regulatory elements that are methylated in the gametes in a sex-specific manner (differentially methylated regions; DMRs). DMRs are present at nonimprinted loci as well, but whereas most regions are equalized after fertilization, methylation at imprinted regions maintains asymmetry. We tested the hypothesis that paternally unmethylated DMRs are occupied by transcription factors (TFs) present during male gametogenesis. Meta-analysis of mouse RNA data to identify DNA-binding proteins expressed in male gametes and motif enrichment analysis of active promoters yielded a list of candidate TFs. We then asked whether imprinted or nonimprinted paternally unmethylated DMRs harbored motifs for these TFs, and found many shared motifs between the two groups. However, DMRs that are methylated in the male germ cells also share motifs with DMRs that remain unmethylated. There are recognition sequences exclusive to the unmethylated DMRs, whether imprinted or not, that correspond with cell-cycle regulators, such as p53. Thus, at least with the current available data, our results indicate a complex scenario in which TF occupancy alone is not likely to play a role in protecting unmethylated DMRs, at least during male gametogenesis. Rather, the epigenetic features of DMRs, regulatory sequences other than DMRs, and the role of DNA-binding proteins capable of endowing sequence specificity to DNA-methylating enzymes are feasible mechanisms and further investigation is needed to answer this question.
|
27
|
Abstract
Estrogen is a steroid hormone that plays critical roles in a myriad of intracellular pathways. The expression of many genes is regulated through the steroid hormone receptors ESR1 and ESR2. These bind to DNA and modulate the expression of target genes. Identification of estrogen target genes is greatly facilitated by the use of transcriptomic methods, such as RNA-seq and expression microarrays, and chromatin immunoprecipitation with massively parallel sequencing (ChIP-seq). Combining transcriptomic and ChIP-seq data enables a distinction to be drawn between direct and indirect estrogen target genes. This chapter discusses some methods of identifying estrogen target genes that do not require any expertise in programming languages or complex bioinformatics.
Affiliation(s)
- Adam E Handel
  - Department of Physiology, Anatomy and Genetics, University of Oxford, South Parks Road, Oxford, OX1 3QX, UK.
  - Weatherall Institute of Molecular Medicine, University of Oxford, Headley Way, Oxford, OX3 9DS, UK.
|
28
|
Ekstrøm PO, Nakken S, Johansen M, Hovig E. Automated amplicon design suitable for analysis of DNA variants by melting techniques. BMC Res Notes 2015; 8:667. [PMID: 26559640 PMCID: PMC4642734 DOI: 10.1186/s13104-015-1624-8] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 11/18/2014] [Accepted: 10/26/2015] [Indexed: 05/28/2023] Open
Abstract
Background: DNA analysis technology has developed tremendously in recent years, and current deep sequencing techniques offer unprecedented opportunities for detailed, high-throughput DNA variant detection. Although the cost of DNA sequencing per base pair has decreased exponentially, focused, target-specific methods are still widely used for analysis of DNA variants. As the capacity of analytical procedures has increased, a matching demand for automated amplicon and primer design has emerged. Results: We have constructed a web-based tool that can batch-design DNA variant assays suitable for analysis by denaturing gel/capillary electrophoresis and high-resolution melting. The tool is developed as a computational workflow that implements one of the most widely used primer design tools, followed by validation of primer specificity, as well as calculation and visualization of the melting properties of the resulting amplicon, with or without an artificial high melting domain attached. The tool will be useful for scientists applying DNA melting techniques in analysis of DNA variations. The tool is freely available at http://meltprimer.ous-research.no/. Conclusion: Herein, we demonstrate a tool that is novel in covering the whole amplicon design workflow necessary for groups that use melting equilibrium techniques to separate DNA variants.
Affiliation(s)
- Per Olaf Ekstrøm
  - Department of Tumor Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Montebello, Oslo, 0310, Norway.
- Sigve Nakken
  - Department of Tumor Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Montebello, Oslo, 0310, Norway.
- Morten Johansen
  - Department of Tumor Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Montebello, Oslo, 0310, Norway.
- Eivind Hovig
  - Department of Tumor Biology, Institute for Cancer Research, The Norwegian Radium Hospital, Montebello, Oslo, 0310, Norway.
  - Institute of Cancer Genetics and Informatics, The Norwegian Radium Hospital, Oslo University Hospital, Nydalen, Oslo, 0424, Norway.
  - Department of Informatics, University of Oslo, Blindern, Oslo, 0318, Norway.
|
29
|
Lercher L, Raj R, Patel NA, Price J, Mohammed S, Robinson CV, Schofield CJ, Davis BG. Generation of a synthetic GlcNAcylated nucleosome reveals regulation of stability by H2A-Thr101 GlcNAcylation. Nat Commun 2015; 6:7978. [PMID: 26305776 PMCID: PMC4560749 DOI: 10.1038/ncomms8978] [Citation(s) in RCA: 44] [Impact Index Per Article: 4.9] [Received: 12/02/2014] [Accepted: 07/02/2015] [Indexed: 02/06/2023] Open
Abstract
O-GlcNAcylation is a newly discovered histone modification implicated in transcriptional regulation, but no structural information on the physical effect of GlcNAcylation on chromatin exists. Here, we generate synthetic, pure GlcNAcylated histones and nucleosomes and reveal that GlcNAcylation can modulate structure through direct destabilization of H2A/H2B dimers in the nucleosome, thus promoting an 'open' chromatin state. The results suggest that a plausible molecular basis for one role of histone O-GlcNAcylation in epigenetic regulation is to lower the barrier for RNA polymerase passage and hence increase transcription.
Affiliation(s)
- Lukas Lercher
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, Mansfield Road, Oxford OX1 3TA, UK
- Ritu Raj
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, Mansfield Road, Oxford OX1 3TA, UK
- Nisha A. Patel
  - Department of Chemistry, University of Oxford, Physical and Theoretical Chemistry Laboratory, South Parks Road, Oxford OX1 3QZ, UK
- Joshua Price
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, Mansfield Road, Oxford OX1 3TA, UK
- Shabaz Mohammed
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, Mansfield Road, Oxford OX1 3TA, UK
- Carol V. Robinson
  - Department of Chemistry, University of Oxford, Physical and Theoretical Chemistry Laboratory, South Parks Road, Oxford OX1 3QZ, UK
- Christopher J. Schofield
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, Mansfield Road, Oxford OX1 3TA, UK
- Benjamin G. Davis
  - Department of Chemistry, University of Oxford, Chemistry Research Laboratory, Mansfield Road, Oxford OX1 3TA, UK
|
30
|
Younesy H, Möller T, Lorincz MC, Karimi MM, Jones SJM. VisRseq: R-based visual framework for analysis of sequencing data. BMC Bioinformatics 2015; 16 Suppl 11:S2. [PMID: 26328469 PMCID: PMC4559603 DOI: 10.1186/1471-2105-16-s11-s2] [Citation(s) in RCA: 49] [Impact Index Per Article: 5.4] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Several tools have been developed to enable biologists to perform initial browsing and exploration of sequencing data. However the computational tool set for further analyses often requires significant computational expertise to use and many of the biologists with the knowledge needed to interpret these data must rely on programming experts. RESULTS We present VisRseq, a framework for analysis of sequencing datasets that provides a computationally rich and accessible framework for integrative and interactive analyses without requiring programming expertise. We achieve this aim by providing R apps, which offer a semi-auto generated and unified graphical user interface for computational packages in R and repositories such as Bioconductor. To address the interactivity limitation inherent in R libraries, our framework includes several native apps that provide exploration and brushing operations as well as an integrated genome browser. The apps can be chained together to create more powerful analysis workflows. CONCLUSIONS To validate the usability of VisRseq for analysis of sequencing data, we present two case studies performed by our collaborators and report their workflow and insights.
|
31
|
Veluchamy A, Rastogi A, Lin X, Lombard B, Murik O, Thomas Y, Dingli F, Rivarola M, Ott S, Liu X, Sun Y, Rabinowicz PD, McCarthy J, Allen AE, Loew D, Bowler C, Tirichine L. An integrative analysis of post-translational histone modifications in the marine diatom Phaeodactylum tricornutum. Genome Biol 2015; 16:102. [PMID: 25990474 PMCID: PMC4504042 DOI: 10.1186/s13059-015-0671-8] [Citation(s) in RCA: 57] [Impact Index Per Article: 6.3] [Received: 01/16/2015] [Accepted: 05/11/2015] [Indexed: 01/09/2023] Open
Abstract
BACKGROUND Nucleosomes are the building blocks of chromatin where gene regulation takes place. Chromatin landscapes have been profiled for several species, providing insights into the fundamental mechanisms of chromatin-mediated transcriptional regulation of gene expression. However, knowledge is missing for several major and deep-branching eukaryotic groups, such as the Stramenopiles, which include the diatoms. Diatoms are highly diverse and ubiquitous species of phytoplankton that play a key role in global biogeochemical cycles. Dissecting chromatin-mediated regulation of genes in diatoms will help understand the ecological success of these organisms in contemporary oceans. RESULTS Here, we use high resolution mass spectrometry to identify a full repertoire of post-translational modifications on histones of the marine diatom Phaeodactylum tricornutum, including eight novel modifications. We map five histone marks coupled with expression data and show that P. tricornutum displays both unique and broadly conserved chromatin features, reflecting the chimeric nature of its genome. Combinatorial analysis of histone marks and DNA methylation demonstrates the presence of an epigenetic code defining activating or repressive chromatin states. We further profile three specific histone marks under conditions of nitrate depletion and show that the histone code is dynamic and targets specific sets of genes. CONCLUSIONS This study is the first genome-wide characterization of the histone code from a stramenopile and a marine phytoplankton. The work represents an important initial step for understanding the evolutionary history of chromatin and how epigenetic modifications affect gene expression in response to environmental cues in marine environments.
Affiliation(s)
- Alaguraj Veluchamy
  - Ecology and Evolutionary Biology Section, Institut de Biologie de l'École Normale Supérieure (IBENS), CNRS UMR8197 INSERM U1024, 46 rue d'Ulm, 75005, Paris, France.
  - Present address: BESE Division, Center for Desert Agriculture, King Abdullah University of Science and Technology, Thuwal, 23955-6900, Saudi Arabia.
- Achal Rastogi
  - Ecology and Evolutionary Biology Section, Institut de Biologie de l'École Normale Supérieure (IBENS), CNRS UMR8197 INSERM U1024, 46 rue d'Ulm, 75005, Paris, France.
- Xin Lin
  - Ecology and Evolutionary Biology Section, Institut de Biologie de l'École Normale Supérieure (IBENS), CNRS UMR8197 INSERM U1024, 46 rue d'Ulm, 75005, Paris, France.
  - Present address: State key lab of Marine Environmental Science, Xiamen University, Xiamen, 361005, China.
- Bérangère Lombard
  - Institut Curie, PSL Research University, Centre de Recherche, Laboratoire de Spectrométrie de Masse Protéomique, 26 rue d'Ulm, 75248, Cedex 05 Paris, France.
- Omer Murik
  - Ecology and Evolutionary Biology Section, Institut de Biologie de l'École Normale Supérieure (IBENS), CNRS UMR8197 INSERM U1024, 46 rue d'Ulm, 75005, Paris, France.
- Yann Thomas
  - Ecology and Evolutionary Biology Section, Institut de Biologie de l'École Normale Supérieure (IBENS), CNRS UMR8197 INSERM U1024, 46 rue d'Ulm, 75005, Paris, France.
- Florent Dingli
  - Institut Curie, PSL Research University, Centre de Recherche, Laboratoire de Spectrométrie de Masse Protéomique, 26 rue d'Ulm, 75248, Cedex 05 Paris, France.
- Maximo Rivarola
  - Institute for Genome Sciences (IGS), University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
  - Present address: Instituto de Biotecnología, CICVyA, Instituto Nacional de Tecnología Agropecuaria (INTA Castelar), CC 25, Castelar, B1712WAA, Argentina.
- Sandra Ott
  - Institute for Genome Sciences (IGS), University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
- Xinyue Liu
  - Institute for Genome Sciences (IGS), University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
- Yezhou Sun
  - Institute for Genome Sciences (IGS), University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
- Pablo D Rabinowicz
  - Institute for Genome Sciences (IGS), University of Maryland School of Medicine, Baltimore, MD, 21201, USA.
- James McCarthy
  - J. Craig Venter Institute, 10355 Science Center Drive, San Diego, CA, 92121, USA.
- Andrew E Allen
  - J. Craig Venter Institute, 10355 Science Center Drive, San Diego, CA, 92121, USA.
  - Scripps Institution of Oceanography, Integrative Oceanography Division, University of California, San Diego, CA, 92093, USA.
- Damarys Loew
  - Institut Curie, PSL Research University, Centre de Recherche, Laboratoire de Spectrométrie de Masse Protéomique, 26 rue d'Ulm, 75248, Cedex 05 Paris, France.
- Chris Bowler
  - Ecology and Evolutionary Biology Section, Institut de Biologie de l'École Normale Supérieure (IBENS), CNRS UMR8197 INSERM U1024, 46 rue d'Ulm, 75005, Paris, France.
- Leïla Tirichine
  - Ecology and Evolutionary Biology Section, Institut de Biologie de l'École Normale Supérieure (IBENS), CNRS UMR8197 INSERM U1024, 46 rue d'Ulm, 75005, Paris, France.
|
32
|
Mosquera Orgueira A. Hidden among the crowd: differential DNA methylation-expression correlations in cancer occur at important oncogenic pathways. Front Genet 2015; 6:163. [PMID: 26029238 PMCID: PMC4429616 DOI: 10.3389/fgene.2015.00163] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Received: 12/14/2014] [Accepted: 04/10/2015] [Indexed: 12/31/2022] Open
Abstract
DNA methylation is a frequent epigenetic mechanism that participates in transcriptional repression. Variations in DNA methylation with respect to gene expression are constant, and, for unknown reasons, some genes with highly methylated promoters are sometimes overexpressed. In this study we have analyzed the expression and methylation patterns of thousands of genes in five groups of cancer and normal tissue samples in order to determine local and genome-wide differences. We observed significant changes in global methylation-expression correlation in all the neoplasms, which suggests that differential correlation events are frequent in cancer. A focused analysis in the breast cancer cohort identified 1662 genes whose correlation varies significantly between normal and cancerous breast, but whose DNA methylation and gene expression patterns do not change substantially. These genes were enriched in cancer-related pathways and repressive chromatin features across various model cell lines, such as PRC2 binding and H3K27me3 marks. Substantial changes in methylation-expression correlation indicate that these genes are subject to epigenetic remodeling, where the differential activity of other factors breaks the expected relationship between the two variables. Our findings suggest a complex regulatory landscape where a redistribution of local and large-scale chromatin repressive domains at differentially correlated genes (DCGs) creates epigenetic hotspots that modulate cancer-specific gene expression.
|
33
|
Rydbeck H, Sandve GK, Ferkingstad E, Simovski B, Rye M, Hovig E. ClusTrack: feature extraction and similarity measures for clustering of genome-wide data sets. PLoS One 2015; 10:e0123261. [PMID: 25879845 PMCID: PMC4400084 DOI: 10.1371/journal.pone.0123261] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 08/07/2014] [Accepted: 02/17/2015] [Indexed: 11/18/2022] Open
Abstract
Clustering is a popular technique for explorative analysis of data, as it can reveal subgroupings and similarities between data in an unsupervised manner. While clustering is routinely applied to gene expression data, there is a lack of appropriate general methodology for clustering of sequence-level genomic and epigenomic data, e.g. ChIP-based data. We here introduce a general methodology for clustering data sets of coordinates relative to a genome assembly, i.e. genomic tracks. By defining appropriate feature extraction approaches and similarity measures, we allow biologically meaningful clustering to be performed for genomic tracks using standard clustering algorithms. An implementation of the methodology is provided through a tool, ClusTrack, which allows fine-tuned clustering analyses to be specified through a web-based interface. We apply our methods to the clustering of occupancy of the H3K4me1 histone modification in samples from a range of different cell types. The majority of samples form meaningful subclusters, confirming that the definitions of features and similarity capture biological, rather than technical, variation between the genomic tracks. Input data and results are available, and can be reproduced, through a Galaxy Pages document at http://hyperbrowser.uio.no/hb/u/hb-superuser/p/clustrack. The clustering functionality is available as a Galaxy tool, under the menu option "Specialized analyzis of tracks", and the submenu option "Cluster tracks based on genome level similarity", at the Genomic HyperBrowser server: http://hyperbrowser.uio.no/hb/.
Affiliation(s)
- Halfdan Rydbeck
- Department of Informatics, University of Oslo, Oslo, Norway
- Department of Tumour Biology, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
- * E-mail: (HR); (GKS)
| | - Geir Kjetil Sandve
- Department of Informatics, University of Oslo, Oslo, Norway
- * E-mail: (HR); (GKS)
| | - Egil Ferkingstad
- Statistics For Innovation, Norwegian Computing Center, 0314 Oslo, Norway
- Science Institute, University of Iceland, Dunhaga 5, 107 Reykjavik, Iceland
| | - Boris Simovski
- Department of Informatics, University of Oslo, Oslo, Norway
| | - Morten Rye
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, Trondheim, Norway
| | - Eivind Hovig
- Department of Informatics, University of Oslo, Oslo, Norway
- Department of Tumour Biology, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
- Department of Medical Informatics, The Norwegian Radium Hospital, Oslo University Hospital, Oslo, Norway
| |
|
34
|
Ricigliano VAG, Handel AE, Sandve GK, Annibali V, Ristori G, Mechelli R, Cader MZ, Salvetti M. EBNA2 binds to genomic intervals associated with multiple sclerosis and overlaps with vitamin D receptor occupancy. PLoS One 2015; 10:e0119605. [PMID: 25853421 PMCID: PMC4390304 DOI: 10.1371/journal.pone.0119605] [Citation(s) in RCA: 36] [Impact Index Per Article: 4.0] [Received: 10/07/2014] [Accepted: 01/14/2015] [Indexed: 12/23/2022] Open
Abstract
Epstein-Barr virus (EBV) is a non-heritable factor that associates with multiple sclerosis (MS). However, its causal relationship with the disease is still unclear. The virus establishes a complex co-existence with the host that includes regulatory influences on gene expression. Hence, if EBV contributes to the pathogenesis of MS, it may do so by interacting with disease-predisposing genes. To verify this hypothesis we evaluated EBV nuclear antigen 2 (EBNA2, a protein that recent works by our and other groups have implicated in disease development) binding inside MS-associated genomic intervals. We found that EBNA2 binding occurs within MS susceptibility sites more than expected by chance (factor of observed vs expected overlap [O/E] = 5.392-fold, p < 2.0e-05). This remains significant after controlling for multiple genomic confounders. We then asked whether this observation is significant per se or should also be viewed in the context of other disease-relevant gene-environment interactions, such as those attributable to vitamin D. We therefore verified the overlap between EBNA2 genomic occupancy and vitamin D receptor (VDR) binding sites. EBNA2 shows a striking overlap with VDR binding sites (O/E = 96.16-fold, p < 2.0e-05), even after controlling for the chromatin accessibility state of shared regions (p < 0.001). Furthermore, MS susceptibility regions are targeted by both EBNA2 and VDR more often than by EBNA2 alone (enrichment difference = 1.722-fold, p = 0.0267). Taken together, these findings demonstrate that EBV participates in the gene-environment interactions that predispose to MS.
Affiliation(s)
- Vito A. G. Ricigliano
- Nuffield Department of Clinical Neurosciences, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom
- Neuroimmunology Unit, Fondazione Santa Lucia (I.R.C.C.S.), Rome, Italy
| | - Adam E. Handel
- Medical Research Council Functional Genomics Unit and Department of Physiology, Anatomy and Genetics, University of Oxford, Oxford OX1 3PT, United Kingdom
- Weatherall Institute of Molecular Medicine, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DS, United Kingdom
| | - Geir K. Sandve
- Department of Informatics, University of Oslo, Blindern, Norway
| | - Viviana Annibali
- Centre for Experimental Neurological Therapies (CENTERS), Neurology and Department of Neuroscience, Mental Health and Sensory Organs, Faculty of Medicine and Psychology, “Sapienza” University of Rome, Rome, Italy
| | - Giovanni Ristori
- Centre for Experimental Neurological Therapies (CENTERS), Neurology and Department of Neuroscience, Mental Health and Sensory Organs, Faculty of Medicine and Psychology, “Sapienza” University of Rome, Rome, Italy
| | - Rosella Mechelli
- Centre for Experimental Neurological Therapies (CENTERS), Neurology and Department of Neuroscience, Mental Health and Sensory Organs, Faculty of Medicine and Psychology, “Sapienza” University of Rome, Rome, Italy
| | - M. Zameel Cader
- Nuffield Department of Clinical Neurosciences, University of Oxford, John Radcliffe Hospital, Oxford OX3 9DU, United Kingdom
| | - Marco Salvetti
- Centre for Experimental Neurological Therapies (CENTERS), Neurology and Department of Neuroscience, Mental Health and Sensory Organs, Faculty of Medicine and Psychology, “Sapienza” University of Rome, Rome, Italy
| |
|
35
|
Christiansen IK, Sandve GK, Schmitz M, Dürst M, Hovig E. Transcriptionally active regions are the preferred targets for chromosomal HPV integration in cervical carcinogenesis. PLoS One 2015; 10:e0119566. [PMID: 25793388 PMCID: PMC4368827 DOI: 10.1371/journal.pone.0119566] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.1] [Received: 10/01/2014] [Accepted: 01/15/2015] [Indexed: 01/23/2023] Open
Abstract
Integration of human papillomavirus (HPV) into the host genome is regarded as a determining event in cervical carcinogenesis. However, the exact mechanism for integration, and the role of integration in stimulating cancer progression, are not fully characterized. Although integration sites are reported to appear randomly distributed over all chromosomes, fragile sites, translocation break points and transcriptionally active regions have all been suggested as being preferred sites for integration. In addition, more recent studies have reported integration events occurring within or surrounding essential cancer-related genes, raising the question whether these may reflect key events in the molecular genesis of HPV induced carcinomas. In a search for possible common denominators of the integration sites, we utilized the chromosomal coordinates of 121 viral-cellular fusion transcripts, and examined for statistical overrepresentation of integration sites with various features of ENCODE chromatin information data, using the Genomic HyperBrowser. We find that integration sites coincide with DNA that is transcriptionally active in mucosal epithelium, as judged by the relationship of integration sites to DNase hypersensitivity and H3K4me3 methylation data. Finding an association between integration and transcription is highly informative with regard to the spatio-temporal characteristics of the integration process. These results suggest that integration is an early event in carcinogenesis, more than a late product of chromosomal instability. If the viral integrations were more likely to occur in destabilized regions of the DNA, a completely random distribution of the integration sites would be expected. As a by-product of integration in actively transcribing DNA, a tendency of integration in or close to genes is likely to be observed. This increases the possibility of viral signals to modulate the expression of these genes, potentially contributing to the progression towards cancer.
Affiliation(s)
- Irene Kraus Christiansen
- Department of Microbiology and Infection Control, Akershus University Hospital, Lørenskog, Norway
| | - Martina Schmitz
- Department of Gynaecology, Jena University Hospital, Jena, Germany
| | - Matthias Dürst
- Department of Gynaecology, Jena University Hospital, Jena, Germany
| | - Eivind Hovig
- Department of Informatics, University of Oslo, Oslo, Norway
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Institute for Cancer Genetics and Informatics, Oslo University Hospital, Oslo, Norway
| |
|
36
|
Blankenberg D, Taylor J, Nekrutenko A. Online resources for genomic analysis using high-throughput sequencing. Cold Spring Harb Protoc 2015; 2015:324-35. [PMID: 25655493 DOI: 10.1101/pdb.top083667] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Indexed: 11/24/2022]
Abstract
The availability of high-throughput sequencing has created enormous possibilities for scientific discovery. However, the massive amount of data being generated has resulted in a severe informatics bottleneck. A large number of tools exist for analyzing next-generation sequencing (NGS) data, yet often there remains a disconnect between these research tools and the ability of many researchers to use them. As a consequence, several online resources and communities have been developed to assist researchers with both the management and the analysis of sequencing data sets. Here we describe the use and applications of common file formats for coding and storing genomic data, consider several web-accessible open-source resources for the visualization and analysis of NGS data, and provide examples of typical analyses with links to further detailed exercises.
Affiliation(s)
- Daniel Blankenberg
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, Pennsylvania 16802
| | - James Taylor
- Departments of Biology and Computer Science, Johns Hopkins University, Baltimore, Maryland 21211
| | - Anton Nekrutenko
- Department of Biochemistry and Molecular Biology, Penn State University, University Park, Pennsylvania 16802
| |
|
37
|
|
38
|
Kravatsky YV, Chechetkin VR, Tchurikov NA, Kravatskaya GI. Genome-wide study of correlations between genomic features and their relationship with the regulation of gene expression. DNA Res 2015; 22:109-19. [PMID: 25627242 PMCID: PMC4379982 DOI: 10.1093/dnares/dsu044] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Indexed: 02/05/2023] Open
Abstract
The broad class of tasks in genetics and epigenetics can be reduced to the study of various features that are distributed over the genome (genome tracks). The rapid and efficient processing of the huge amount of data stored in the genome-scale databases cannot be achieved without the software packages based on the analytical criteria. However, strong inhomogeneity of genome tracks hampers the development of relevant statistics. We developed the criteria for the assessment of genome track inhomogeneity and correlations between two genome tracks. We also developed a software package, Genome Track Analyzer, based on this theory. The theory and software were tested on simulated data and were applied to the study of correlations between CpG islands and transcription start sites in the Homo sapiens genome, between profiles of protein-binding sites in chromosomes of Drosophila melanogaster, and between DNA double-strand breaks and histone marks in the H. sapiens genome. Significant correlations between transcription start sites on the forward and the reverse strands were observed in genomes of D. melanogaster, Caenorhabditis elegans, Mus musculus, H. sapiens, and Danio rerio. The observed correlations may be related to the regulation of gene expression in eukaryotes. Genome Track Analyzer is freely available at http://ancorr.eimb.ru/.
Affiliation(s)
- Yuri V Kravatsky
- Engelhardt Institute of Molecular Biology of Russian Academy of Sciences, Moscow 119991, Russia
| | - Vladimir R Chechetkin
- Engelhardt Institute of Molecular Biology of Russian Academy of Sciences, Moscow 119991, Russia
| | - Nikolai A Tchurikov
- Engelhardt Institute of Molecular Biology of Russian Academy of Sciences, Moscow 119991, Russia
| | - Galina I Kravatskaya
- Engelhardt Institute of Molecular Biology of Russian Academy of Sciences, Moscow 119991, Russia
| |
|
39
|
Watson CT, Steinberg KM, Graves TA, Warren RL, Malig M, Schein J, Wilson RK, Holt RA, Eichler EE, Breden F. Sequencing of the human IG light chain loci from a hydatidiform mole BAC library reveals locus-specific signatures of genetic diversity. Genes Immun 2015; 16:24-34. [PMID: 25338678 PMCID: PMC4304971 DOI: 10.1038/gene.2014.56] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.6] [Received: 07/02/2014] [Revised: 09/03/2014] [Accepted: 09/03/2014] [Indexed: 12/24/2022]
Abstract
Germline variation at immunoglobulin (IG) loci is critical for pathogen-mediated immunity, but establishing complete haplotype sequences in these regions has been problematic because of complex sequence architecture and diploid source DNA. We sequenced BAC clones from the effectively haploid human hydatidiform mole cell line, CHM1htert, across the light chain IG loci, kappa (IGK) and lambda (IGL), creating single haplotype representations of these regions. The IGL haplotype generated here is 1.25 Mb of contiguous sequence, including four novel IGLV alleles, one novel IGLC allele, and an 11.9-kb insertion. The CH17 IGK haplotype consists of two 644 kb proximal and 466 kb distal contigs separated by a large gap of unknown size; these assemblies added 49 kb of unique sequence extending into this gap. Our analysis also resulted in the characterization of seven novel IGKV alleles and a 16.7-kb region exhibiting signatures of interlocus sequence exchange between distal and proximal IGKV gene clusters. Genetic diversity in IGK/IGL was compared with that of the IG heavy chain (IGH) locus within the same haploid genome, revealing threefold (IGK) and sixfold (IGL) higher diversity in the IGH locus, potentially associated with increased levels of segmental duplication and the telomeric location of IGH.
Affiliation(s)
- C T Watson
- Department of Biological Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY USA
| | - K M Steinberg
- Department of Genome Sciences, University of Washington, Seattle, WA USA
- The Genome Institute, Washington University, St Louis, MO USA
| | - T A Graves
- The Genome Institute, Washington University, St Louis, MO USA
| | - R L Warren
- Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia Canada
| | - M Malig
- Department of Genome Sciences, University of Washington, Seattle, WA USA
| | - J Schein
- Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia Canada
| | - R K Wilson
- The Genome Institute, Washington University, St Louis, MO USA
| | - R A Holt
- Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia Canada
| | - E E Eichler
- Department of Genome Sciences, University of Washington, Seattle, WA USA
- Howard Hughes Medical Institute, Seattle, WA USA
| | - F Breden
- Department of Biological Sciences, Simon Fraser University, Burnaby, British Columbia, Canada
| |
|
40
|
Macchia G, Nord KH, Zoli M, Purgato S, D'Addabbo P, Whelan CW, Carbone L, Perini G, Mertens F, Rocchi M, Storlazzi CT. Ring chromosomes, breakpoint clusters, and neocentromeres in sarcomas. Genes Chromosomes Cancer 2014; 54:156-67. [PMID: 25421174 DOI: 10.1002/gcc.22228] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Received: 09/02/2014] [Accepted: 11/03/2014] [Indexed: 01/04/2023] Open
Abstract
Gene amplification is relatively common in tumors. In certain subtypes of sarcoma, it often occurs in the form of ring and/or giant rod-shaped marker (RGM) chromosomes whose mitotic stability is frequently rescued by ectopic novel centromeres (neocentromeres). Little is known about the origin and structure of these RGM chromosomes, including how they arise, their internal organization, and which sequences underlie the neocentromeres. To address these questions, 42 sarcomas with RGM chromosomes were investigated to detect regions prone to double strand breaks and possible functional or structural constraints driving the amplification process. We found nine breakpoint cluster regions potentially involved in the genesis of RGM chromosomes, which turned out to be significantly enriched in poly-pyrimidine traits. Some of the clusters were located close to genes already known to be relevant for sarcomas, thus indicating a potential functional constraint, while others mapped to transcriptionally inactive chromatin domains enriched in heterochromatic sites. Of note, five neocentromeres were identified after analyzing 13 of the cases by fluorescent in situ hybridization. ChIP-on-chip analysis with antibodies against the centromeric protein CENP-A showed that they were a patchwork of small genomic segments derived from different chromosomes, likely joined to form a contiguous sequence during the amplification process.
Affiliation(s)
- Gemma Macchia
- Department of Biology, University of Bari, Bari, Italy
- Department of Clinical Genetics, University and Regional Laboratories, Lund University, Lund, Sweden
| |
|
41
|
Tchurikov NA, Fedoseeva DM, Sosin DV, Snezhkina AV, Melnikova NV, Kudryavtseva AV, Kravatsky YV, Kretova OV. Hot spots of DNA double-strand breaks and genomic contacts of human rDNA units are involved in epigenetic regulation. J Mol Cell Biol 2014; 7:366-82. [PMID: 25280477 PMCID: PMC4524424 DOI: 10.1093/jmcb/mju038] [Citation(s) in RCA: 50] [Impact Index Per Article: 5.0] [Received: 06/13/2014] [Accepted: 08/23/2014] [Indexed: 12/25/2022] Open
Abstract
DNA double-strand breaks (DSBs) are involved in many cellular mechanisms, including replication, transcription, and genome rearrangements. The recent observation that hot spots of DSBs in human chromosomes delimit DNA domains that possess coordinately expressed genes suggests a strong relationship between the organization of transcription patterns and hot spots of DSBs. In this study, we performed mapping of hot spots of DSBs in a human 43-kb ribosomal DNA (rDNA) repeated unit. We observed that rDNA units corresponded to the most fragile sites in human chromosomes and that these units possessed at least nine specific regions containing clusters of extremely frequently occurring DSBs, which were located exclusively in non-coding intergenic spacer (IGS) regions. The hot spots of DSBs corresponded to only a specific subset of DNase-hypersensitive sites, and coincided with CTCF, PARP1, and HNRNPA2B1 binding sites, and H3K4me3 marks. Our rDNA-4C data indicate that the regions of IGS containing the hot spots of DSBs often form contacts with specific regions in different chromosomes, including the pericentromeric regions, as well as regions that are characterized by H3K27ac and H3K4me3 marks, CTCF binding sites, ChIA-PET and RIP signals, and high levels of DSBs. The data suggest a strong link between chromosome breakage and several different mechanisms of epigenetic regulation of gene expression.
Affiliation(s)
- Nickolai A Tchurikov
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology, Moscow 119334, Russia
| | - Daria M Fedoseeva
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology, Moscow 119334, Russia
| | - Dmitri V Sosin
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology, Moscow 119334, Russia
| | - Anastasia V Snezhkina
- Group of Postgenomic Studies, Engelhardt Institute of Molecular Biology, Moscow 119334, Russia
| | - Nataliya V Melnikova
- Group of Postgenomic Studies, Engelhardt Institute of Molecular Biology, Moscow 119334, Russia
| | - Anna V Kudryavtseva
- Group of Postgenomic Studies, Engelhardt Institute of Molecular Biology, Moscow 119334, Russia
| | - Yuri V Kravatsky
- Laboratory of DNA-Protein Interactions, Engelhardt Institute of Molecular Biology, Moscow 119334, Russia
| | - Olga V Kretova
- Department of Epigenetic Mechanisms of Gene Expression Regulation, Engelhardt Institute of Molecular Biology, Moscow 119334, Russia
| |
|
42
|
Molyneux SD, Waterhouse PD, Shelton D, Shao YW, Watling CM, Tang QL, Harris IS, Dickson BC, Tharmapalan P, Sandve GK, Zhang X, Bailey SD, Berman H, Wunder JS, Izsvák Z, Lupien M, Mak TW, Khokha R. Human somatic cell mutagenesis creates genetically tractable sarcomas. Nat Genet 2014; 46:964-72. [DOI: 10.1038/ng.3065] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Received: 10/23/2013] [Accepted: 07/23/2014] [Indexed: 01/15/2023]
|
43
|
Lajugie J, Fourel N, Bouhassira EE. GenPlay Multi-Genome, a tool to compare and analyze multiple human genomes in a graphical interface. Bioinformatics 2014; 31:109-11. [PMID: 25178461 DOI: 10.1093/bioinformatics/btu588] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 01/25/2023]
Abstract
SUMMARY: Parallel visualization of multiple individual human genomes is a complex endeavor that is rapidly gaining importance with the increasing number of personal, phased and cancer genomes that are being generated. It requires the display of variants such as SNPs, indels and structural variants that are unique to specific genomes and the introduction of multiple overlapping gaps in the reference sequence. Here, we describe GenPlay Multi-Genome, an application specifically written to visualize and analyze multiple human genomes in parallel. GenPlay Multi-Genome is ideally suited for the comparison of allele-specific expression and functional genomic data obtained from multiple phased genomes in a graphical interface with access to multiple-track operation. It also allows the analysis of data that have been aligned to custom genomes rather than to a standard reference and can be used as a variant calling format file browser and as a tool to compare different genome assemblies, such as hg19 and hg38. AVAILABILITY AND IMPLEMENTATION: GenPlay is available under the GNU public license (GPL-3) from http://genplay.einstein.yu.edu. The source code is available at https://github.com/JulienLajugie/GenPlay.
Affiliation(s)
- Julien Lajugie
- Department of Cell Biology, Albert Einstein College of Medicine, New York, NY 10461, USA
| | - Nicolas Fourel
- Department of Cell Biology, Albert Einstein College of Medicine, New York, NY 10461, USA
| | - Eric E Bouhassira
- Department of Cell Biology, Albert Einstein College of Medicine, New York, NY 10461, USA
| |
|
44
|
Antoniadis A, Glad I, Mohammed H. Local comparison of empirical distributions via nonparametric regression. J STAT COMPUT SIM 2014. [DOI: 10.1080/00949655.2014.929133] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 10/25/2022]
|
45
|
Sharma NL, Massie CE, Butter F, Mann M, Bon H, Ramos-Montoya A, Menon S, Stark R, Lamb AD, Scott HE, Warren AY, Neal DE, Mills IG. The ETS family member GABPα modulates androgen receptor signalling and mediates an aggressive phenotype in prostate cancer. Nucleic Acids Res 2014; 42:6256-69. [PMID: 24753418 PMCID: PMC4041454 DOI: 10.1093/nar/gku281] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.0] [Received: 11/27/2013] [Revised: 03/21/2014] [Accepted: 03/26/2014] [Indexed: 12/31/2022] Open
Abstract
In prostate cancer (PC), the androgen receptor (AR) is a key transcription factor at all disease stages, including the advanced stage of castrate-resistant prostate cancer (CRPC). In the present study, we show that GABPα, an ETS factor that is up-regulated in PC, is an AR-interacting transcription factor. Expression of GABPα enables PC cell lines to acquire some of the molecular and cellular characteristics of CRPC tissues as well as more aggressive growth phenotypes. GABPα has a transcriptional role that dissects the overlapping cistromes of the two most common ETS gene fusions in PC: overlapping significantly with ETV1 but not with ERG target genes. GABPα bound predominantly to gene promoters, regulated the expression of one-third of AR target genes and modulated sensitivity to AR antagonists in hormone responsive and castrate resistant PC models. This study supports a critical role for GABPα in CRPC and reveals potential targets for therapeutic intervention.
Affiliation(s)
- Naomi L Sharma
- Uro-oncology Research Group, CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Department of Urology, Addenbrooke's Hospital, Hills Road, Cambridge CB2 2QQ, UK
| | - Charlie E Massie
- Uro-oncology Research Group, CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
| | - Falk Butter
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
| | - Matthias Mann
- Department of Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Am Klopferspitz 18, D-82152 Martinsried, Germany
| | - Helene Bon
- Uro-oncology Research Group, CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
| | - Antonio Ramos-Montoya
- Uro-oncology Research Group, CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
| | - Suraj Menon
- Department of Bioinformatics, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
| | - Rory Stark
- Department of Bioinformatics, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
| | - Alastair D Lamb
- Uro-oncology Research Group, CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
| | - Helen E Scott
- Uro-oncology Research Group, CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
| | - Anne Y Warren
- Department of Pathology, Addenbrooke's Hospital, Hills Road, Cambridge CB2 2QQ, UK
| | - David E Neal
- Uro-oncology Research Group, CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Department of Urology, Addenbrooke's Hospital, Hills Road, Cambridge CB2 2QQ, UK
- Department of Oncology, University of Cambridge, Addenbrooke's Hospital, Hills Road, Cambridge CB2 2QQ, UK
| | - Ian G Mills
- Uro-oncology Research Group, CRUK Cambridge Institute, University of Cambridge, Li Ka Shing Centre, Robinson Way, Cambridge CB2 0RE, UK
- Prostate Cancer Research Group, Centre for Molecular Medicine (Norway), Nordic EMBL Partnership, University of Oslo and Oslo University Hospital, Gaustadalleen 21, Oslo N-0349, Norway
- Department of Cancer Prevention and Department of Urology, Oslo University Hospital, Oslo N-0349, Norway
| |
|
46
|
Nygård S, Reitan T, Clancy T, Nygaard V, Bjørnstad J, Skrbic B, Tønnessen T, Christensen G, Hovig E. Identifying pathogenic processes by integrating microarray data with prior knowledge. BMC Bioinformatics 2014; 15:115. [PMID: 24758699 PMCID: PMC4006456 DOI: 10.1186/1471-2105-15-115] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Received: 11/14/2013] [Accepted: 04/09/2014] [Indexed: 11/30/2022] Open
Abstract
BACKGROUND: It is of great importance to identify molecular processes and pathways that are involved in disease etiology. Although there has been an extensive use of various high-throughput methods for this task, pathogenic pathways are still not completely understood. Often the set of genes or proteins identified as altered in genome-wide screens show a poor overlap with canonical disease pathways. These findings are difficult to interpret, yet crucial in order to improve the understanding of the molecular processes underlying the disease progression. We present a novel method for identifying groups of connected molecules from a set of differentially expressed genes. These groups represent functional modules sharing common cellular function and involve signaling and regulatory events. Specifically, our method makes use of Bayesian statistics to identify groups of co-regulated genes based on the microarray data, where external information about molecular interactions and connections is used as priors in the group assignments. Markov chain Monte Carlo sampling is used to search for the most reliable grouping. RESULTS: Simulation results showed that the method improved the ability of identifying correct groups compared to traditional clustering, especially for small sample sizes. Applied to a microarray heart failure dataset the method found one large cluster with several genes important for the structure of the extracellular matrix and a smaller group with many genes involved in carbohydrate metabolism. The method was also applied to a microarray dataset on melanoma cancer patients with or without metastasis, where the main cluster was dominated by genes related to keratinocyte differentiation. CONCLUSION: Our method found clusters overlapping with known pathogenic processes, but also pointed to new connections extending beyond the classical pathways.
Affiliation(s)
- Ståle Nygård
- Bioinformatics Core Facility, Institute for Medical Informatics, Oslo University Hospital, Oslo, Norway
- Institute for Experimental Medical Research, Oslo University Hospital and University of Oslo, Oslo, Norway
- KG Jebsen Cardiac Research Centre and Center for Heart Failure Research, University of Oslo, Oslo, Norway
| | - Trond Reitan
- Center for Ecological and Evolutionary Synthesis, Department of Biology, University of Oslo, Oslo, Norway
| | - Trevor Clancy
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
| | - Vegard Nygaard
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
| | - Johannes Bjørnstad
- KG Jebsen Cardiac Research Centre and Center for Heart Failure Research, University of Oslo, Oslo, Norway
- Department of Cardiothoracic Surgery, Oslo University Hospital, Oslo, Norway
| | - Biljana Skrbic
- KG Jebsen Cardiac Research Centre and Center for Heart Failure Research, University of Oslo, Oslo, Norway
- Department of Cardiothoracic Surgery, Oslo University Hospital, Oslo, Norway
| | - Theis Tønnessen
- KG Jebsen Cardiac Research Centre and Center for Heart Failure Research, University of Oslo, Oslo, Norway
- Department of Cardiothoracic Surgery, Oslo University Hospital, Oslo, Norway
| | - Geir Christensen
- Institute for Experimental Medical Research, Oslo University Hospital and University of Oslo, Oslo, Norway
- KG Jebsen Cardiac Research Centre and Center for Heart Failure Research, University of Oslo, Oslo, Norway
| | - Eivind Hovig
- Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Oslo, Norway
- Institute for Medical Informatics, Oslo University Hospital, Oslo, Norway
- Department of Informatics, University of Oslo, Oslo, Norway
| |
|
47
|
Rye M, Sandve GK, Daub CO, Kawaji H, Carninci P, Forrest ARR, Drabløs F. Chromatin states reveal functional associations for globally defined transcription start sites in four human cell lines. BMC Genomics 2014; 15:120. [PMID: 24669905 PMCID: PMC3986914 DOI: 10.1186/1471-2164-15-120] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Received: 05/21/2013] [Accepted: 12/07/2013] [Indexed: 11/26/2022] Open
Abstract
BACKGROUND Deciphering the most common modes by which chromatin regulates transcription, and how this is related to cellular status and processes is an important task for improving our understanding of human cellular biology. The FANTOM5 and ENCODE projects represent two independent large scale efforts to map regulatory and transcriptional features to the human genome. Here we investigate chromatin features around a comprehensive set of transcription start sites in four cell lines by integrating data from these two projects. RESULTS Transcription start sites can be distinguished by chromatin states defined by specific combinations of both chromatin mark enrichment and the profile shapes of these chromatin marks. The observed patterns can be associated with cellular functions and processes, and they also show association with expression level, location relative to nearby genes, and CpG content. In particular we find a substantial number of repressed inter- and intra-genic transcription start sites enriched for active chromatin marks and Pol II, and these sites are strongly associated with immediate-early response processes and cell signaling. Associations between start sites with similar chromatin patterns are validated by significant correlations in their global expression profiles. CONCLUSIONS The results confirm the link between chromatin state and cellular function for expressed transcripts, and also indicate that active chromatin states at repressed transcripts may poise transcripts for rapid activation during immune response.
Affiliation(s)
- Morten Rye
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, P.O. Box 8905, NO-7491 Trondheim, Norway
- St. Olavs Hospital, Postboks 3250, Sluppen 7006, Trondheim
| | | | - Carsten O Daub
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Hideya Kawaji
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa 230-0045, Japan
- RIKEN Preventive Medicine and Diagnosis Innovation Program, Wako, Saitama 351-0198, Japan
| | - Piero Carninci
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Alistair RR Forrest
- RIKEN Omics Science Center (OSC), 1-7-22 Suehiro-cho, Tsurumi-ku, Yokohama 230-0045, Japan
- RIKEN Center for Life Science Technologies, Division of Genomic Technologies, Yokohama, Kanagawa 230-0045, Japan
| | - Finn Drabløs
- Department of Cancer Research and Molecular Medicine, Norwegian University of Science and Technology, P.O. Box 8905, NO-7491 Trondheim, Norway
| |
|
48
|
Wilson GA, Butcher LM, Foster HR, Feber A, Roos C, Walter L, Woszczek G, Beck S, Bell CG. Human-specific epigenetic variation in the immunological Leukotriene B4 Receptor (LTB4R/BLT1) implicated in common inflammatory diseases. Genome Med 2014; 6:19. [PMID: 24598577 PMCID: PMC4062055 DOI: 10.1186/gm536] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.7] [Received: 09/26/2013] [Accepted: 02/24/2014] [Indexed: 02/08/2023] Open
Abstract
BACKGROUND Common human diseases are caused by the complex interplay of genetic susceptibility as well as environmental factors. Due to the environment's influence on the epigenome, and therefore genome function, as well as conversely the genome's facilitative effect on the epigenome, analysis of this level of regulation may increase our knowledge of disease pathogenesis. METHODS In order to identify human-specific epigenetic influences, we have performed a novel genome-wide DNA methylation analysis comparing human, chimpanzee and rhesus macaque. RESULTS We have identified that the immunological Leukotriene B4 receptor (LTB4R, BLT1 receptor) is the most epigenetically divergent human gene in peripheral blood in comparison with other primates. This difference is due to the co-ordinated active state of human-specific hypomethylation in the promoter and human-specific increased gene body methylation. This gene is significant in innate immunity and the LTB4/LTB4R pathway is involved in the pathogenesis of the spectrum of human inflammatory diseases. This finding was confirmed by additional neutrophil-only DNA methylome and lymphoblastoid H3K4me3 chromatin comparative data. Additionally we show through functional analysis that this receptor has increased expression and a higher response to the LTB4 ligand in human versus rhesus macaque peripheral blood mononuclear cells. Genome-wide we also find human species-specific differentially methylated regions (human s-DMRs) are more prevalent in CpG island shores than within the islands themselves, and within the latter are associated with the CTCF motif. CONCLUSIONS This result further emphasises the exclusive nature of the human immunological system, its divergent adaptation even from very closely related primates, and the power of comparative epigenomics to identify and understand human uniqueness.
Affiliation(s)
- Gareth A Wilson
- Medical Genomics, UCL Cancer Institute, University College London, London, UK ; Current address: Translational Cancer Therapeutics, CR-UK London Research Institute, Lincoln's Inn Fields, London, UK
| | - Lee M Butcher
- Medical Genomics, UCL Cancer Institute, University College London, London, UK
| | - Holly R Foster
- MRC & Asthma UK Centre in Allergic Mechanisms of Asthma, Division of Asthma, Allergy and Lung Biology, King's College London, London, UK
| | - Andrew Feber
- Medical Genomics, UCL Cancer Institute, University College London, London, UK
| | - Christian Roos
- Genebank of Primates and Primate Genetics Laboratory, German Primate Centre, Leibniz Institute for Primate Research, Göttingen, Germany
| | - Lutz Walter
- Genebank of Primates and Primate Genetics Laboratory, German Primate Centre, Leibniz Institute for Primate Research, Göttingen, Germany
| | - Grzegorz Woszczek
- MRC & Asthma UK Centre in Allergic Mechanisms of Asthma, Division of Asthma, Allergy and Lung Biology, King's College London, London, UK
| | - Stephan Beck
- Medical Genomics, UCL Cancer Institute, University College London, London, UK
| | - Christopher G Bell
- Medical Genomics, UCL Cancer Institute, University College London, London, UK ; Current address: Department of Twin Research & Genetic Epidemiology, St Thomas' Hospital, King's College London, London, UK
| |
|
49
|
Paulsen J, Sandve GK, Gundersen S, Lien TG, Trengereid K, Hovig E. HiBrowse: multi-purpose statistical analysis of genome-wide chromatin 3D organization. Bioinformatics 2014; 30:1620-2. [PMID: 24511080 PMCID: PMC4029040 DOI: 10.1093/bioinformatics/btu082] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Indexed: 01/05/2023]
Abstract
Summary: Recently developed methods that couple next-generation sequencing with chromosome conformation capture-based techniques, such as Hi-C and ChIA-PET, allow for characterization of genome-wide chromatin 3D structure. Understanding the organization of chromatin in three dimensions is a crucial next step in the unraveling of global gene regulation, and methods for analyzing such data are needed. We have developed HiBrowse, a user-friendly web-tool consisting of a range of hypothesis-based and descriptive statistics, using realistic assumptions in null-models. Availability and implementation: HiBrowse is supported by all major browsers, and is freely available at http://hyperbrowser.uio.no/3d. Software is implemented in Python, and source code is available for download by following instructions on the main site. Contact: jonaspau@ifi.uio.no Supplementary Information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Jonas Paulsen
- Institute for Cancer Genetics and Informatics, Oslo University Hospital, PO Box 4950, Nydalen, 0424 Oslo, Department of Informatics, University of Oslo, Problemveien 7, 0313 Oslo, Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, PO Box 4950, Nydalen, 0424 Oslo, Department of Mathematics, University of Oslo, Problemveien 7, 0313 Oslo and ELIXIR project, Department of Informatics, University of Oslo, Problemveien 7, 0313 Oslo, Norway
| | - Geir Kjetil Sandve
- Institute for Cancer Genetics and Informatics, Oslo University Hospital, PO Box 4950, Nydalen, 0424 Oslo, Department of Informatics, University of Oslo, Problemveien 7, 0313 Oslo, Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, PO Box 4950, Nydalen, 0424 Oslo, Department of Mathematics, University of Oslo, Problemveien 7, 0313 Oslo and ELIXIR project, Department of Informatics, University of Oslo, Problemveien 7, 0313 Oslo, Norway
| | - Sveinung Gundersen
- Institute for Cancer Genetics and Informatics, Oslo University Hospital, PO Box 4950, Nydalen, 0424 Oslo, Department of Informatics, University of Oslo, Problemveien 7, 0313 Oslo, Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, PO Box 4950, Nydalen, 0424 Oslo, Department of Mathematics, University of Oslo, Problemveien 7, 0313 Oslo and ELIXIR project, Department of Informatics, University of Oslo, Problemveien 7, 0313 Oslo, Norway
| | - Tonje G Lien
- Institute for Cancer Genetics and Informatics, Oslo University Hospital, PO Box 4950, Nydalen, 0424 Oslo, Department of Informatics, University of Oslo, Problemveien 7, 0313 Oslo, Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, PO Box 4950, Nydalen, 0424 Oslo, Department of Mathematics, University of Oslo, Problemveien 7, 0313 Oslo and ELIXIR project, Department of Informatics, University of Oslo, Problemveien 7, 0313 Oslo, Norway
| | - Kai Trengereid
- Institute for Cancer Genetics and Informatics, Oslo University Hospital, PO Box 4950, Nydalen, 0424 Oslo, Department of Informatics, University of Oslo, Problemveien 7, 0313 Oslo, Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, PO Box 4950, Nydalen, 0424 Oslo, Department of Mathematics, University of Oslo, Problemveien 7, 0313 Oslo and ELIXIR project, Department of Informatics, University of Oslo, Problemveien 7, 0313 Oslo, Norway
| | - Eivind Hovig
- Institute for Cancer Genetics and Informatics, Oslo University Hospital, PO Box 4950, Nydalen, 0424 Oslo, Department of Informatics, University of Oslo, Problemveien 7, 0313 Oslo, Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, PO Box 4950, Nydalen, 0424 Oslo, Department of Mathematics, University of Oslo, Problemveien 7, 0313 Oslo and ELIXIR project, Department of Informatics, University of Oslo, Problemveien 7, 0313 Oslo, Norway
| |
|
50
|
Effects of sulforaphane and 3,3'-diindolylmethane on genome-wide promoter methylation in normal prostate epithelial cells and prostate cancer cells. PLoS One 2014; 9:e86787. [PMID: 24466240 PMCID: PMC3899342 DOI: 10.1371/journal.pone.0086787] [Citation(s) in RCA: 81] [Impact Index Per Article: 8.1] [Received: 08/17/2013] [Accepted: 12/13/2013] [Indexed: 12/21/2022] Open
Abstract
Epigenetic changes, including aberrant DNA methylation, result in altered gene expression and play an important role in carcinogenesis. Phytochemicals such as sulforaphane (SFN) and 3,3'-diindolylmethane (DIM) are promising chemopreventive agents for the treatment of prostate cancer. Both have been shown to induce re-expression of genes, including tumor suppressor genes silenced in cancer cells, via modulation of epigenetic marks including DNA methylation. However, the effects of SFN and DIM on DNA methylation at a genomic scale remained unclear. The goal of this study was to determine the genome-wide effects of SFN and DIM on promoter methylation in normal prostate epithelial cells and prostate cancer cells. Both SFN and DIM treatment decreased DNA methyltransferase expression in normal prostate epithelial cells (PrEC), and androgen-dependent (LnCAP) and androgen-independent (PC3) prostate cancer cells. The effects of SFN and DIM on promoter methylation profiles in normal PrEC, LnCAP and PC3 prostate cancer cells were determined using methyl-DNA immunoprecipitation followed by genome-wide DNA methylation array. We showed widespread changes in promoter methylation patterns, including both increased and decreased methylation, in all three prostate cell lines in response to SFN or DIM treatments. In particular, SFN and DIM altered promoter methylation in distinct sets of genes in PrEC, LnCAP, and PC3 cells, but shared similar gene targets within a single cell line. We further showed that SFN and DIM reversed many of the cancer-associated methylation alterations, including aberrantly methylated genes that are dysregulated or are highly involved in cancer progression. Overall, our data suggested that both SFN and DIM are epigenetic modulators that have broad and complex effects on DNA methylation profiles in both normal and cancerous prostate epithelial cells. Results from our study may provide new insights into the epigenetic mechanisms by which SFN and DIM exert their cancer chemopreventive effects.
|