1
Rymuza J, Sun Y, Zheng G, LeRoy N, Murach M, Phan N, Zhang A, Sheffield N. Methods for constructing and evaluating consensus genomic interval sets. Nucleic Acids Res 2024; 52:10119-10131. [PMID: 39180401 PMCID: PMC11417377 DOI: 10.1093/nar/gkae685] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/03/2023] [Revised: 07/05/2024] [Accepted: 07/29/2024] [Indexed: 08/26/2024]
Abstract
The amount of genomic region data continues to increase. Integrating across diverse genomic region sets requires consensus regions, which enable comparing regions across experiments, but also by necessity lose precision in region definitions. We require methods to assess this loss of precision and build optimal consensus region sets. Here, we introduce the concept of flexible intervals and propose three novel methods for building consensus region sets, or universes: a coverage cutoff method, a likelihood method, and a Hidden Markov Model. We then propose three novel measures for evaluating how well a proposed universe fits a collection of region sets: a base-level overlap score, a region boundary distance score, and a likelihood score. We apply our methods and evaluation approaches to several collections of region sets and show how these methods can be used to evaluate fit of universes and build optimal universes. We describe scenarios where the common approach of merging regions to create consensus leads to undesirable outcomes and provide principled alternatives that provide interoperability of interval data while minimizing loss of resolution.
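The coverage cutoff idea named above can be illustrated with a short sketch. The Python below is a hypothetical illustration, not the authors' implementation: it assumes each region set is a list of half-open (start, end) pairs on a single chromosome, counts how many region sets cover each position with a sweep over interval endpoints, and keeps the maximal stretches where that count reaches a chosen cutoff (assumed to be at least 1). The names `merge` and `coverage_cutoff_universe` are made up for this example.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(regions: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals within a single region set."""
    merged: List[Interval] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def coverage_cutoff_universe(region_sets: List[List[Interval]], cutoff: int) -> List[Interval]:
    """Return maximal intervals covered by at least `cutoff` region sets (cutoff >= 1)."""
    events = []
    for regions in region_sets:
        # Merging first makes coverage count region *sets*, not individual regions.
        for start, end in merge(regions):
            events.append((start, 1))
            events.append((end, -1))
    # Starts before ends at the same position, so touching runs join up.
    events.sort(key=lambda ev: (ev[0], -ev[1]))

    universe: List[Interval] = []
    depth = 0
    run_start = 0
    for pos, delta in events:
        prev = depth
        depth += delta
        if prev < cutoff <= depth:  # coverage just reached the cutoff
            run_start = pos
        elif depth < cutoff <= prev:  # coverage just fell below the cutoff
            if pos > run_start:  # skip zero-length runs
                universe.append((run_start, pos))
    return universe


sets = [[(0, 10)], [(5, 15)], [(8, 20)]]
print(coverage_cutoff_universe(sets, 2))  # [(5, 15)]
```

Raising the cutoff shrinks the universe toward regions shared by most sets (here a cutoff of 3 gives `[(8, 10)]`), while a cutoff of 1 reduces to simply merging everything (`[(0, 20)]`), which is the merge-based consensus the abstract cautions against.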
Affiliation(s)
- Julia Rymuza
- Department of Genome Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Yuchen Sun
- Department of Genome Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Guangtao Zheng
- Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Nathan J LeRoy
- Department of Genome Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
- Maria Murach
- Department of Genome Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Neil Phan
- Department of Genome Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Aidong Zhang
- Department of Computer Science, School of Engineering, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
- School of Data Science, University of Virginia, Charlottesville, VA 22904, USA
- Nathan C Sheffield
- Department of Genome Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Department of Biomedical Engineering, School of Medicine, University of Virginia, Charlottesville, VA 22904, USA
- Department of Biochemistry and Molecular Genetics, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- School of Data Science, University of Virginia, Charlottesville, VA 22904, USA
- Department of Public Health Sciences, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
- Child Health Research Center, School of Medicine, University of Virginia, Charlottesville, VA 22908, USA
2
Wang Y, Wei Z, Su J, Coenen F, Meng J. RgnTX: Colocalization analysis of transcriptome elements in the presence of isoform heterogeneity and ambiguity. Comput Struct Biotechnol J 2023; 21:4110-4117. [PMID: 37671241 PMCID: PMC10475473 DOI: 10.1016/j.csbj.2023.08.021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/14/2023] [Revised: 08/13/2023] [Accepted: 08/23/2023] [Indexed: 09/07/2023]
Abstract
Colocalization analysis of genomic region sets has been widely adopted to reveal potential functional interactions between the corresponding biological attributes, and it often serves as the basis for further investigation. A number of methods have been developed for colocalization analysis of genomic elements. However, none of them explicitly considers transcriptome heterogeneity and isoform ambiguity, which makes them less appropriate for analyzing transcriptome elements. Here, we developed RgnTX, an R/Bioconductor tool for the colocalization analysis of transcriptome elements with permutation tests. Unlike existing approaches, RgnTX directly takes advantage of transcriptome annotation and offers high flexibility in the null model to simulate a realistic transcriptome-wide background, such as complex alternative splicing patterns. Importantly, it supports testing transcriptome elements without clear isoform association, which is often the real scenario due to technical limitations. The package offers a wide selection of predefined functions that users can readily apply to visualize permutation results, calculate shifted z-scores and conduct multiple hypothesis testing with Benjamini-Hochberg correction. Moreover, using synthetic and real datasets, we show that RgnTX's novel testing modes return distinct and more significant results than existing genome-based methods. We believe RgnTX will be a useful tool for characterizing the randomness of the transcriptome and for conducting statistical association analysis of genomic region sets within the heterogeneous transcriptome. The package has been accepted by Bioconductor and is freely available at: https://bioconductor.org/packages/RgnTX.
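The permutation-test workflow described above (an observed overlap statistic, a randomized null, a z-score and Benjamini-Hochberg correction) can be sketched in a few lines. This is a generic Python illustration under simplifying assumptions, not RgnTX's R API: regions are half-open (start, end) pairs in one flat coordinate space of length `space_len`, whereas RgnTX is described as drawing its null from transcript annotation. The z-score computed here is the ordinary permutation z-score rather than RgnTX's shifted z-score, and every function name is hypothetical.

```python
import random
import statistics
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def count_overlaps(a: List[Interval], b: List[Interval]) -> int:
    """Number of intervals in `a` overlapping at least one interval in `b` (quadratic, for clarity)."""
    return sum(any(s < e2 and s2 < e for s2, e2 in b) for s, e in a)


def randomize(regions: List[Interval], space_len: int, rng: random.Random) -> List[Interval]:
    """Place each region uniformly at random in [0, space_len), keeping its length."""
    out = []
    for s, e in regions:
        length = e - s
        start = rng.randrange(0, space_len - length + 1)
        out.append((start, start + length))
    return out


def permutation_test(a, b, space_len, n_perm=1000, seed=0):
    """Return (observed count, permutation z-score, one-sided empirical p-value)."""
    rng = random.Random(seed)
    observed = count_overlaps(a, b)
    null = [count_overlaps(randomize(a, space_len, rng), b) for _ in range(n_perm)]
    mu, sd = statistics.mean(null), statistics.pstdev(null)
    z = (observed - mu) / sd if sd > 0 else float("nan")
    # +1 in numerator and denominator keeps the p-value strictly positive
    p = (sum(x >= observed for x in null) + 1) / (n_perm + 1)
    return observed, z, p


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


obs, z, p = permutation_test([(10, 20), (30, 40)], [(12, 18), (35, 45)], space_len=1000, n_perm=200)
print(obs, round(p, 4))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

For transcriptome elements, `randomize` would need to be replaced by a sampler that respects the annotation (for example, placing regions only within the exons of chosen transcripts); that annotation-aware null is the part RgnTX is designed to provide.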
Affiliation(s)
- Yue Wang
- Department of Mathematical Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Department of Computer Science, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Zhen Wei
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Jionglong Su
- School of AI and Advanced Computing, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Frans Coenen
- Department of Computer Science, University of Liverpool, L69 7ZB Liverpool, United Kingdom
- Jia Meng
- Department of Biological Sciences, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- AI University Research Centre, Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu 215123, China
- Institute of Systems, Molecular and Integrative Biology, University of Liverpool, L69 7ZB Liverpool, United Kingdom
3
Hou X, Xu M, Zhu C, Gao J, Li M, Chen X, Sun C, Nashan B, Zang J, Zhou Y, Guang S, Feng X. Systematic characterization of chromodomain proteins reveals an H3K9me1/2 reader regulating aging in C. elegans. Nat Commun 2023; 14:1254. [PMID: 36878913 PMCID: PMC9988841 DOI: 10.1038/s41467-023-36898-y] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/14/2022] [Accepted: 02/22/2023] [Indexed: 03/08/2023]
Abstract
The chromatin organization modifier domain (chromodomain) is an evolutionarily conserved motif across eukaryotic species. The chromodomain mainly functions as a histone methyl-lysine reader to modulate gene expression, chromatin spatial conformation and genome stability. Mutations or aberrant expression of chromodomain proteins can result in cancer and other human diseases. Here, we systematically tag chromodomain proteins with green fluorescent protein (GFP) using CRISPR/Cas9 technology in C. elegans. By combining ChIP-seq analysis and imaging, we delineate a comprehensive expression and functional map of chromodomain proteins. We then conduct a candidate-based RNAi screen and identify factors that regulate the expression and subcellular localization of the chromodomain proteins. Specifically, we reveal an H3K9me1/2 reader, CEC-5, by both in vitro biochemistry and in vivo ChIP assays. MET-2, an H3K9me1/2 writer, is required for CEC-5 association with heterochromatin. Both MET-2 and CEC-5 are required for the normal lifespan of C. elegans. Furthermore, a forward genetic screen identifies a conserved arginine (Arg124) in CEC-5's chromodomain that is essential for CEC-5's association with chromatin and for lifespan regulation. Thus, our work will serve as a reference for exploring chromodomain functions and regulation in C. elegans and may enable applications in aging-related human diseases.
Affiliation(s)
- Xinhao Hou
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of USTC, The USTC RNA Institute, Ministry of Education Key Laboratory for Membraneless Organelles & Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, 230027, Hefei, Anhui, China
- Mingjing Xu
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of USTC, The USTC RNA Institute, Ministry of Education Key Laboratory for Membraneless Organelles & Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, 230027, Hefei, Anhui, China
- Chengming Zhu
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of USTC, The USTC RNA Institute, Ministry of Education Key Laboratory for Membraneless Organelles & Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, 230027, Hefei, Anhui, China
- Jianing Gao
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of USTC, The USTC RNA Institute, Ministry of Education Key Laboratory for Membraneless Organelles & Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, 230027, Hefei, Anhui, China
- Meili Li
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of USTC, The USTC RNA Institute, Ministry of Education Key Laboratory for Membraneless Organelles & Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, 230027, Hefei, Anhui, China
- Xiangyang Chen
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of USTC, The USTC RNA Institute, Ministry of Education Key Laboratory for Membraneless Organelles & Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, 230027, Hefei, Anhui, China
- Cheng Sun
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of USTC, The USTC RNA Institute, Ministry of Education Key Laboratory for Membraneless Organelles & Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, 230027, Hefei, Anhui, China
- Björn Nashan
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of USTC, The USTC RNA Institute, Ministry of Education Key Laboratory for Membraneless Organelles & Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, 230027, Hefei, Anhui, China
- Jianye Zang
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of USTC, The USTC RNA Institute, Ministry of Education Key Laboratory for Membraneless Organelles & Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, 230027, Hefei, Anhui, China
- Ying Zhou
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of USTC, The USTC RNA Institute, Ministry of Education Key Laboratory for Membraneless Organelles & Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, 230027, Hefei, Anhui, China
- Shouhong Guang
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of USTC, The USTC RNA Institute, Ministry of Education Key Laboratory for Membraneless Organelles & Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, 230027, Hefei, Anhui, China
- CAS Center for Excellence in Molecular Cell Science, Chinese Academy of Sciences, 230027, Hefei, Anhui, P. R. China
- Xuezhu Feng
- Department of Obstetrics and Gynecology, The First Affiliated Hospital of USTC, The USTC RNA Institute, Ministry of Education Key Laboratory for Membraneless Organelles & Cellular Dynamics, School of Life Sciences, Division of Life Sciences and Medicine, Biomedical Sciences and Health Laboratory of Anhui Province, University of Science and Technology of China, 230027, Hefei, Anhui, China
4
Nuclear corepressors NCOR1/NCOR2 regulate B cell development, maintain genomic integrity and prevent transformation. Nat Immunol 2022; 23:1763-1776. [PMID: 36316474 PMCID: PMC9772092 DOI: 10.1038/s41590-022-01343-7] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 01/21/2022] [Accepted: 09/16/2022] [Indexed: 12/15/2022]
Abstract
The nuclear corepressors NCOR1 and NCOR2 interact with transcription factors involved in B cell development and potentially link these factors to alterations in chromatin structure and gene expression. Herein, we demonstrate that Ncor1/2 deletion limits B cell differentiation via impaired recombination, attenuates pre-BCR signaling and enhances STAT5-dependent transcription. Furthermore, NCOR1/2-deficient B cells exhibited derepression of EZH2-repressed gene modules, including the p53 pathway. These alterations resulted in aberrant Rag1 and Rag2 expression and accessibility. Whole-genome sequencing of Ncor1/2 DKO B cells identified an increased number of structural variants with cryptic recombination signal sequences. Finally, deletion of Ncor1 alleles in mice facilitated leukemic transformation, whereas lower NCOR1 levels in human leukemias correlated with worse survival. NCOR1/2 mutations in human leukemia correlated with increased RAG expression and an increased number of structural variants. These studies illuminate how the corepressors NCOR1/2 regulate B cell differentiation and provide insights into how NCOR1/2 mutations may promote B cell transformation.
5
Bürger A, Dugas M. Cogito: automated and generic comparison of annotated genomic intervals. BMC Bioinformatics 2022; 23:315. [PMID: 35927614 PMCID: PMC9351259 DOI: 10.1186/s12859-022-04853-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/10/2022] [Accepted: 07/23/2022] [Indexed: 11/27/2022]
Abstract
Background Genetic and epigenetic biological studies often combine different types of experiments and multiple conditions. While the corresponding raw and processed data are made available through specialized public databases, the processed files are usually limited to a specific research question. Hence, they are unsuitable for an unbiased, systematic overview of a complex dataset. Moreover, the number of possible combinations of sample types and conditions grows exponentially with the number of sample types and conditions. Therefore, the risk of missing a correlation or overrating an identified correlation should be mitigated in a complex dataset. Since reanalysis of a full study is rarely a viable option, new methods are needed to address these issues systematically, reliably, reproducibly and efficiently. Results Cogito ("COmpare annotated Genomic Intervals TOol") provides a workflow for an unbiased, structured overview and systematic analysis of complex genomic datasets consisting of different data types (e.g. RNA-seq, ChIP-seq) and conditions. Cogito visualizes key information from genomic or epigenomic interval-based data, thereby providing a straightforward approach for comparing different conditions. It helps users gain an unbiased impression of a dataset and develop an appropriate analysis strategy for it. In addition to a text-based report, Cogito offers a fully customizable report as a starting point for further in-depth investigation. Conclusions Cogito implements a novel approach to facilitate high-level overview analyses of complex datasets and offers additional insights into the data without the need for a full, time-consuming reanalysis. The R/Bioconductor package is freely available at https://bioconductor.org/packages/release/bioc/html/Cogito.html and includes comprehensive documentation with detailed descriptions and reproducible examples.
Supplementary Information The online version contains supplementary material available at 10.1186/s12859-022-04853-1.
Affiliation(s)
- Annika Bürger
- Institute of Medical Informatics, Westfälische Wilhelms-Universität Münster, Albert-Schweitzer-Campus 1, 48149, Münster, Germany
- Martin Dugas
- Institute of Medical Informatics, Heidelberg University Hospital, Seminarstr. 2, 69117, Heidelberg, Germany
6
Katsanos D, Barkoulas M. Targeted DamID in C. elegans reveals a direct role for LIN-22 and NHR-25 in antagonizing the epidermal stem cell fate. Sci Adv 2022; 8:eabk3141. [PMID: 35119932 PMCID: PMC8816332 DOI: 10.1126/sciadv.abk3141] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 07/06/2021] [Accepted: 12/13/2021] [Indexed: 05/13/2023]
Abstract
Transcription factors are key players in gene networks controlling cell fate specification during development. In multicellular organisms, they display complex patterns of expression and of binding to their targets; hence, tissue specificity is required when characterizing transcription factor-target interactions. We introduce here targeted DamID (TaDa) as a method for tissue-specific transcription factor target identification in intact Caenorhabditis elegans animals. We use TaDa to recover epidermal targets of two factors, the HES1 homolog LIN-22 and the NR5A1/2 nuclear hormone receptor NHR-25. We demonstrate a direct link between LIN-22 and the Wnt signaling pathway through repression of the Frizzled receptor lin-17. We report a direct role for NHR-25 in promoting cell differentiation by repressing the expression of stem cell-promoting GATA factors. Our results expand our understanding of the epidermal gene network and highlight the potential of TaDa to dissect the architecture of tissue-specific gene regulatory networks.
7
Gafurov A, Brejová B, Medvedev P. OUP accepted manuscript. Bioinformatics 2022; 38:i203-i211. [PMID: 35758770 PMCID: PMC9235476 DOI: 10.1093/bioinformatics/btac255] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/29/2022]
Abstract
Motivation Genome annotations are a common way to represent genomic features such as genes, regulatory elements or epigenetic modifications. The amount of overlap between two annotations is often used to ascertain whether there is an underlying biological connection between them. To distinguish true biological association from overlap by pure chance, a robust measure of significance is required. One common way to do this is to determine whether the number of intervals in the reference annotation that intersect the query annotation is statistically significant. However, currently employed statistical frameworks are often either inefficient or inaccurate when computing P-values on the scale of the whole human genome. Results We show that finding the P-values under the typically used ‘gold’ null hypothesis is NP-hard. This motivates us to reformulate the null hypothesis using Markov chains. To measure the fidelity of our Markovian null hypothesis, we develop a fast direct sampling algorithm to estimate the P-value under the gold null hypothesis. We then present an open-source software tool, MCDP, that computes the P-values under the Markovian null hypothesis in O(m^2 + n) time and O(m) memory, where m and n are the numbers of intervals in the reference and query annotations, respectively. Notably, MCDP runtime and memory usage are independent of the genome length, allowing it to outperform previous approaches in runtime and memory usage by orders of magnitude on human genome annotations while maintaining the same level of accuracy. Availability and implementation The software is available at https://github.com/fmfi-compbio/mc-overlaps. All data for reproducibility are available at https://github.com/fmfi-compbio/mc-overlaps-reproducibility. Supplementary information Supplementary data are available at Bioinformatics online.
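The test statistic in this abstract, the number of reference intervals that intersect the query annotation, is cheap to compute once the query is sorted. The Python sketch below illustrates only that statistic; it is not MCDP's code and does not implement its Markov-chain P-value. It assumes half-open intervals on one chromosome and a query annotation whose intervals are sorted and pairwise non-overlapping; the function name is hypothetical. Each reference interval needs one binary search, so the cost is O(n + m log n).

```python
from bisect import bisect_right
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end)


def count_intersecting(reference: List[Interval], query: List[Interval]) -> int:
    """Count reference intervals that intersect at least one query interval.

    Assumes `query` is sorted and non-overlapping, so its end coordinates
    are sorted as well and can be binary-searched.
    """
    query_ends = [e for _, e in query]
    k = 0
    for start, end in reference:
        # First query interval whose end lies strictly after `start`;
        # every earlier query interval ends at or before `start`.
        j = bisect_right(query_ends, start)
        # Later query intervals start even further right, so checking j suffices.
        if j < len(query) and query[j][0] < end:
            k += 1
    return k


reference = [(0, 5), (10, 20), (30, 35)]
query = [(4, 12), (20, 25)]
print(count_intersecting(reference, query))  # 2
```

Because the coordinates are half-open, the reference interval (10, 20) and the query interval (20, 25) merely touch and are not counted as intersecting.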
Affiliation(s)
- Broňa Brejová
- Department of Computer Science, Comenius University, Bratislava 84248, Slovakia
- Paul Medvedev
- Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA 16802, USA
- Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
- Huck Institutes of the Life Sciences, The Pennsylvania State University, University Park, PA 16802, USA
8
He X, Yuan J, Wang Y. G3BP1 binds to guanine quadruplexes in mRNAs to modulate their stabilities. Nucleic Acids Res 2021; 49:11323-11336. [PMID: 34614161 PMCID: PMC8565330 DOI: 10.1093/nar/gkab873] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 03/27/2021] [Revised: 09/12/2021] [Accepted: 10/04/2021] [Indexed: 12/21/2022]
Abstract
RNA guanine quadruplexes (rG4s) play important roles in the post-transcriptional regulation of gene expression, which is often modulated by rG4-binding proteins. Hence, understanding the biological functions of rG4s requires the identification and functional characterization of rG4-recognition proteins. Using a bioinformatic approach based on the overlap between peaks obtained from rG4-seq analysis and those detected in >230 eCLIP-seq datasets for RNA-binding proteins generated by the ENCODE project, we identified a large number of candidate rG4-binding proteins. We showed that one of these proteins, G3BP1, binds directly to rG4 structures with high affinity and selectivity; the binding involves its C-terminal RGG domain and is further enhanced by its RRM domain. Additionally, our seCLIP-Seq data revealed that pyridostatin, a small-molecule rG4 ligand, could displace G3BP1 from mRNA in cells, with the most pronounced effects observed for the 3′-untranslated regions (3′-UTRs) of mRNAs. Moreover, luciferase reporter assay results showed that G3BP1 positively regulates mRNA stability through its binding to rG4 structures. Together, we identified a number of candidate rG4-binding proteins and validated that G3BP1 can bind directly to rG4 structures and regulate the stabilities of mRNAs.
Affiliation(s)
- Xiaomei He
- Department of Chemistry, University of California, Riverside, CA 92521-0403, USA
- Jun Yuan
- Department of Chemistry, University of California, Riverside, CA 92521-0403, USA
- Yinsheng Wang
- Department of Chemistry, University of California, Riverside, CA 92521-0403, USA
9
Ferré Q, Chèneby J, Puthier D, Capponi C, Ballester B. Anomaly detection in genomic catalogues using unsupervised multi-view autoencoders. BMC Bioinformatics 2021; 22:460. [PMID: 34563116 PMCID: PMC8467021 DOI: 10.1186/s12859-021-04359-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/15/2020] [Revised: 06/04/2021] [Accepted: 08/09/2021] [Indexed: 11/13/2022]
Abstract
Background Accurate identification of transcriptional regulator binding locations is essential for the analysis of genomic regions, including cis-regulatory elements (CREs). The customary NGS approaches, predominantly ChIP-seq, can be obscured by data anomalies and biases that are difficult to detect without supervision. Results Here, we develop a method that leverages the usual combinations among many experimental series to flag such atypical peaks. We use deep learning to perform a lossy compression of the genomic regions' representations with multi-view convolutions. Using artificial data, we show that our method correctly identifies groups of correlating series and evaluates CREs according to group completeness. It is then applied to the large volume of curated ChIP-seq data in the ReMap database. We show that peaks lacking known biological correlators are singled out and are less confirmed in real data. We propose normalization approaches useful for interpreting black-box models. Conclusion Our approach detects peaks that are less corroborated than average. It can be extended to other similar problems and can be interpreted to identify correlation groups. It is implemented in an open-source tool called atyPeak. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04359-2.
Affiliation(s)
- Quentin Ferré
- INSERM, TAGC, Aix Marseille University, Marseille, France
- Université de Toulon, CNRS, LIS, Aix Marseille University, Marseille, France
- Jeanne Chèneby
- INSERM, TAGC, Aix Marseille University, Marseille, France
- Denis Puthier
- INSERM, TAGC, Aix Marseille University, Marseille, France
- Cécile Capponi
- Université de Toulon, CNRS, LIS, Aix Marseille University, Marseille, France
10
Sprang M, Krüger M, Andrade-Navarro MA, Fontaine JF. Statistical guidelines for quality control of next-generation sequencing techniques. Life Sci Alliance 2021; 4:e202101113. [PMID: 34462322 PMCID: PMC8408346 DOI: 10.26508/lsa.202101113] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/05/2021] [Revised: 08/17/2021] [Accepted: 08/10/2021] [Indexed: 12/24/2022]
Abstract
More and more next-generation sequencing (NGS) data are made available every day. However, the quality of these data is not always guaranteed. Available quality control tools require in-depth knowledge to correctly interpret the multiplicity of quality features. Moreover, it is usually difficult to know whether quality features are relevant in all experimental conditions. Therefore, the NGS community would benefit greatly from condition-specific, data-driven guidelines derived from many publicly available experiments, which reflect routinely generated NGS data. In this work, we characterized well-known quality guidelines and related features in big datasets and concluded that they are too limited to accurately assess the quality of a given NGS file. Therefore, we present new data-driven guidelines derived from the statistical analysis of many public datasets using quality features calculated by common bioinformatics tools. With this approach, we confirm the high relevance of genome mapping statistics for assessing data quality, and we demonstrate the limited scope of some quality features that are not relevant in all conditions. Our guidelines are available at https://cbdm.uni-mainz.de/ngs-guidelines.
Affiliation(s)
- Maximilian Sprang
- Faculty of Biology, Johannes Gutenberg-Universität Mainz, Biozentrum I, Mainz, Germany
- Matteo Krüger
- Faculty of Biology, Johannes Gutenberg-Universität Mainz, Biozentrum I, Mainz, Germany
- Jean-Fred Fontaine
- Faculty of Biology, Johannes Gutenberg-Universität Mainz, Biozentrum I, Mainz, Germany
11
Gu A, Cho HJ, Sheffield NC. Bedshift: perturbation of genomic interval sets. Genome Biol 2021; 22:238. [PMID: 34416909 PMCID: PMC8379854 DOI: 10.1186/s13059-021-02440-w] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 08/12/2020] [Accepted: 07/26/2021] [Indexed: 12/25/2022]
Abstract
Functional genomics experiments, like ChIP-Seq or ATAC-Seq, produce results that are summarized as a region set. There is no way to objectively evaluate the effectiveness of region set similarity metrics. We present Bedshift, a tool for perturbing BED files by randomly shifting, adding, and dropping regions from a reference file. The perturbed files can be used to benchmark similarity metrics, as well as for other applications. We highlight differences in behavior between metrics, such as that the Jaccard score is most sensitive to added or dropped regions, while coverage score is most sensitive to shifted regions.
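The perturb-then-compare benchmarking idea can be sketched as follows. This Python code is a hypothetical stand-in, not Bedshift's interface: it drops, shifts and adds half-open regions on a single chromosome of length `genome_len` (shift offsets are drawn uniformly from ±`max_shift`, an assumption made for simplicity), and scores similarity with a base-pair Jaccard index, the covered bases in common divided by the covered bases in either set. All function and parameter names are illustrative.

```python
import random
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on one chromosome


def merge(regions: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def perturb(regions, genome_len, shift_rate=0.0, drop_rate=0.0, add_rate=0.0,
            max_shift=100, seed=0) -> List[Interval]:
    """Randomly drop, shift and add regions (a simplified, hypothetical perturbation)."""
    rng = random.Random(seed)
    out = []
    for start, end in regions:
        if rng.random() < drop_rate:
            continue  # drop this region
        length = end - start
        if rng.random() < shift_rate:
            # uniform offset, clamped so the region stays inside [0, genome_len)
            start = min(max(0, start + rng.randint(-max_shift, max_shift)), genome_len - length)
        out.append((start, start + length))
    for _ in range(round(add_rate * len(regions))):
        # new random region, borrowing its length from a random original region
        s, e = rng.choice(regions)
        new_start = rng.randrange(0, genome_len - (e - s) + 1)
        out.append((new_start, new_start + (e - s)))
    return sorted(out)


def jaccard(a: List[Interval], b: List[Interval]) -> float:
    """Base-pair Jaccard index of two region sets."""
    a, b = merge(a), merge(b)
    i = j = inter = 0
    while i < len(a) and j < len(b):
        inter += max(0, min(a[i][1], b[j][1]) - max(a[i][0], b[j][0]))
        if a[i][1] < b[j][1]:  # advance whichever interval ends first
            i += 1
        else:
            j += 1
    union = sum(e - s for s, e in a) + sum(e - s for s, e in b) - inter
    return inter / union if union else 1.0


original = [(100, 200), (500, 650), (1000, 1100)]
for rate in (0.0, 0.5, 1.0):
    shifted = perturb(original, genome_len=5000, shift_rate=rate, max_shift=50)
    print(rate, round(jaccard(original, shifted), 3))
```

Sweeping one perturbation rate at a time while holding the others at zero is what lets such a benchmark attribute a drop in similarity to a specific kind of change, in the spirit of the metric comparison the abstract describes.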
Affiliation(s)
- Aaron Gu
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Department of Computer Science, University of Virginia School of Engineering, Charlottesville, VA, USA
- Hyun Jae Cho
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Department of Computer Science, University of Virginia School of Engineering, Charlottesville, VA, USA
- Nathan C Sheffield
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
- Department of Public Health Sciences, University of Virginia, Charlottesville, VA, USA
- Department of Biomedical Engineering, University of Virginia, Charlottesville, VA, USA
- Department of Biochemistry and Molecular Genetics, University of Virginia, Charlottesville, VA, USA
12
Shew CJ, Carmona-Mora P, Soto DC, Mastoras M, Roberts E, Rosas J, Jagannathan D, Kaya G, O'Geen H, Dennis MY. Diverse Molecular Mechanisms Contribute to Differential Expression of Human Duplicated Genes. Mol Biol Evol 2021; 38:3060-3077. [PMID: 34009325 PMCID: PMC8321529 DOI: 10.1093/molbev/msab131] [Citation(s) in RCA: 10] [Impact Index Per Article: 3.3] [Received: 02/06/2021] [Revised: 04/08/2021] [Accepted: 04/28/2021] [Indexed: 12/24/2022]
Abstract
Emerging evidence links genes within human-specific segmental duplications (HSDs) to traits and diseases unique to our species. Strikingly, despite being nearly identical by sequence (>98.5%), paralogous HSD genes are differentially expressed across human cell and tissue types, though the underlying mechanisms have not been examined. We compared cross-tissue mRNA levels of 75 HSD genes from 30 families between humans and chimpanzees and found expression patterns consistent with relaxed selection on or neofunctionalization of derived paralogs. In general, ancestral paralogs exhibited greatest expression conservation with chimpanzee orthologs, though exceptions suggest certain derived paralogs may retain or supplant ancestral functions. Concordantly, analysis of long-read isoform sequencing data sets from diverse human tissues and cell lines found that about half of derived paralogs exhibited globally lower expression. To understand mechanisms underlying these differences, we leveraged data from human lymphoblastoid cell lines (LCLs) and found no relationship between paralogous expression divergence and post-transcriptional regulation, sequence divergence, or copy-number variation. Considering cis-regulation, we reanalyzed ENCODE data and recovered hundreds of previously unidentified candidate CREs in HSDs. We also generated large-insert ChIP-sequencing data for active chromatin features in an LCL to better distinguish paralogous regions. Some duplicated CREs were sufficient to drive differential reporter activity, suggesting they may contribute to divergent cis-regulation of paralogous genes. This work provides evidence that cis-regulatory divergence contributes to novel expression patterns of recent gene duplicates in humans.
Affiliation(s)
- Colin J Shew
- Genome Center, University of California Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California Davis, CA, USA
- Paulina Carmona-Mora
- Genome Center, University of California Davis, CA, USA
- MIND Institute, University of California, Davis, CA, USA
- Autism Research Training Program, University of California, Davis, CA, USA
- Daniela C Soto
- Genome Center, University of California Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California Davis, CA, USA
- Mira Mastoras
- Genome Center, University of California Davis, CA, USA
- Joseph Rosas
- Genome Center, University of California Davis, CA, USA
- Postbaccalaureate Research Education Program, University of California, Davis, CA, USA
- Gulhan Kaya
- Genome Center, University of California Davis, CA, USA
- Megan Y Dennis
- Genome Center, University of California Davis, CA, USA
- Integrative Genetics and Genomics Graduate Group, University of California Davis, CA, USA
- MIND Institute, University of California, Davis, CA, USA
- Autism Research Training Program, University of California, Davis, CA, USA
- Postbaccalaureate Research Education Program, University of California, Davis, CA, USA
- Department of Biochemistry & Molecular Medicine, University of California, Davis, CA, USA
13
Albrecht S, Sprang M, Andrade-Navarro MA, Fontaine JF. seqQscorer: automated quality control of next-generation sequencing data using machine learning. Genome Biol 2021; 22:75. [PMID: 33673854 PMCID: PMC7934511 DOI: 10.1186/s13059-021-02294-2] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/18/2020] [Accepted: 02/10/2021] [Indexed: 01/03/2023]
Abstract
Controlling quality of next-generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterize common NGS quality features and develop a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal and external functional genomics datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at https://github.com/salbrec/seqQscorer.
Affiliation(s)
- Steffen Albrecht
- Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
- Maximilian Sprang
- Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
- Miguel A Andrade-Navarro
- Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany
- Jean-Fred Fontaine
- Johannes Gutenberg-Universität Mainz, Biozentrum I, Hans-Dieter-Hüsch-Weg 15, 55128, Mainz, Germany.
14
Wang R, Wang Y, Zhang X, Zhang Y, Du X, Fang Y, Li G. Hierarchical cooperation of transcription factors from integration analysis of DNA sequences, ChIP-Seq and ChIA-PET data. BMC Genomics 2019; 20:296. [PMID: 32039697 PMCID: PMC7226942 DOI: 10.1186/s12864-019-5535-2] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Indexed: 01/05/2023]
Abstract
Background Chromosomal architecture, which is constituted by chromatin loops, plays an important role in cellular functions. Gene expression and cell identity can be regulated by the chromatin loop, which is formed by proximal or distal enhancers and promoters in linear DNA (1D). Enhancers and promoters are fundamental non-coding elements enriched with transcription factors (TFs) that help form chromatin loops. However, the specific cooperation of TFs involved in forming chromatin loops is not fully understood. Results Here, we propose a method for investigating the cooperation of TFs in four cell lines by the integrative analysis of DNA sequences, ChIP-Seq and ChIA-PET data. Results demonstrate that the interaction of enhancers and promoters is a hierarchical and dynamic complex process with cooperative interactions of different TFs synergistically regulating gene expression and chromatin structure. The TF cooperation involved in maintaining and regulating the chromatin loop of cells can be regulated by epigenetic factors, such as other TFs and DNA methylation. Conclusions Such cooperation among TFs provides features that can potentially affect chromatin’s 3D architecture in cells. The regulation of chromatin 3D organization and gene expression is a complex process associated with the hierarchical and dynamic properties of TFs. Electronic supplementary material The online version of this article (10.1186/s12864-019-5535-2) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Ruimin Wang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
- Yunlong Wang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
- Xueying Zhang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
- Yaliang Zhang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
- Xiaoyong Du
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
- Huazhong Agricultural University, Wuhan, 430070, China
- Yaping Fang
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
- Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
- Guoliang Li
- Agricultural Bioinformatics Key Laboratory of Hubei Province, Wuhan, 430070, China
- Huazhong Agricultural University, Wuhan, 430070, China
- College of Informatics, Huazhong Agricultural University, Wuhan, 430070, China
15
Kanduri C, Bock C, Gundersen S, Hovig E, Sandve GK. Colocalization analyses of genomic elements: approaches, recommendations and challenges. Bioinformatics 2019; 35:1615-1624. [PMID: 30307532 PMCID: PMC6499241 DOI: 10.1093/bioinformatics/bty835] [Citation(s) in RCA: 34] [Impact Index Per Article: 6.8] [Received: 05/22/2018] [Revised: 09/03/2018] [Accepted: 10/10/2018] [Indexed: 12/23/2022]
Abstract
MOTIVATION Many high-throughput methods produce sets of genomic regions as one of their main outputs. Scientists often use genomic colocalization analysis to interpret such region sets, for example to identify interesting enrichments and to understand the interplay between the underlying biological processes. Although widely used, there is little standardization in how these analyses are performed. Different practices can substantially affect the conclusions of colocalization analyses. RESULTS Here, we describe the different approaches and provide recommendations for performing genomic colocalization analysis, while also discussing common methodological challenges that may influence the conclusions. As illustrated by concrete example cases, careful attention to analysis details is needed in order to meet these challenges and to obtain a robust and biologically meaningful interpretation of genomic region set data. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Chakravarthi Kanduri
- Department of Informatics, University of Oslo, Oslo, Norway
- K. G. Jebsen Coeliac Disease Research Centre, Oslo, Norway
- Christoph Bock
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, Vienna, Austria
- Department of Laboratory Medicine, Medical University of Vienna, Vienna, Austria
- Max Planck Institute for Informatics, Saarbrücken, Germany
- Sveinung Gundersen
- Department of Informatics, University of Oslo, Oslo, Norway
- Elixir Norway, Oslo Node, University of Oslo, Oslo, Norway
- Eivind Hovig
- Department of Informatics, University of Oslo, Oslo, Norway
- Elixir Norway, Oslo Node, University of Oslo, Oslo, Norway
- Department of Tumor Biology, Institute for Cancer Research, Oslo, Norway
- Institute for Cancer Genetics and Informatics, The Norwegian Radium Hospital, Oslo, Norway
- Geir Kjetil Sandve
- Department of Informatics, University of Oslo, Oslo, Norway
- K. G. Jebsen Coeliac Disease Research Centre, Oslo, Norway
16
Vandel J, Cassan O, Lèbre S, Lecellier CH, Bréhélin L. Probing transcription factor combinatorics in different promoter classes and in enhancers. BMC Genomics 2019; 20:103. [PMID: 30709337 PMCID: PMC6359851 DOI: 10.1186/s12864-018-5408-0] [Citation(s) in RCA: 22] [Impact Index Per Article: 4.4] [Received: 03/21/2018] [Accepted: 12/26/2018] [Indexed: 12/31/2022]
Abstract
BACKGROUND In eukaryotic cells, transcription factors (TFs) are thought to act in a combinatorial way, by competing and collaborating to regulate common target genes. However, several questions remain regarding the conservation of these combinations among different gene classes, regulatory regions and cell types. RESULTS We propose a new approach named TFcoop to infer the TF combinations involved in the binding of a target TF in a particular cell type. TFcoop aims to predict the binding sites of the target TF from the nucleotide content of the sequences and the binding affinities of all identified cooperating TFs. The set of cooperating TFs and model parameters are learned from ChIP-seq data of the target TF. We used TFcoop to investigate the TF combinations involved in the binding of 106 TFs in 41 cell types and in four regulatory regions: promoters of mRNAs, lncRNAs and pri-miRNAs, and enhancers. We first show that TFcoop is accurate and outperforms simple PWM methods for predicting TF binding sites. Next, analysis of the learned models sheds light on important properties of TF combinations in different promoter classes and in enhancers. First, we show that combinations governing TF binding on enhancers are more cell-type specific than those governing binding in promoters. Second, for a given TF and cell type, we observe that TF combinations are different between promoters and enhancers, but similar for promoters of mRNAs, lncRNAs and pri-miRNAs. Analysis of the TFs cooperating with the different targets shows over-representation of pioneer TFs and a clear preference for TFs with binding motif composition similar to that of the target. Lastly, our models accurately distinguish promoters associated with specific biological processes. CONCLUSIONS TFcoop appears to be an accurate approach for studying TF combinations. Its use on ENCODE and FANTOM data allowed us to discover important properties of human TF combinations in different promoter classes and in enhancers.
The R code for learning a TFcoop model and for reproducing the main experiments described in the paper is available as an R Markdown file at https://gite.lirmm.fr/brehelin/TFcoop.
Affiliation(s)
- Jimmy Vandel
- LIRMM, Univ. Montpellier, CNRS, Montpellier, France
- IBC, CNRS, Univ. Montpellier, Montpellier, France
- Océane Cassan
- LIRMM, Univ. Montpellier, CNRS, Montpellier, France
- IBC, CNRS, Univ. Montpellier, Montpellier, France
- Sophie Lèbre
- IBC, CNRS, Univ. Montpellier, Montpellier, France
- IMAG, Univ. Montpellier, CNRS, Montpellier, France
- Univ. Paul Valery Montpellier, Montpellier, France
- Charles-Henri Lecellier
- IBC, CNRS, Univ. Montpellier, Montpellier, France.
- Institut de Génétique Moléculaire de Montpellier, University of Montpellier, CNRS, Montpellier, France.
- Laurent Bréhélin
- LIRMM, Univ. Montpellier, CNRS, Montpellier, France.
- IBC, CNRS, Univ. Montpellier, Montpellier, France.
17
Domanska D, Kanduri C, Simovski B, Sandve GK. Mind the gaps: overlooking inaccessible regions confounds statistical testing in genome analysis. BMC Bioinformatics 2018; 19:481. [PMID: 30547739 PMCID: PMC6293655 DOI: 10.1186/s12859-018-2438-1] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 02/21/2018] [Accepted: 10/15/2018] [Indexed: 01/21/2023]
Abstract
Background The current versions of reference genome assemblies still contain gaps represented by stretches of Ns. Since high throughput sequencing reads cannot be mapped to those gap regions, the regions are depleted of experimental data. Moreover, several technology platforms assay a targeted portion of the genomic sequence, meaning that regions from the unassayed portion of the genomic sequence cannot be detected in those experiments. We here refer to all such regions as inaccessible regions, and hypothesize that ignoring these regions in the null model may increase false findings in statistical testing of colocalization of genomic features. Results Our exploratory analyses confirm that the genomic regions in public genomic tracks intersect very little with assembly gaps of human reference genomes (hg19 and hg38). What little intersection there was occurred only at the beginning and end portions of the gap regions. Further, we simulated a set of synthetic tracks by matching the properties of real genomic tracks in a way that nullified any true association between them. This allowed us to test our hypothesis that not avoiding inaccessible regions (as represented by assembly gaps) in the null model would result in spurious inflation of statistical significance. We contrasted the distributions of test statistics and p-values of Monte Carlo-based permutation tests that either avoided or did not avoid assembly gaps in the null model when testing colocalization between a pair of tracks. We observed that the statistical tests that did not account for assembly gaps in the null model resulted in a distribution of the test statistic that is shifted to the right and a distribution of p-values that is shifted to the left (indicating inflated significance). We observed a similar level of inflated significance in hg19 and hg38, despite assembly gaps covering a smaller proportion of the latter reference genome.
Conclusion We provide empirical evidence demonstrating that inaccessible regions, even when covering only a few percent of the genome, can lead to a substantial number of false findings if not accounted for in statistical colocalization analysis. Electronic supplementary material The online version of this article (10.1186/s12859-018-2438-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Diana Domanska
- Department of Informatics, University of Oslo, Oslo, Norway.
- Chakravarthi Kanduri
- Department of Informatics, University of Oslo, Oslo, Norway
- K. G. Jebsen Coeliac Disease Research Centre, Oslo, Norway
- Boris Simovski
- Department of Informatics, University of Oslo, Oslo, Norway
- Geir Kjetil Sandve
- Department of Informatics, University of Oslo, Oslo, Norway
- K. G. Jebsen Coeliac Disease Research Centre, Oslo, Norway
18
Ivanov MP, Ladurner R, Poser I, Beveridge R, Rampler E, Hudecz O, Novatchkova M, Hériché JK, Wutz G, van der Lelij P, Kreidl E, Hutchins JR, Axelsson-Ekker H, Ellenberg J, Hyman AA, Mechtler K, Peters JM. The replicative helicase MCM recruits cohesin acetyltransferase ESCO2 to mediate centromeric sister chromatid cohesion. EMBO J 2018; 37:e97150. [PMID: 29930102 PMCID: PMC6068434 DOI: 10.15252/embj.201797150] [Citation(s) in RCA: 38] [Impact Index Per Article: 6.3] [Received: 04/16/2017] [Revised: 02/27/2018] [Accepted: 04/09/2018] [Indexed: 11/09/2022]
Abstract
Chromosome segregation depends on sister chromatid cohesion which is established by cohesin during DNA replication. Cohesive cohesin complexes become acetylated to prevent their precocious release by WAPL before cells have reached mitosis. To obtain insight into how DNA replication, cohesion establishment and cohesin acetylation are coordinated, we analysed the interaction partners of 55 human proteins implicated in these processes by mass spectrometry. This proteomic screen revealed that on chromatin the cohesin acetyltransferase ESCO2 associates with the MCM2-7 subcomplex of the replicative Cdc45-MCM-GINS helicase. The analysis of ESCO2 mutants defective in MCM binding indicates that these interactions are required for proper recruitment of ESCO2 to chromatin, cohesin acetylation during DNA replication, and centromeric cohesion. We propose that MCM binding enables ESCO2 to travel with replisomes to acetylate cohesive cohesin complexes in the vicinity of replication forks so that these complexes can be protected from precocious release by WAPL. Our results also indicate that ESCO1 and ESCO2 have distinct functions in maintaining cohesion between chromosome arms and centromeres, respectively.
Affiliation(s)
- Rene Ladurner
- Research Institute of Molecular Pathology, Vienna, Austria
- Ina Poser
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Evelyn Rampler
- Research Institute of Molecular Pathology, Vienna, Austria
- Otto Hudecz
- Institute of Molecular Biotechnology, Vienna, Austria
- Gordana Wutz
- Research Institute of Molecular Pathology, Vienna, Austria
- Emanuel Kreidl
- Research Institute of Molecular Pathology, Vienna, Austria
- Jan Ellenberg
- European Molecular Biology Laboratory, Heidelberg, Germany
- Anthony A Hyman
- Max Planck Institute of Molecular Cell Biology and Genetics, Dresden, Germany
- Karl Mechtler
- Research Institute of Molecular Pathology, Vienna, Austria
- Institute of Molecular Biotechnology, Vienna, Austria
19
Stavrovskaya ED, Niranjan T, Fertig EJ, Wheelan SJ, Favorov AV, Mironov AA. StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data. Bioinformatics 2017; 33:3158-3165. [PMID: 29028265 DOI: 10.1093/bioinformatics/btx379] [Citation(s) in RCA: 18] [Impact Index Per Article: 3.0] [Received: 09/22/2016] [Accepted: 06/12/2017] [Indexed: 12/13/2022]
Abstract
Motivation Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. Results Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding data binarization and the resulting loss of information. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics. Availability and implementation The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. Contact favorov@sensi.org. Supplementary information Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Elena D Stavrovskaya
- Department of Bioengineering and Bioinformatics, Moscow State University, Moscow 119992, Russia
- Institute for Information Transmission Problems, RAS, Moscow 127994, Russia
- Tejasvi Niranjan
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Elana J Fertig
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Sarah J Wheelan
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Alexander V Favorov
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
- Laboratory of Systems Biology and Computational Genetics, Vavilov Institute of General Genetics, RAS, Moscow 119333, Russia
- Laboratory of Bioinformatics, Research Institute of Genetics and Selection of Industrial Microorganisms, Moscow 117545, Russia
- Andrey A Mironov
- Department of Bioengineering and Bioinformatics, Moscow State University, Moscow 119992, Russia
- Institute for Information Transmission Problems, RAS, Moscow 127994, Russia
20
Simovski B, Kanduri C, Gundersen S, Titov D, Domanska D, Bock C, Bossini-Castillo L, Chikina M, Favorov A, Layer RM, Mironov AA, Quinlan AR, Sheffield NC, Trynka G, Sandve GK. Coloc-stats: a unified web interface to perform colocalization analysis of genomic features. Nucleic Acids Res 2018; 46:W186-W193. [PMID: 29873782 PMCID: PMC6030976 DOI: 10.1093/nar/gky474] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Received: 03/05/2018] [Revised: 05/05/2018] [Accepted: 05/15/2018] [Indexed: 12/16/2022]
Abstract
Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.
Affiliation(s)
- Boris Simovski
- Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
- Chakravarthi Kanduri
- Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
- K. G. Jebsen Centre for Coeliac Disease Research, Oslo University Hospital, Sognsvannsveien 20, 0372 Oslo, Norway
- Sveinung Gundersen
- Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
- Elixir Norway - Oslo node, Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
- Dmytro Titov
- Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
- Elixir Norway - Oslo node, Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
- Diana Domanska
- Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
- Christoph Bock
- CeMM Research Center for Molecular Medicine of the Austrian Academy of Sciences, 1090 Vienna, Austria
- Department of Laboratory Medicine, Medical University of Vienna, 1090 Vienna, Austria
- Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
- Maria Chikina
- University of Pittsburgh School of Medicine, 3550 Terrace Street, Pittsburgh, PA 15213, USA
- Alexander Favorov
- Department of Oncology, Sidney Kimmel Comprehensive Cancer Center, The Johns Hopkins University School of Medicine, 550 N Broadway, Baltimore, MD 21205, USA
- Laboratory of Systems Biology and Computational Genetics, Vavilov Institute of General Genetics, Gubkina Street 3, Moscow 119333, Russia
- Ryan M Layer
- Department of Human Genetics, University of Utah, 15 N 2030 E, Salt Lake City, UT 84112, USA
- USTAR Center for Genetic Discovery, University of Utah, 15 N 2030 E, Salt Lake City, UT 84112, USA
- Andrey A Mironov
- Faculty of Bioengineering and Bioinformatics, Lomonosov Moscow State University, Lab. Bldg B, Vorobiovy Gory 1-73, Moscow 119992, Russia
- Skolkovo Institute of Science and Technology, Nobelya ul. 3, Moscow 121205, Russia
- Institute for Information Transmission Problems, Russian Academy of Sciences, Bolshoi Karetny per. 19, Moscow 127994, Russia
- Aaron R Quinlan
- Department of Human Genetics, University of Utah, 15 N 2030 E, Salt Lake City, UT 84112, USA
- USTAR Center for Genetic Discovery, University of Utah, 15 N 2030 E, Salt Lake City, UT 84112, USA
- Department of Biomedical Informatics, University of Utah, 421 Wakara Way, Salt Lake City, UT 84108, USA
- Nathan C Sheffield
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA 22903 USA
- Gosia Trynka
- Cellular Genetics Programme, Wellcome Sanger Institute, CB10 1SA Hinxton, UK
- Geir K Sandve
- Department of Informatics, University of Oslo, Gaustadalléen 23 B, N-0373 Oslo, Norway
- K. G. Jebsen Centre for Coeliac Disease Research, Oslo University Hospital, Sognsvannsveien 20, 0372 Oslo, Norway
21
Kudron MM, Victorsen A, Gevirtzman L, Hillier LW, Fisher WW, Vafeados D, Kirkey M, Hammonds AS, Gersch J, Ammouri H, Wall ML, Moran J, Steffen D, Szynkarek M, Seabrook-Sturgis S, Jameel N, Kadaba M, Patton J, Terrell R, Corson M, Durham TJ, Park S, Samanta S, Han M, Xu J, Yan KK, Celniker SE, White KP, Ma L, Gerstein M, Reinke V, Waterston RH. The ModERN Resource: Genome-Wide Binding Profiles for Hundreds of Drosophila and Caenorhabditis elegans Transcription Factors. Genetics 2018; 208:937-949. [PMID: 29284660 PMCID: PMC5844342 DOI: 10.1534/genetics.117.300657] [Citation(s) in RCA: 118] [Impact Index Per Article: 19.7] [Received: 07/18/2017] [Accepted: 12/08/2017] [Indexed: 12/22/2022]
Abstract
To develop a catalog of regulatory sites in two major model organisms, Drosophila melanogaster and Caenorhabditis elegans, the modERN (model organism Encyclopedia of Regulatory Networks) consortium has systematically assayed the binding sites of transcription factors (TFs). Combined with data produced by our predecessor, modENCODE (Model Organism ENCyclopedia Of DNA Elements), we now have data for 262 TFs identifying 1.23 M sites in the fly genome and 217 TFs identifying 0.67 M sites in the worm genome. Because sites from different TFs are often overlapping and tightly clustered, they fall into 91,011 and 59,150 regions in the fly and worm, respectively, and these binding sites span as little as 8.7 and 5.8 Mb in the two organisms. Clusters with large numbers of sites (so-called high occupancy target, or HOT regions) predominantly associate with broadly expressed genes, whereas clusters containing sites from just a few factors are associated with genes expressed in tissue-specific patterns. All of the strains expressing GFP-tagged TFs are available at the stock centers, and the chromatin immunoprecipitation sequencing data are available through the ENCODE Data Coordinating Center and also through a simple interface (http://epic.gs.washington.edu/modERN/) that facilitates rapid accessibility of processed data sets. These data will facilitate a vast number of scientific inquiries into the function of individual TFs in key developmental, metabolic, and defense and homeostatic regulatory pathways, as well as provide a broader perspective on how individual TFs work together in local networks and globally across the life spans of these two key model organisms.
Affiliation(s)
- Michelle M Kudron
- Department of Genetics, Yale University, New Haven, Connecticut 06520
- Alec Victorsen
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
- Louis Gevirtzman
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
- LaDeana W Hillier
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
- William W Fisher
- Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720
- Dionne Vafeados
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
- Matt Kirkey
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
- Ann S Hammonds
- Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720
- Jeffery Gersch
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
- Haneen Ammouri
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
- Martha L Wall
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
- Jennifer Moran
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
- David Steffen
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
- Matt Szynkarek
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
- Samantha Seabrook-Sturgis
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
- Nader Jameel
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
- Madhura Kadaba
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
- Jaeda Patton
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
- Robert Terrell
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
- Mitch Corson
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
- Timothy J Durham
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
- Soo Park
- Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720
- Swapna Samanta
- Department of Genetics, Yale University, New Haven, Connecticut 06520
- Mei Han
- Department of Genetics, Yale University, New Haven, Connecticut 06520
- Jinrui Xu
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520
- Koon-Kiu Yan
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520
- Susan E Celniker
- Division of Biological Systems and Engineering, Lawrence Berkeley National Laboratory, Berkeley, California 94720
- Kevin P White
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
- Lijia Ma
- Institute for Genomics and Systems Biology, Department of Human Genetics, University of Chicago, Illinois 60637
- Mark Gerstein
- Program in Computational Biology and Bioinformatics, Yale University, New Haven, Connecticut 06520
- Department of Molecular Biophysics and Biochemistry, Yale University, New Haven, Connecticut 06520
- Department of Computer Science, Yale University, New Haven, Connecticut 06520
- Valerie Reinke
- Department of Genetics, Yale University, New Haven, Connecticut 06520
- Robert H Waterston
- Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195
22
Ye Y, Gao L, Zhang S. Integrative Analysis of Transcription Factor Combinatorial Interactions Using a Bayesian Tensor Factorization Approach. Front Genet 2017; 8:140. [PMID: 29033978 PMCID: PMC5625019 DOI: 10.3389/fgene.2017.00140] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 07/20/2017] [Accepted: 09/15/2017] [Indexed: 11/13/2022]
Abstract
Transcription factors play a key role in the transcriptional regulation of genes and the determination of cellular identity through combinatorial interactions. However, current studies of combinatorial regulation are limited by the lack of experimental data from the same cellular environment and by extensive data noise. Here, we adopt a Bayesian CANDECOMP/PARAFAC (CP) factorization approach (BCPF) to integrate multiple datasets in a network paradigm for determining precise TF interaction landscapes. In our first application, we apply BCPF to integrate three networks, built respectively from diverse ENCODE datasets of multiple cell lines, to predict a global and precise TF interaction network. This network gives 38 novel TF interactions with distinct biological functions. In our second application, we apply BCPF to seven types of cell-type TF regulatory networks and predict seven cell-lineage TF interaction networks, respectively. By further exploring their dynamics and modularity, we find that cell lineage-specific hub TFs participate in cell-type- or lineage-specific regulation by interacting with non-specific TFs. Furthermore, we illustrate the biological function of hub TFs, taking those of the cancer and blood lineages as examples. Taken together, our integrative analysis can provide a more precise and extensive description of human TF combinatorial interactions.
Affiliation(s)
- Yusen Ye
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Lin Gao
- School of Computer Science and Technology, Xidian University, Xi'an, China
- Shihua Zhang
- NCMIS, CEMS, RCSDS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing, China
23
Ang YS, Rivas RN, Ribeiro AJS, Srivas R, Rivera J, Stone NR, Pratt K, Mohamed TMA, Fu JD, Spencer CI, Tippens ND, Li M, Narasimha A, Radzinsky E, Moon-Grady AJ, Yu H, Pruitt BL, Snyder MP, Srivastava D. Disease Model of GATA4 Mutation Reveals Transcription Factor Cooperativity in Human Cardiogenesis. Cell 2017; 167:1734-1749.e22. [PMID: 27984724 DOI: 10.1016/j.cell.2016.11.033] [Citation(s) in RCA: 163] [Impact Index Per Article: 23.3] [Received: 02/23/2016] [Revised: 08/09/2016] [Accepted: 11/17/2016] [Indexed: 12/12/2022]
Abstract
Mutation of highly conserved residues in transcription factors may affect protein-protein or protein-DNA interactions, leading to gene network dysregulation and human disease. Human mutations in GATA4, a cardiogenic transcription factor, cause cardiac septal defects and cardiomyopathy. Here, iPS-derived cardiomyocytes from subjects with a heterozygous GATA4-G296S missense mutation showed impaired contractility, calcium handling, and metabolic activity. In human cardiomyocytes, GATA4 broadly co-occupied cardiac enhancers with TBX5, another transcription factor that causes septal defects when mutated. The GATA4-G296S mutation disrupted TBX5 recruitment, particularly to cardiac super-enhancers, concomitant with dysregulation of genes related to the phenotypic abnormalities, including cardiac septation. Conversely, the GATA4-G296S mutation led to failure of GATA4 and TBX5-mediated repression at non-cardiac genes and enhanced open chromatin states at endothelial/endocardial promoters. These results reveal how disease-causing missense mutations can disrupt transcriptional cooperativity, leading to aberrant chromatin states and cellular dysfunction, including those related to morphogenetic defects.
Affiliation(s)
- Yen-Sin Ang
- Gladstone Institute of Cardiovascular Disease and Roddenberry Center for Stem Cell Biology and Medicine, San Francisco, CA 94158, USA; Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA
- Renee N Rivas
- Gladstone Institute of Cardiovascular Disease and Roddenberry Center for Stem Cell Biology and Medicine, San Francisco, CA 94158, USA; Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA
- Rohith Srivas
- Department of Genetics and Center for Genomics and Personalized Medicine, Stanford University, Stanford, CA 94305, USA
- Janell Rivera
- Gladstone Institute of Cardiovascular Disease and Roddenberry Center for Stem Cell Biology and Medicine, San Francisco, CA 94158, USA
- Nicole R Stone
- Gladstone Institute of Cardiovascular Disease and Roddenberry Center for Stem Cell Biology and Medicine, San Francisco, CA 94158, USA; Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA
- Karishma Pratt
- Gladstone Institute of Cardiovascular Disease and Roddenberry Center for Stem Cell Biology and Medicine, San Francisco, CA 94158, USA
- Tamer M A Mohamed
- Gladstone Institute of Cardiovascular Disease and Roddenberry Center for Stem Cell Biology and Medicine, San Francisco, CA 94158, USA; Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA
- Ji-Dong Fu
- Gladstone Institute of Cardiovascular Disease and Roddenberry Center for Stem Cell Biology and Medicine, San Francisco, CA 94158, USA
- C Ian Spencer
- Gladstone Institute of Cardiovascular Disease and Roddenberry Center for Stem Cell Biology and Medicine, San Francisco, CA 94158, USA
- Nathaniel D Tippens
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14850, USA
- Molong Li
- Gladstone Institute of Cardiovascular Disease and Roddenberry Center for Stem Cell Biology and Medicine, San Francisco, CA 94158, USA
- Anil Narasimha
- Department of Genetics and Center for Genomics and Personalized Medicine, Stanford University, Stanford, CA 94305, USA
- Ethan Radzinsky
- Gladstone Institute of Cardiovascular Disease and Roddenberry Center for Stem Cell Biology and Medicine, San Francisco, CA 94158, USA
- Anita J Moon-Grady
- Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94143, USA
- Haiyuan Yu
- Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, NY 14850, USA
- Beth L Pruitt
- Department of Mechanical Engineering, Stanford University, Stanford, CA 94305, USA
- Michael P Snyder
- Department of Genetics and Center for Genomics and Personalized Medicine, Stanford University, Stanford, CA 94305, USA
- Deepak Srivastava
- Gladstone Institute of Cardiovascular Disease and Roddenberry Center for Stem Cell Biology and Medicine, San Francisco, CA 94158, USA; Department of Pediatrics, University of California, San Francisco, San Francisco, CA 94143, USA; Department of Biochemistry and Biophysics, University of California, San Francisco, San Francisco, CA 94143, USA
24
Tang B. Genomic feature extraction and comparison based on global alignment of ChIP-sequencing data. Bioengineered 2017; 8:248-255. [PMID: 27690208 PMCID: PMC5470523 DOI: 10.1080/21655979.2016.1226714] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/26/2022]
Abstract
Enhanced accuracy and high-throughput capability in capturing genetic activities have led ChIP-sequencing technology to be applied widely in diverse studies tackling DNA-protein interaction problems. To date, questions such as choosing suitable ChIP-seq arguments and comparing sample quality still trouble biologists. We propose methods for answering such questions, for example deciding optimal argument pairs in the global alignment of ChIP-sequencing data. We then employ a modern signal processing approach to extract inherent genomic features from the global alignments of transcriptional binding activities, together with pairwise comparison from intra- and inter-sample perspectives; this lets us further determine alignment quality and decide the optimal candidate for multi-source heterogeneous high-throughput sequences. The work provides a practical approach to quantitatively compare the alignment quality of heterogeneous sequencing data, especially in determining the efficiency of transcriptional binding from replicate samples, and thus helps to exploit the potential of ChIP-seq for a deeper understanding of the biological meaning inherent in high-throughput genomic sequences.
Affiliation(s)
- Binhua Tang
- Epigenetics & Function Group, College of the Internet of Things, Hohai University, Jiangsu, China; School of Public Health, Shanghai Jiao Tong University, Shanghai, China
25
Imbeault M, Helleboid PY, Trono D. KRAB zinc-finger proteins contribute to the evolution of gene regulatory networks. Nature 2017; 543:550-554. [PMID: 28273063 DOI: 10.1038/nature21683] [Citation(s) in RCA: 338] [Impact Index Per Article: 48.3] [Received: 05/03/2016] [Accepted: 02/02/2017] [Indexed: 12/29/2022]
Abstract
The human genome encodes some 350 Krüppel-associated box (KRAB) domain-containing zinc-finger proteins (KZFPs), the products of a rapidly evolving gene family that has been traced back to early tetrapods. The function of most KZFPs is unknown, but a few have been demonstrated to repress transposable elements in embryonic stem (ES) cells by recruiting the transcriptional regulator TRIM28 and associated mediators of histone H3 Lys9 trimethylation (H3K9me3)-dependent heterochromatin formation and DNA methylation. Depletion of TRIM28 in human or mouse ES cells triggers the upregulation of a broad range of transposable elements, and recent data based on a few specific examples have pointed to an arms race between hosts and transposable elements as an important driver of KZFP gene selection. Here, to obtain a global view of this phenomenon, we combined phylogenetic and genomic studies to investigate the evolutionary emergence of KZFP genes in vertebrates and to identify their targets in the human genome. First, we unexpectedly reassigned the root of the family to a common ancestor of coelacanths and tetrapods. Second, although we confirmed that the majority of KZFPs bind transposable elements and pinpoint cases of ongoing co-evolution, we found that most of their transposable element targets have lost all transposition potential. Third, by examining the interplay between human KZFPs and other transcriptional modulators, we obtained evidence that KZFPs exploit evolutionarily conserved fragments of transposable elements as regulatory platforms long after the arms race against these genetic invaders has ended. Together, our results demonstrate that KZFPs partner with transposable elements to build a largely species-restricted layer of epigenetic regulation.
Affiliation(s)
- Michaël Imbeault
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Pierre-Yves Helleboid
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
- Didier Trono
- School of Life Sciences, Ecole Polytechnique Fédérale de Lausanne (EPFL), 1015 Lausanne, Switzerland
26
Kudrin RA, Mironov AA, Stavrovskaya ED. Chromatin and Polycomb: Biology and bioinformatics. Mol Biol 2017. [DOI: 10.1134/s0026893316060121] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/23/2022]
27
COPAR: A ChIP-Seq Optimal Peak Analyzer. Biomed Res Int 2017; 2017:5346793. [PMID: 28357402 PMCID: PMC5357551 DOI: 10.1155/2017/5346793] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2016] [Accepted: 02/14/2017] [Indexed: 11/17/2022]
Abstract
Sequencing data quality and peak alignment efficiency of ChIP-sequencing profiles are directly related to the reliability and reproducibility of NGS experiments. To date, there has been no tool specifically designed for optimal peak alignment estimation and quality-related genomic feature extraction for ChIP-sequencing profiles. We developed COPAR, an open-source, user-friendly package, to statistically investigate, quantify, and visualize the optimal peak alignment and inherent genomic features using ChIP-seq data from NGS experiments. It offers biologists a versatile way to perform quality checks on high-throughput experiments and to optimize their experimental design. COPAR can process mapped ChIP-seq read files in BED format and output statistically sound results for multiple high-throughput experiments. We verified the package with three public ChIP-seq data sets and have deposited COPAR on GitHub under a GNU GPL license.
28
Pourkarimi E, Bellush JM, Whitehouse I. Spatiotemporal coupling and decoupling of gene transcription with DNA replication origins during embryogenesis in C. elegans. eLife 2016; 5:21728. [PMID: 28009254 PMCID: PMC5222557 DOI: 10.7554/elife.21728] [Citation(s) in RCA: 49] [Impact Index Per Article: 6.1] [Received: 09/21/2016] [Accepted: 12/22/2016] [Indexed: 12/13/2022]
Abstract
The primary task of developing embryos is genome replication, yet how DNA replication is integrated with the profound cellular changes that occur through development is largely unknown. Using an approach to map DNA replication at high resolution in C. elegans, we show that replication origins are marked with specific histone modifications that define gene enhancers. We demonstrate that the level of enhancer-associated modifications scales with the efficiency at which the origin is utilized. By mapping replication origins at different developmental stages, we show that the positions and activity of origins are largely invariant through embryogenesis. Contrary to expectation, we find that replication origins are specified prior to the broad onset of zygotic transcription, yet when transcription initiates it does so in close proximity to the predefined replication origins. Transcription and DNA replication origins are correlated, but the association breaks down when embryonic cell division ceases. Collectively, our data indicate that replication origins are fundamental organizers and regulators of gene activity through embryonic development.
Affiliation(s)
- Ehsan Pourkarimi
- Molecular Biology Program, Memorial Sloan Kettering Cancer Center, New York, United States
- James M Bellush
- Molecular Biology Program, Memorial Sloan Kettering Cancer Center, New York, United States
- Iestyn Whitehouse
- Molecular Biology Program, Memorial Sloan Kettering Cancer Center, New York, United States
29
Wei Y, Wu H. Measuring the spatial correlations of protein binding sites. Bioinformatics 2016; 32:1766-72. [PMID: 26861822 DOI: 10.1093/bioinformatics/btw058] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 08/23/2015] [Accepted: 01/25/2016] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Understanding the interactions of different DNA binding proteins is a crucial first step toward deciphering gene regulatory mechanisms. With advances in high-throughput sequencing technology such as ChIP-seq, the genome-wide binding sites of many proteins have been profiled under different biological contexts. It is of great interest to quantify the spatial correlations of the binding sites, such as their overlaps, to provide information about the interactions of proteins. Analyses of the overlapping patterns of binding sites have been widely performed, mostly using ad hoc methods. Due to the heterogeneity and the tremendous size of the genome, such methods often lead to biased or even erroneous results.
RESULTS: In this work, we discover a Simpson's paradox phenomenon in assessing the genome-wide spatial correlation of protein binding sites. Leveraging information from publicly available data, we propose a testing procedure for evaluating the significance of the overlap between a pair of proteins, which accounts for background artifacts and genome heterogeneity. Real data analyses demonstrate that the proposed method provides more biologically meaningful results.
AVAILABILITY AND IMPLEMENTATION: An R package is available at http://www.sta.cuhk.edu.hk/YWei/ChIPCor.html
CONTACTS: ywei@sta.cuhk.edu.hk or hao.wu@emory.edu
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Yingying Wei
- Department of Statistics, The Chinese University of Hong Kong, Shatin, NT, Hong Kong
- Hao Wu
- Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA 30322, USA
30
Ladurner R, Kreidl E, Ivanov MP, Ekker H, Idarraga-Amado MH, Busslinger GA, Wutz G, Cisneros DA, Peters JM. Sororin actively maintains sister chromatid cohesion. EMBO J 2016; 35:635-53. [PMID: 26903600 PMCID: PMC4801952 DOI: 10.15252/embj.201592532] [Citation(s) in RCA: 62] [Impact Index Per Article: 7.8] [Received: 07/13/2015] [Accepted: 01/17/2016] [Indexed: 11/26/2022]
Abstract
Cohesion between sister chromatids is established during DNA replication but needs to be maintained to enable proper chromosome–spindle attachments in mitosis or meiosis. Cohesion is mediated by cohesin, but also depends on cohesin acetylation and sororin. Sororin contributes to cohesion by stabilizing cohesin on DNA. Sororin achieves this by inhibiting WAPL, which otherwise releases cohesin from DNA and destroys cohesion. Here we describe mouse models which enable the controlled depletion of sororin by gene deletion or auxin‐induced degradation. We show that sororin is essential for embryonic development, cohesion maintenance, and proper chromosome segregation. We further show that the acetyltransferases ESCO1 and ESCO2 are essential for stabilizing cohesin on chromatin, that their only function in this process is to acetylate cohesin's SMC3 subunit, and that DNA replication is also required for stable cohesin–chromatin interactions. Unexpectedly, we find that sororin interacts dynamically with the cohesin complexes it stabilizes. This implies that sororin recruitment to cohesin does not depend on the DNA replication machinery or process itself, but on a property that cohesin acquires during cohesion establishment.
Affiliation(s)
- Rene Ladurner
- IMP Research Institute of Molecular Pathology, Vienna, Austria
- Emanuel Kreidl
- IMP Research Institute of Molecular Pathology, Vienna, Austria
- Heinz Ekker
- Campus Science Support Facilities NGS Facility, Vienna, Austria
- Gordana Wutz
- IMP Research Institute of Molecular Pathology, Vienna, Austria
31
Varier RA, Carrillo de Santa Pau E, van der Groep P, Lindeboom RGH, Matarese F, Mensinga A, Smits AH, Edupuganti RR, Baltissen MP, Jansen PWTC, Ter Hoeve N, van Weely DR, Poser I, van Diest PJ, Stunnenberg HG, Vermeulen M. Recruitment of the Mammalian Histone-modifying EMSY Complex to Target Genes Is Regulated by ZNF131. J Biol Chem 2016; 291:7313-24. [PMID: 26841866 DOI: 10.1074/jbc.m115.701227] [Citation(s) in RCA: 30] [Impact Index Per Article: 3.8] [Received: 10/29/2015] [Indexed: 11/06/2022]
Abstract
Recent work from others and us revealed interactions between the Sin3/HDAC complex, the H3K4me3 demethylase KDM5A, GATAD1, and EMSY. Here, we characterize the EMSY/KDM5A/SIN3B complex in detail by quantitative interaction proteomics and ChIP-sequencing. We identify a novel substoichiometric interactor of the complex, the transcription factor ZNF131, which recruits EMSY to a large number of active, H3K4me3-marked promoters. Interestingly, using an EMSY knock-out line and subsequent rescue experiments, we show that EMSY is in most cases positively correlated with the transcriptional activity of its target genes and stimulates cell proliferation. Finally, by immunohistochemical staining of primary breast tissue microarrays, we find that EMSY/KDM5A/SIN3B complex subunits are frequently overexpressed in primary breast cancer cases in a correlative manner. Taken together, these data open avenues for exploring the possibility that sporadic breast cancer patients with EMSY amplification might benefit from epigenetic combination therapy targeting both the KDM5A demethylase and histone deacetylases.
Affiliation(s)
- Radhika A Varier
- Department of Molecular Cancer Research, University Medical Center Utrecht, Universiteitsweg 100, 3584CG Utrecht, The Netherlands
- Enrique Carrillo de Santa Pau
- Department of Molecular Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein Zuid 30, 6525GA Nijmegen, The Netherlands
- Petra van der Groep
- Department of Pathology, University Medical Center Utrecht, Heidelberglaan 100, 3584CX Utrecht, The Netherlands
- Rik G H Lindeboom
- Department of Molecular Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein Zuid 30, 6525GA Nijmegen, The Netherlands
- Filomena Matarese
- Department of Molecular Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein Zuid 30, 6525GA Nijmegen, The Netherlands
- Anneloes Mensinga
- Department of Molecular Cancer Research, University Medical Center Utrecht, Universiteitsweg 100, 3584CG Utrecht, The Netherlands
- Arne H Smits
- Department of Molecular Cancer Research, University Medical Center Utrecht, Universiteitsweg 100, 3584CG Utrecht, The Netherlands; Department of Molecular Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein Zuid 30, 6525GA Nijmegen, The Netherlands
- Raghu Ram Edupuganti
- Department of Molecular Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein Zuid 30, 6525GA Nijmegen, The Netherlands
- Marijke P Baltissen
- Department of Molecular Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein Zuid 30, 6525GA Nijmegen, The Netherlands
- Pascal W T C Jansen
- Department of Molecular Cancer Research, University Medical Center Utrecht, Universiteitsweg 100, 3584CG Utrecht, The Netherlands; Department of Molecular Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein Zuid 30, 6525GA Nijmegen, The Netherlands
- Natalie Ter Hoeve
- Department of Pathology, University Medical Center Utrecht, Heidelberglaan 100, 3584CX Utrecht, The Netherlands
- Danny R van Weely
- Department of Molecular Cancer Research, University Medical Center Utrecht, Universiteitsweg 100, 3584CG Utrecht, The Netherlands
- Ina Poser
- Max Planck Institute of Molecular Biology and Genetics, Pfotenhauerstrasse 108, 01307 Dresden, Germany
- Paul J van Diest
- Department of Pathology, University Medical Center Utrecht, Heidelberglaan 100, 3584CX Utrecht, The Netherlands
- Hendrik G Stunnenberg
- Department of Molecular Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein Zuid 30, 6525GA Nijmegen, The Netherlands
- Michiel Vermeulen
- Department of Molecular Cancer Research, University Medical Center Utrecht, Universiteitsweg 100, 3584CG Utrecht, The Netherlands; Department of Molecular Biology, Radboud Institute for Molecular Life Sciences, Geert Grooteplein Zuid 30, 6525GA Nijmegen, The Netherlands
32
Liu TL, Newton L, Liu MJ, Shiu SH, Farré EM. A G-Box-Like Motif Is Necessary for Transcriptional Regulation by Circadian Pseudo-Response Regulators in Arabidopsis. Plant Physiol 2016; 170:528-39. [PMID: 26586835 PMCID: PMC4704597 DOI: 10.1104/pp.15.01562] [Citation(s) in RCA: 45] [Impact Index Per Article: 5.6] [Received: 10/05/2015] [Accepted: 11/17/2015] [Indexed: 05/18/2023]
Abstract
PSEUDO-RESPONSE REGULATORs (PRRs) play overlapping and distinct roles in maintaining circadian rhythms and regulating diverse biological processes, including the photoperiodic control of flowering, growth, and abiotic stress responses. PRRs act as transcriptional repressors and associate with chromatin via their conserved C-terminal CCT (CONSTANS, CONSTANS-like, and TIMING OF CAB EXPRESSION 1 [TOC1/PRR1]) domains by a still-poorly understood mechanism. Here, we identified genome-wide targets of PRR9 using chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) and compared them with PRR7, PRR5, and TOC1/PRR1 ChIP-seq data. We found that PRR binding sites are located within genomic regions of low nucleosome occupancy and high DNase I hypersensitivity. Moreover, conserved noncoding regions among Brassicaceae species are enriched around PRR binding sites, indicating that PRRs associate with functionally relevant cis-regulatory regions. The PRRs shared a significant number of binding regions, and our results indicate that they coordinately restrict the expression of target genes to around dawn. A G-box-like motif was overrepresented at PRR binding regions, and we showed that this motif is necessary for mediating transcriptional regulation of CIRCADIAN CLOCK ASSOCIATED 1 and PRR9 by the PRRs. Our results further our understanding of how PRRs target specific promoters and provide an extensive resource for studying circadian regulatory networks in plants.
Affiliation(s)
- Tiffany L Liu
- Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
- Linsey Newton
- Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
- Ming-Jung Liu
- Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
- Shin-Han Shiu
- Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
- Eva M Farré
- Department of Plant Biology, Michigan State University, East Lansing, Michigan 48824
33
Liu L, Zhao W, Zhou X. Modeling co-occupancy of transcription factors using chromatin features. Nucleic Acids Res 2015; 44:e49. [PMID: 26590261 PMCID: PMC4797273 DOI: 10.1093/nar/gkv1281] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Received: 08/22/2015] [Accepted: 11/04/2015] [Indexed: 12/11/2022]
Abstract
Regulation of gene expression requires both transcription factors (TFs) and epigenetic modifications, and interplay between the two types of factors has been discovered. However, the relationship between chromatin features and TF–TF co-occupancy remains little studied. Here, we revealed this relationship by first illustrating distinct profile patterns of chromatin features associated with different binding events, including single-TF binding and TF–TF co-occupancy, for 71 TFs from five human cell lines. We then used statistical analyses to demonstrate the relationship by accurately predicting co-occupancy genome-wide from chromatin features, including DNase I hypersensitivity, 11 histone modifications (HMs) and GC content. Remarkably, our results showed that the combination of chromatin features enables accurate predictions across all five cell lines. Among individual chromatin features, DNase I hypersensitivity enables high and consistent predictions. H3K27ac, H3K4me2, H3K4me3 and H3K9ac are more reliable predictors than other HMs. Although the combination of 11 HMs achieves accurate predictions, its predictive ability varies considerably when a model trained on one cell line is applied to others, indicating that the relationship between HMs and TF–TF co-occupancy is cell type dependent. GC content is not a reliable predictor on its own, but adding GC content to any other feature enhances its predictive ability. Together, our results elucidate a strong relationship between TF–TF co-occupancy and chromatin features.
Affiliation(s)
- Liang Liu
- Center for Bioinformatics and Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Weiling Zhao
- Center for Bioinformatics and Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
- Xiaobo Zhou
- Center for Bioinformatics and Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
34
Seifert A, Schofield P, Barton GJ, Hay RT. Proteotoxic stress reprograms the chromatin landscape of SUMO modification. Sci Signal 2015; 8:rs7. [PMID: 26152697 DOI: 10.1126/scisignal.aaa2213] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Indexed: 12/26/2022]
Abstract
The small ubiquitin-like modifier 2 (SUMO-2) is required for survival when cells are exposed to treatments that induce proteotoxic stress by causing the accumulation of misfolded proteins. Exposure of cells to heat shock or other forms of proteotoxic stress induces the conjugation of SUMO-2 to proteins in the nucleus. We investigated the chromatin landscape of SUMO-2 modifications in response to heat stress. Through chromatin immunoprecipitation assays coupled to high-throughput DNA sequencing and mRNA sequencing, we showed that in response to heat shock, SUMO-2 accumulated at nucleosome-depleted, active DNA regulatory elements, which represented binding sites for large protein complexes and were predominantly associated with active genes. However, SUMO did not act as a direct transcriptional repressor or activator of these genes during heat shock. Instead, integration of our results with published proteomics data on heat shock-induced SUMO-2 substrates supports a model in which the conjugation of SUMO-2 to proteins acts as an acute stress response that is required for the stability of protein complexes involved in gene expression and posttranscriptional modification of mRNA. We showed that the conjugation of SUMO-2 to chromatin-associated proteins is an integral component of the proteotoxic stress response, and propose that SUMO-2 fulfills its essential role in cell survival by contributing to the maintenance of protein complex homeostasis.
Affiliation(s)
- Anne Seifert
- Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dundee, Scotland DD1 5EH, UK
- Pietà Schofield
- Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dundee, Scotland DD1 5EH, UK
- Geoffrey J Barton
- Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dundee, Scotland DD1 5EH, UK
- Ronald T Hay
- Centre for Gene Regulation and Expression, College of Life Sciences, University of Dundee, Dundee, Scotland DD1 5EH, UK
35
Jain D, Baldi S, Zabel A, Straub T, Becker PB. Active promoters give rise to false positive 'Phantom Peaks' in ChIP-seq experiments. Nucleic Acids Res 2015; 43:6959-68. [PMID: 26117547 PMCID: PMC4538825 DOI: 10.1093/nar/gkv637] [Citation(s) in RCA: 101] [Impact Index Per Article: 11.2] [Received: 03/27/2015] [Accepted: 06/08/2015] [Indexed: 02/07/2023]
Abstract
Chromatin immunoprecipitation (ChIP) is widely used to identify chromosomal binding sites. Chromatin proteins are cross-linked to their target sequences in living cells. The purified chromatin is sheared and the relevant protein is enriched by immunoprecipitation with specific antibodies. The co-purifying genomic DNA is then identified by massively parallel sequencing (ChIP-seq). We applied ChIP-seq to map the chromosomal binding sites of two ISWI-containing nucleosome remodeling factors, ACF and RSF, in Drosophila embryos. Using several polyclonal and monoclonal antibodies directed against their signature subunits, ACF1 and RSF-1, we obtained robust profiles indicating that both remodelers co-occupied a large set of active promoters. Further validation included controls using chromatin from mutant embryos that do not express ACF1 or RSF-1. Surprisingly, the ChIP-seq profiles were unchanged, suggesting that they did not result from specific immunoprecipitation. A conservative analysis lists about 3000 chromosomal loci, mostly active promoters, that are prone to non-specific enrichment in ChIP and appear as ‘Phantom Peaks’. These peaks are not obtained with pre-immune serum and are not prominent in input chromatin. Mining the modENCODE ChIP-seq profiles identifies potential Phantom Peaks in many profiles of epigenetic regulators. These profiles, and other ChIP-seq data featuring prominent Phantom Peaks, must be validated with chromatin from cells in which the protein of interest has been depleted.
Affiliation(s)
- Dhawal Jain
- Biomedical Center and Center for Integrated Protein Science Munich, Ludwig-Maximilians-University, Munich, Germany
- Sandro Baldi
- Biomedical Center and Center for Integrated Protein Science Munich, Ludwig-Maximilians-University, Munich, Germany
- Angelika Zabel
- Biomedical Center and Center for Integrated Protein Science Munich, Ludwig-Maximilians-University, Munich, Germany
- Tobias Straub
- Biomedical Center and Center for Integrated Protein Science Munich, Ludwig-Maximilians-University, Munich, Germany
- Peter B Becker
- Biomedical Center and Center for Integrated Protein Science Munich, Ludwig-Maximilians-University, Munich, Germany
36
Griffon A, Barbier Q, Dalino J, van Helden J, Spicuglia S, Ballester B. Integrative analysis of public ChIP-seq experiments reveals a complex multi-cell regulatory landscape. Nucleic Acids Res 2014; 43:e27. [PMID: 25477382 PMCID: PMC4344487 DOI: 10.1093/nar/gku1280] [Citation(s) in RCA: 90] [Impact Index Per Article: 9.0] [Indexed: 01/21/2023]
Abstract
The large collections of ChIP-seq data rapidly accumulating in public data warehouses provide genome-wide binding site maps for hundreds of transcription factors (TFs). However, the extent of the regulatory occupancy space in the human genome has not yet been fully captured by integrating public ChIP-seq data sets and combining them with the ENCODE TF map. To enable genome-wide identification of regulatory elements, we collected, analysed and retained 395 available ChIP-seq data sets and merged them with ENCODE peaks, covering a total of 237 TFs. This enhanced repertoire complements and refines current genome-wide occupancy maps, increasing the regulatory search space of the human genome by 14% compared with ENCODE alone, and also increases the complexity of the regulatory dictionary. As a direct application, we used this unified binding repertoire to annotate variant enhancer loci (VELs) defined by the H3K4me1 mark in two cancer cell lines (MCF-7, CRC) and observed enrichment of specific TFs involved in key biological functions in cancer development and proliferation. These TF enrichments within VELs provide a direct annotation of non-coding regions detected in cancer genomes. Finally, the full catalogue is available online together with the TF enrichment analysis tool (http://tagc.univ-mrs.fr/remap/).
Affiliation(s)
- Aurélien Griffon
- INSERM, UMR1090 TAGC, Marseille, F-13288, France
- Aix-Marseille Université, UMR1090 TAGC, Marseille, F-13288, France
- Quentin Barbier
- INSERM, UMR1090 TAGC, Marseille, F-13288, France
- Aix-Marseille Université, UMR1090 TAGC, Marseille, F-13288, France
- Jordi Dalino
- INSERM, UMR1090 TAGC, Marseille, F-13288, France
- Aix-Marseille Université, UMR1090 TAGC, Marseille, F-13288, France
- Jacques van Helden
- INSERM, UMR1090 TAGC, Marseille, F-13288, France
- Aix-Marseille Université, UMR1090 TAGC, Marseille, F-13288, France
- Salvatore Spicuglia
- INSERM, UMR1090 TAGC, Marseille, F-13288, France
- Aix-Marseille Université, UMR1090 TAGC, Marseille, F-13288, France
- Benoit Ballester
- INSERM, UMR1090 TAGC, Marseille, F-13288, France
- Aix-Marseille Université, UMR1090 TAGC, Marseille, F-13288, France
37
Khushi M, Clarke CL, Graham JD. Bioinformatic analysis of cis-regulatory interactions between progesterone and estrogen receptors in breast cancer. PeerJ 2014; 2:e654. [PMID: 25426335 PMCID: PMC4243336 DOI: 10.7717/peerj.654] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 05/21/2014] [Accepted: 10/15/2014] [Indexed: 12/27/2022]
Abstract
Chromatin factors interact with each other in a cell- and sequence-specific manner to regulate transcription, and a wealth of publicly available datasets describes the genomic locations of these interactions. Our recently published BiSA (Binding Sites Analyser) database contains transcription factor binding locations and epigenetic modifications collected from published studies and provides tools to analyse stored and imported data. Using BiSA, we investigated the overlapping cis-regulatory roles of estrogen receptor alpha (ERα) and progesterone receptor (PR) in the T-47D breast cancer cell line. We found that ERα binding sites overlap with a subset of PR binding sites. To investigate further, we re-analysed the raw data to remove any biases introduced by the use of different tools in the original publications. We identified 22,152 PR and 18,560 ERα binding sites (<5% false discovery rate), with 4,358 regions overlapping between the two datasets. BiSA statistical analysis revealed a non-significant overall overlap correlation between the two factors, suggesting that ERα and PR are not partner factors and do not require each other for binding. However, Monte Carlo simulation by Binary Interval Search (BITS), together with the Relevant Distance, Absolute Distance, Jaccard and Projection tests in GenometriCorr, revealed a statistically significant spatial correlation between the binding regions of the two factors along the chromosomes. Motif analysis revealed that the shared binding regions were enriched for binding motifs of ERα, PR and a number of other transcription and pioneer factors, some of which are known to co-locate with ERα and PR binding. Therefore, the close spatial proximity of ERα and PR binding sites suggests that ERα and PR generally function independently at the molecular level, but that their activities converge on a specific subset of transcriptional targets.
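The Jaccard test named in this abstract compares two interval sets through a base-pair Jaccard statistic: the bases covered by both sets divided by the bases covered by either set. The Python sketch below computes only that statistic, assuming half-open (chromosome, start, end) intervals. The function names are hypothetical, it is not GenometriCorr's or BiSA's implementation, and judging significance would additionally require a null model such as a permutation scheme.

```python
from typing import List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open [start, end).
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Sort by (chromosome, start) and merge overlapping or touching intervals."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, prev_start, prev_end = merged[-1]
            merged[-1] = (chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def covered_bp(intervals: List[Interval]) -> int:
    """Number of distinct bases covered by the set."""
    return sum(end - start for _, start, end in merge(intervals))


def intersection_bp(a: List[Interval], b: List[Interval]) -> int:
    """Bases covered by both sets, using a two-pointer sweep over merged intervals."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Both lists are sorted by chromosome name, so skip the one that sorts first.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        total += max(0, min(end_a, end_b) - max(start_a, start_b))
        # Advance whichever interval ends first; it cannot overlap anything further on.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return total


def jaccard(a: List[Interval], b: List[Interval]) -> float:
    """Base-pair Jaccard statistic: shared bases / bases covered by either set."""
    inter = intersection_bp(a, b)
    union = covered_bp(a) + covered_bp(b) - inter
    return inter / union if union else 0.0


print(jaccard([("chr1", 0, 100)], [("chr1", 50, 150)]))  # 50 shared bp out of 150 covered
```

Merging first keeps self-overlapping peaks within one set from being counted twice, which is why both the coverage and the intersection work on merged intervals.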
Affiliation(s)
- Matloob Khushi
- Centre for Cancer Research, Westmead Millennium Institute, Sydney Medical School-Westmead, University of Sydney, Australia
- Christine L Clarke
- Centre for Cancer Research, Westmead Millennium Institute, Sydney Medical School-Westmead, University of Sydney, Australia
- J Dinny Graham
- Centre for Cancer Research, Westmead Millennium Institute, Sydney Medical School-Westmead, University of Sydney, Australia
38
Regulatory analysis of the C. elegans genome with spatiotemporal resolution. Nature 2014; 512:400-5. [PMID: 25164749 PMCID: PMC4530805 DOI: 10.1038/nature13497] [Citation(s) in RCA: 84] [Impact Index Per Article: 8.4] [Received: 06/14/2013] [Accepted: 05/22/2014] [Indexed: 12/17/2022]
Abstract
Discovering the structure and dynamics of transcriptional regulatory events in the genome with cellular and temporal resolution is crucial to understanding the regulatory underpinnings of development and disease. We determined the genomic distribution of binding sites for 92 transcription factors (TFs) and regulatory proteins across multiple stages of C. elegans development by performing 241 ChIP-seq experiments. Integrating regulatory binding and cellular-resolution expression data yielded a spatiotemporally resolved metazoan TF binding map. Using this map, we explore developmental regulatory circuits that encode combinatorial logic at the levels of co-binding and co-expression of TFs, characterizing (1) the genomic coverage and clustering of regulatory binding, (2) the binding preferences of, and biological processes regulated by, TFs, (3) the global TF co-associations and genomic subdomains that suggest shared patterns of regulation, and (4) key TFs and TF co-associations for fate specification of individual lineages and cell types.
39
Comparative analysis of regulatory information and circuits across distant species. Nature 2014; 512:453-6. [PMID: 25164757 PMCID: PMC4336544 DOI: 10.1038/nature13668] [Citation(s) in RCA: 132] [Impact Index Per Article: 13.2] [Received: 11/22/2013] [Accepted: 07/10/2014] [Indexed: 12/20/2022]
Abstract
Despite the large evolutionary distances between metazoan species, they can show remarkable commonalities in their biology, and this has helped to establish fly and worm as model organisms for human biology. Although studies of individual elements and factors have explored similarities in gene regulation, a large-scale comparative analysis of basic principles of transcriptional regulatory features is lacking. Here we map the genome-wide binding locations of 165 human, 93 worm and 52 fly transcription regulatory factors, generating a total of 1,019 data sets from diverse cell types, developmental stages, or conditions in the three species, of which 498 (48.9%) are presented here for the first time. We find that structural properties of regulatory networks are remarkably conserved and that orthologous regulatory factor families recognize similar binding motifs in vivo and show some similar co-associations. Our results suggest that gene-regulatory properties previously observed for individual factors are general principles of metazoan regulation that are remarkably well-preserved despite extensive functional divergence of individual network connections. The comparative maps of regulatory circuitry provided here will drive an improved understanding of the regulatory underpinnings of model organism biology and how these relate to human biology, development and disease.
40
Binding sites analyser (BiSA): software for genomic binding sites archiving and overlap analysis. PLoS One 2014; 9:e87301. [PMID: 24533055 PMCID: PMC3922719 DOI: 10.1371/journal.pone.0087301] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 09/08/2013] [Accepted: 12/24/2013] [Indexed: 12/19/2022]
Abstract
Genome-wide mapping of transcription factor binding and histone modifications reveals complex patterns of interaction. Identifying overlaps in the binding patterns of different factors is a major objective of genomic studies, but existing methods for archiving large numbers of datasets in a personalised database lack sophistication and utility. We therefore developed the transcription factor DNA binding site analyser software (BiSA) for archiving binding regions and easily identifying their overlap with, or proximity to, other regions of interest. Analysis results can be restricted by chromosome, by base-pair overlap between regions, or by maximum distance between binding peaks. BiSA can report overlapping regions that share common base pairs, regions that are nearby, regions that do not overlap, and average region sizes. BiSA can also identify genes located near binding regions of interest and genomic features near a gene or locus of interest, and it can report the statistical significance of overlapping regions. Overlap results can be visualized as Venn diagrams. A major strength of BiSA is that it is supported by a comprehensive database of publicly available transcription factor binding sites and histone modifications, which can be directly compared with user data. The documentation and source code are available at http://bisa.sourceforge.net
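The overlap categories described for BiSA (regions sharing base pairs, nearby regions, and non-overlapping regions) can be illustrated with a short Python sketch. It is not BiSA's code or interface: `classify_regions`, `min_overlap` and `max_distance` are hypothetical names chosen to mirror the base-pair-overlap and maximum-distance options described above, intervals are assumed to be half-open, and the all-pairs comparison is quadratic, whereas practical tools use sorted sweeps or interval indexes.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open [start, end).
Interval = Tuple[str, int, int]


def shared_bp(a: Interval, b: Interval) -> int:
    """Base pairs shared by two intervals on the same chromosome."""
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def gap_bp(a: Interval, b: Interval) -> int:
    """Distance in bp between two intervals on the same chromosome (0 if they overlap or touch)."""
    return max(0, max(a[1], b[1]) - min(a[2], b[2]))


def classify_regions(
    query: List[Interval],
    targets: List[Interval],
    min_overlap: int = 1,
    max_distance: int = 1000,
) -> Dict[str, List[Interval]]:
    """Label each query region as overlapping, nearby, or non-overlapping relative to targets."""
    result: Dict[str, List[Interval]] = {"overlapping": [], "nearby": [], "non_overlapping": []}
    for region in query:
        same_chrom = [t for t in targets if t[0] == region[0]]
        if any(shared_bp(region, t) >= min_overlap for t in same_chrom):
            result["overlapping"].append(region)
        elif any(gap_bp(region, t) <= max_distance for t in same_chrom):
            # Also catches regions whose overlap is below min_overlap (gap of 0).
            result["nearby"].append(region)
        else:
            result["non_overlapping"].append(region)
    return result
```

For example, with `max_distance=500`, a query region at chr1:1000-1100 and a target at chr1:1500-1600 are 400 bp apart, so the query falls in the nearby category.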
41
Schweikert G, Cseke B, Clouaire T, Bird A, Sanguinetti G. MMDiff: quantitative testing for shape changes in ChIP-Seq data sets. BMC Genomics 2013; 14:826. [PMID: 24267901 PMCID: PMC4008153 DOI: 10.1186/1471-2164-14-826] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.3] [Received: 04/22/2013] [Accepted: 11/15/2013] [Indexed: 12/21/2022]
Abstract
Background. Cell-specific gene expression is controlled by epigenetic modifications and transcription factor binding. While genome-wide maps of these protein-DNA interactions have become widely available, quantitative comparison of the resulting ChIP-Seq data sets remains challenging. Current approaches to detect differentially bound or modified regions are mainly borrowed from RNA-Seq data analysis and thus focus on total counts of fragments mapped to a region, ignoring any information encoded in the shape of the peaks.
Results. Here, we present MMDiff, a robust, broadly applicable method for detecting differences between sequence count data sets. Based on quantifying shape changes in signal profiles, it overcomes challenges imposed by the highly structured nature of the data and the paucity of replicates. We first use a simulated data set to compare the performance of MMDiff with results obtained by four alternative methods. We demonstrate that MMDiff excels when peak profiles change between samples. We next use MMDiff to re-analyse a recent data set of the histone modification H3K4me3 elucidating the establishment of this prominent epigenomic marker. Our empirical analysis shows that the method yields reproducible results across experiments and is able to detect functionally important changes in histone modifications. To further explore the broader applicability of MMDiff, we apply it to two ENCODE data sets: one investigating the histone modification H3K27ac and one measuring the genome-wide binding of the transcription factor CTCF. In both cases, MMDiff proves to be complementary to count-based methods. In addition, we show that MMDiff can directly detect changes in homotypic binding events at neighbouring binding sites. MMDiff is readily available as a Bioconductor package.
Conclusions. Our results demonstrate that higher-order features of ChIP-Seq peaks carry relevant and often complementary information to total counts, and hence are important in assessing differential histone modifications and transcription factor binding. We have developed a new computational method, MMDiff, that is capable of exploring these features and therefore closes an existing gap in the analysis of ChIP-Seq data sets.
Affiliation(s)
- Gabriele Schweikert
- School of Informatics, University of Edinburgh, 10 Crichton Street, Edinburgh EH8 9AB, UK
42
Xie D, Boyle AP, Wu L, Zhai J, Kawli T, Snyder M. Dynamic trans-acting factor colocalization in human cells. Cell 2013; 155:713-24. [PMID: 24243024 DOI: 10.1016/j.cell.2013.09.043] [Citation(s) in RCA: 107] [Impact Index Per Article: 9.7] [Received: 05/07/2013] [Revised: 07/13/2013] [Accepted: 08/27/2013] [Indexed: 01/02/2023]
Abstract
Different trans-acting factors (TFs) collaborate and act in concert at distinct loci to perform accurate regulation of their target genes. To date, the cobinding of TF pairs has been investigated in a limited context both in terms of the number of factors within a cell type and across cell types and the extent of combinatorial colocalizations. Here, we use an approach to analyze TF colocalization within a cell type and across multiple cell lines at an unprecedented level. We extend this approach with large-scale mass spectrometry analysis of immunoprecipitations of 50 TFs. Our combined approach reveals large numbers of interesting TF-TF associations. We observe extensive change in TF colocalizations both within a cell type exposed to different conditions and across multiple cell types. We show distinct functional annotations and properties of different TF cobinding patterns and provide insights into the complex regulatory landscape of the cell.
Affiliation(s)
- Dan Xie
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305, USA
43
De S, Pedersen BS, Kechris K. The dilemma of choosing the ideal permutation strategy while estimating statistical significance of genome-wide enrichment. Brief Bioinform 2013; 15:919-28. [PMID: 23956260 DOI: 10.1093/bib/bbt053] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.6] [Indexed: 11/14/2022]
Abstract
Integrative analyses of genomic, epigenomic and transcriptomic features for human and various model organisms have revealed that many such features are nonrandomly distributed in the genome. Significant enrichment (or depletion) of genomic features is anticipated to be biologically important. Detection of genomic regions having enrichment of certain features and estimation of corresponding statistical significance rely on the expected null distribution generated by a permutation model. We discuss different genome-wide permutation approaches, present examples where the permutation strategy affects the null model and show that the confidence in estimating statistical significance of genome-wide enrichment might depend on the choice of the permutation approach. In those cases, where biologically relevant constraints are unclear, it is preferable to examine whether key conclusions are consistent, irrespective of the choice of the randomization strategy.
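To make the dependence on the permutation strategy concrete, the Python sketch below computes an empirical p-value for overlap between a query and a reference interval set under one particular strategy: each query interval is relocated uniformly at random on its own chromosome while keeping its length. The function names and the chromosome-size input are hypothetical; other strategies (restricting placements to mappable or gene-proximal regions, or shuffling the reference set instead) define different null models and can give different p-values.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical representation: (chromosome, start, end), half-open [start, end).
Interval = Tuple[str, int, int]


def count_overlaps(query: List[Interval], reference: List[Interval]) -> int:
    """Number of query intervals sharing at least 1 bp with some reference interval."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in reference:
        by_chrom.setdefault(chrom, []).append((start, end))
    hits = 0
    for chrom, start, end in query:
        if any(min(end, r_end) > max(start, r_start) for r_start, r_end in by_chrom.get(chrom, [])):
            hits += 1
    return hits


def shuffle_within_chromosome(
    intervals: List[Interval], chrom_sizes: Dict[str, int], rng: random.Random
) -> List[Interval]:
    """One permutation strategy: relocate each interval uniformly on its own chromosome."""
    shuffled = []
    for chrom, start, end in intervals:
        length = end - start
        new_start = rng.randrange(0, chrom_sizes[chrom] - length + 1)
        shuffled.append((chrom, new_start, new_start + length))
    return shuffled


def permutation_test(
    query: List[Interval],
    reference: List[Interval],
    chrom_sizes: Dict[str, int],
    n_perm: int = 1000,
    seed: int = 0,
) -> Tuple[int, float]:
    """Return the observed overlap count and a one-sided empirical p-value for enrichment."""
    rng = random.Random(seed)
    observed = count_overlaps(query, reference)
    null_counts = [
        count_overlaps(shuffle_within_chromosome(query, chrom_sizes, rng), reference)
        for _ in range(n_perm)
    ]
    # Add-one correction so the empirical p-value is never exactly zero.
    at_least_as_extreme = sum(1 for c in null_counts if c >= observed)
    return observed, (at_least_as_extreme + 1) / (n_perm + 1)
```

Swapping `shuffle_within_chromosome` for a differently constrained placement function changes only the null distribution, which is the choice the abstract suggests checking conclusions against.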
44
Exploring the cooccurrence patterns of multiple sets of genomic intervals. Biomed Res Int 2013; 2013:617545. [PMID: 23781505 PMCID: PMC3679813 DOI: 10.1155/2013/617545] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 03/27/2013] [Accepted: 05/04/2013] [Indexed: 11/27/2022]
Abstract
Background. Exploring the spatial relationship of different genomic features has been of great interest since the early days of genomic research. The relationship sometimes provides useful information for understanding certain biological processes. Recent advances in high-throughput technologies such as ChIP-seq produce large amounts of data in the form of genomic intervals. Most existing methods for assessing spatial relationships among intervals are designed for pairwise comparison and cannot be easily scaled up.
Results. We present a statistical method and software tool to characterize the cooccurrence patterns of multiple sets of genomic intervals. The occurrences of genomic intervals are described by a simple finite mixture model, in which each component represents a distinct cooccurrence pattern. The model parameters are estimated via an EM algorithm and can be viewed as sufficient statistics of the cooccurrence patterns. Simulation and real-data results show that the model can accurately capture the patterns and provide biologically meaningful results. The method is implemented in a freely available R package, giClust.
Conclusions. The method and the software provide a convenient way for biologists to explore the cooccurrence patterns among a relatively large number of sets of genomic intervals.
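The abstract does not spell out giClust's exact model, so the following Python sketch is only an assumption-labelled illustration of the general idea: each genomic locus is encoded as a binary vector recording which of the interval sets cover it, each mixture component is a product of independent Bernoulli distributions (one cooccurrence pattern), and the component weights and per-set probabilities are fitted by EM. The function name, the binary encoding and the random initialization are hypothetical choices, not the package's API.

```python
import math
import random
from typing import List, Sequence, Tuple


def bernoulli_mixture_em(
    X: Sequence[Sequence[int]], k: int, n_iter: int = 100, seed: int = 0, eps: float = 1e-6
) -> Tuple[List[float], List[List[float]], List[float]]:
    """Fit a k-component mixture of independent Bernoullis to binary rows X by EM.

    Returns (weights, probs, log_likelihoods); probs[c][j] is the probability that
    interval set j covers a locus drawn from component c, and log_likelihoods holds
    the data log-likelihood computed at the start of each iteration.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    weights = [1.0 / k] * k
    # Random, distinct starting probabilities break the symmetry between components.
    probs = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(k)]
    history: List[float] = []
    for _ in range(n_iter):
        # E-step: posterior responsibilities, computed in log space for stability.
        resp: List[List[float]] = []
        loglik = 0.0
        for x in X:
            logs = []
            for c in range(k):
                lp = math.log(weights[c])
                for j in range(d):
                    lp += math.log(probs[c][j]) if x[j] else math.log(1.0 - probs[c][j])
                logs.append(lp)
            top = max(logs)
            total = sum(math.exp(v - top) for v in logs)
            loglik += top + math.log(total)
            resp.append([math.exp(v - top) / total for v in logs])
        history.append(loglik)
        # M-step: re-estimate weights and per-set probabilities from responsibilities.
        for c in range(k):
            nk = max(sum(r[c] for r in resp), 1e-12)
            weights[c] = nk / n
            for j in range(d):
                pj = sum(r[c] * x[j] for r, x in zip(resp, X)) / nk
                # Clip away from 0 and 1 so the logarithms above stay finite.
                probs[c][j] = min(max(pj, eps), 1.0 - eps)
        norm = sum(weights)
        weights = [w / norm for w in weights]
    return weights, probs, history
```

Because EM only reaches a local optimum, one would typically rerun it from several seeds and choose the number of components with a criterion such as BIC; both steps are omitted here for brevity.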
45
Dowell KG, Simons AK, Wang ZZ, Yun K, Hibbs MA. Cell-type-specific predictive network yields novel insights into mouse embryonic stem cell self-renewal and cell fate. PLoS One 2013; 8:e56810. [PMID: 23468881 PMCID: PMC3585227 DOI: 10.1371/journal.pone.0056810] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 09/21/2012] [Accepted: 01/14/2013] [Indexed: 01/25/2023]
Abstract
Self-renewal, the ability of a stem cell to divide repeatedly while maintaining an undifferentiated state, is a defining characteristic of all stem cells. Here, we clarify the molecular foundations of mouse embryonic stem cell (mESC) self-renewal by applying a proven Bayesian network machine learning approach to integrate high-throughput data for protein function discovery. By focusing on a single stem-cell system, at a specific developmental stage, within the context of well-defined biological processes known to be active in that cell type, we produce a consensus predictive network that reflects biological reality more closely than those made by prior efforts using more generalized, context-independent methods. In addition, we show how machine learning efforts may be misled if the tissue-specific role of mammalian proteins is not defined in the training set and circumscribed in the evidential data. For this study, we assembled an extensive compendium of mESC data: ∼2.2 million data points, collected from 60 different studies, under 992 conditions. We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination. Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant. Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies. This network can be used by stem cell researchers (at http://StemSight.org) to explore hypotheses about gene function in the context of self-renewal and to prioritize genes of interest for experimental validation.
Affiliation(s)
- Karen G. Dowell
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Allen K. Simons
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Zack Z. Wang
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Johns Hopkins University, Baltimore, Maryland, United States of America
- Kyuson Yun
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Matthew A. Hibbs
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Trinity University, Department of Computer Science, San Antonio, Texas, United States of America
46
Watanabe H, Francis JM, Woo MS, Etemad B, Lin W, Fries DF, Peng S, Snyder EL, Tata PR, Izzo F, Schinzel AC, Cho J, Hammerman PS, Verhaak RG, Hahn WC, Rajagopal J, Jacks T, Meyerson M. Integrated cistromic and expression analysis of amplified NKX2-1 in lung adenocarcinoma identifies LMO3 as a functional transcriptional target. Genes Dev 2013; 27:197-210. [PMID: 23322301 DOI: 10.1101/gad.203208.112] [Citation(s) in RCA: 51] [Impact Index Per Article: 4.6] [Indexed: 12/27/2022]
Abstract
The NKX2-1 transcription factor, a regulator of normal lung development, is the most significantly amplified gene in human lung adenocarcinoma. To study the transcriptional impact of NKX2-1 amplification, we generated an expression signature associated with NKX2-1 amplification in human lung adenocarcinoma and analyzed DNA-binding sites of NKX2-1 by genome-wide chromatin immunoprecipitation. Integration of these expression and cistromic analyses identified LMO3, itself encoding a transcription regulator, as a candidate direct transcriptional target of NKX2-1. Further cistromic and overexpression analyses indicated that NKX2-1 can cooperate with the forkhead box transcription factor FOXA1 to regulate LMO3 gene expression. RNAi analysis of NKX2-1-amplified cells compared with nonamplified cells demonstrated that LMO3 mediates cell survival downstream from NKX2-1. Our findings provide new insight into the transcriptional regulatory network of NKX2-1 and suggest that LMO3 is a transcriptional signal transducer in NKX2-1-amplified lung adenocarcinomas.
Affiliation(s)
- Hideo Watanabe
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA