1
|
Ferré Q, Capponi C, Puthier D. OLOGRAM-MODL: mining enriched n-wise combinations of genomic features with Monte Carlo and dictionary learning. NAR Genom Bioinform 2022; 3:lqab114. [PMID: 34988437 PMCID: PMC8693575 DOI: 10.1093/nargab/lqab114] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/10/2021] [Revised: 11/08/2021] [Accepted: 11/23/2021] [Indexed: 02/06/2023] Open
Abstract
Most epigenetic marks, such as Transcriptional Regulators or histone marks, are biological objects known to work together in n-wise complexes. A suitable way to infer such functional associations between them is to study the overlaps of the corresponding genomic regions. However, the problem of the statistical significance of n-wise overlaps of genomic features is seldom tackled, which prevent rigorous studies of n-wise interactions. We introduce OLOGRAM-MODL, which considers overlaps between n ≥ 2 sets of genomic regions, and computes their statistical mutual enrichment by Monte Carlo fitting of a Negative Binomial distribution, resulting in more resolutive P-values. An optional machine learning method is proposed to find complexes of interest, using a new itemset mining algorithm based on dictionary learning which is resistant to noise inherent to biological assays. The overall approach is implemented through an easy-to-use CLI interface for workflow integration, and a visual tree-based representation of the results suited for explicability. The viability of the method is experimentally studied using both artificial and biological data. This approach is accessible through the command line interface of the pygtftk toolkit, available on Bioconda and from https://github.com/dputhier/pygtftk
Collapse
Affiliation(s)
- Quentin Ferré
- Aix Marseille Univ, INSERM, UMR U1090, TAGC, Marseille, France
| | - Cécile Capponi
- Aix Marseille Univ, CNRS, UMR 7020, LIS, Qarma, Marseille, France
| | - Denis Puthier
- Aix Marseille Univ, INSERM, UMR U1090, TAGC, Marseille, France
| |
Collapse
|
2
|
Yi X, Zheng Z, Xu H, Zhou Y, Huang D, Wang J, Feng X, Zhao K, Fan X, Zhang S, Dong X, Wang Z, Shen Y, Cheng H, Shi L, Li MJ. Interrogating cell type-specific cooperation of transcriptional regulators in 3D chromatin. iScience 2021; 24:103468. [PMID: 34888502 PMCID: PMC8634045 DOI: 10.1016/j.isci.2021.103468] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/02/2021] [Revised: 09/23/2021] [Accepted: 11/12/2021] [Indexed: 12/14/2022] Open
Abstract
Context-specific activities of transcription regulators (TRs) in the nucleus modulate spatiotemporal gene expression precisely. Using the largest ChIP-seq data and chromatin loops in the human K562 cell line, we initially interrogated TR cooperation in 3D chromatin via a graphical model and revealed many known and novel TRs manipulating context-specific pathways. To explore TR cooperation across broad tissue/cell types, we systematically leveraged large-scale open chromatin profiles, computational footprinting, and high-resolution chromatin interactions to investigate tissue/cell type-specific TR cooperation. We first delineated a landscape of TR cooperation across 40 human tissue/cell types. Network modularity analyses uncovered the commonality and specificity of TR cooperation in different conditions. We also demonstrated that TR cooperation information can better interpret the disease-causal variants identified by genome-wide association studies and recapitulate cell states during neural development. Our study characterizes shared and unique patterns of TR cooperation associated with the cell type specificity of gene regulation in 3D chromatin. Computational inference of transcriptional regulator (TR) cooperation in 3D chromatin A landscape of 3D TR cooperation across 40 human tissue/cell types TR cooperation can better interpret the disease-causal variants identified by GWAS Cooperation of certain TRs shapes context-specific gene regulation in cell development
Collapse
Affiliation(s)
- Xianfu Yi
- School of Biomedical Engineering and Technology, Tianjin Medical University, Tianjin 300070, China.,Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
| | - Zhanye Zheng
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Hang Xu
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China
| | - Yao Zhou
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Dandan Huang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Jianhua Wang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Xiangling Feng
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Ke Zhao
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Xutong Fan
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Shijie Zhang
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Xiaobao Dong
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Genetics, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Zhao Wang
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Yujun Shen
- Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Hui Cheng
- State Key Laboratory of Experimental Hematology, Chinese Academy of Medical Sciences, Tianjin 300070, China
| | - Lei Shi
- Department of Biochemistry and Molecular Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China
| | - Mulin Jun Li
- Department of Bioinformatics, The Province and Ministry Co-sponsored Collaborative Innovation Center for Medical Epigenetics, Tianjin Medical University, Tianjin 300070, China.,Department of Pharmacology, Tianjin Key Laboratory of Inflammation Biology, School of Basic Medical Sciences, Tianjin Medical University, Tianjin 300070, China.,Department of Epidemiology and Biostatistics, Tianjin Key Laboratory of Molecular Cancer Epidemiology, National Clinical Research Center for Cancer, Tianjin Medical University Cancer Institute and Hospital, Tianjin Medical University, Tianjin 300070, China
| |
Collapse
|
3
|
Biswas A, Narlikar L. A universal framework for detecting cis-regulatory diversity in DNA regulatory regions. Genome Res 2021; 31:1646-1662. [PMID: 34285090 PMCID: PMC8415372 DOI: 10.1101/gr.274563.120] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2020] [Accepted: 07/09/2021] [Indexed: 12/02/2022]
Abstract
High-throughput sequencing-based assays measure different biochemical activities pertaining to gene regulation, genome-wide. These activities include transcription factor (TF)–DNA binding, enhancer activity, open chromatin, and more. A major goal is to understand underlying sequence components, or motifs, that can explain the measured activity. It is usually not one motif but a combination of motifs bound by cooperatively acting proteins that confers activity to such regions. Furthermore, regions can be diverse, governed by different combinations of TFs/motifs. Current approaches do not take into account this issue of combinatorial diversity. We present a new statistical framework, cisDIVERSITY, which models regions as diverse modules characterized by combinations of motifs while simultaneously learning the motifs themselves. Because cisDIVERSITY does not rely on knowledge of motifs, modules, cell type, or organism, it is general enough to be applied to regions reported by most high-throughput assays. For example, in enhancer predictions resulting from different assays—GRO-cap, STARR-seq, and those measuring chromatin structure—cisDIVERSITY discovers distinct modules and combinations of TF binding sites, some specific to the assay. From protein–DNA binding data, cisDIVERSITY identifies potential cofactors of the profiled TF, whereas from ATAC-seq data, it identifies tissue-specific regulatory modules. Finally, analysis of single-cell ATAC-seq data suggests that regions open in one cell-state encode information about future states, with certain modules staying open and others closing down in the next time point.
Collapse
Affiliation(s)
- Anushua Biswas
- CSIR-National Chemical Laboratory, Academy of Scientific and Innovative Research
| | - Leelavati Narlikar
- CSIR-National Chemical Laboratory, Academy of Scientific and Innovative Research
| |
Collapse
|
4
|
Yamada N, Rossi MJ, Farrell N, Pugh BF, Mahony S. Alignment and quantification of ChIP-exo crosslinking patterns reveal the spatial organization of protein-DNA complexes. Nucleic Acids Res 2020; 48:11215-11226. [PMID: 32747934 PMCID: PMC7672471 DOI: 10.1093/nar/gkaa618] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/09/2019] [Revised: 06/25/2020] [Accepted: 07/13/2020] [Indexed: 12/12/2022] Open
Abstract
The ChIP-exo assay precisely delineates protein-DNA crosslinking patterns by combining chromatin immunoprecipitation with 5' to 3' exonuclease digestion. Within a regulatory complex, the physical distance of a regulatory protein to DNA affects crosslinking efficiencies. Therefore, the spatial organization of a protein-DNA complex could potentially be inferred by analyzing how crosslinking signatures vary between its subunits. Here, we present a computational framework that aligns ChIP-exo crosslinking patterns from multiple proteins across a set of coordinately bound regulatory regions, and which detects and quantifies protein-DNA crosslinking events within the aligned profiles. By producing consistent measurements of protein-DNA crosslinking strengths across multiple proteins, our approach enables characterization of relative spatial organization within a regulatory complex. Applying our approach to collections of ChIP-exo data, we demonstrate that it can recover aspects of regulatory complex spatial organization at yeast ribosomal protein genes and yeast tRNA genes. We also demonstrate the ability to quantify changes in protein-DNA complex organization across conditions by applying our approach to analyze Drosophila Pol II transcriptional components. Our results suggest that principled analyses of ChIP-exo crosslinking patterns enable inference of spatial organization within protein-DNA complexes.
Collapse
Affiliation(s)
- Naomi Yamada
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Matthew J Rossi
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Nina Farrell
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - B Franklin Pugh
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| | - Shaun Mahony
- Center for Eukaryotic Gene Regulation, Department of Biochemistry & Molecular Biology, The Pennsylvania State University, University Park, PA 16802, USA
| |
Collapse
|
5
|
Ceol A, Montanari P, Bartolini I, Ceri S, Ciaccia P, Patella M, Masseroli M. Search and comparison of (epi)genomic feature patterns in multiple genome browser tracks. BMC Bioinformatics 2020; 21:464. [PMID: 33076821 PMCID: PMC7574191 DOI: 10.1186/s12859-020-03781-2] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/08/2020] [Accepted: 09/24/2020] [Indexed: 01/22/2023] Open
Abstract
Background Genome browsers are widely used for locating interesting genomic regions, but their interactive use is obviously limited to inspecting short genomic portions. An ideal interaction is to provide patterns of regions on the browser, and then extract other genomic regions over the whole genome where such patterns occur, ranked by similarity. Results We developed SimSearch, an optimized pattern-search method and an open source plugin for the Integrated Genome Browser (IGB), to find genomic region sets that are similar to a given region pattern. It provides efficient visual genome-wide analytics computation in large datasets; the plugin supports intuitive user interactions for selecting an interesting pattern on IGB tracks and visualizing the computed occurrences of similar patterns along the entire genome. SimSearch also includes functions for the annotation and enrichment of results, and is enhanced with a Quickload repository including numerous epigenomic feature datasets from ENCODE and Roadmap Epigenomics. The paper also includes some use cases to show multiple genome-wide analyses of biological interest, which can be easily performed by taking advantage of the presented approach. Conclusions The novel SimSearch method provides innovative support for effective genome-wide pattern search and visualization; its relevance and practical usefulness is demonstrated through a number of significant use cases of biological interest. The SimSearch IGB plugin, documentation, and code are freely available at https://deib-geco.github.io/simsearch-app/ and https://github.com/DEIB-GECO/simsearch-app/.
Collapse
Affiliation(s)
- Arnaud Ceol
- Center for Genomic Science of IIT@SEMM, Fondazione Istituto Italiano di Tecnologia (IIT), 20139, Milan, Italy.,IEO, European Institute of Oncology IRCCS, 20141, Milan, Italy
| | | | | | - Stefano Ceri
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133, Milan, Italy
| | | | | | - Marco Masseroli
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133, Milan, Italy.
| |
Collapse
|
6
|
Yang X, Vingron M. Classifying human promoters by occupancy patterns identifies recurring sequence elements, combinatorial binding, and spatial interactions. BMC Biol 2018; 16:138. [PMID: 30442124 PMCID: PMC6238301 DOI: 10.1186/s12915-018-0585-5] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2018] [Accepted: 10/04/2018] [Indexed: 12/14/2022] Open
Abstract
Background Characterizing recurring sequence patterns in human promoters has been a challenging undertaking even nowadays where a near-complete overview of promoters exists. However, with the more recent availability of genomic location (ChIP-seq) data, one can approach that question through the identification of characteristic patterns of transcription factor occupancy and histone modifications. Results Based on the ENCODE annotation and integration of sequence motifs as well as three-dimensional chromatin data, we have undertaken a re-analysis of occupancy and sequence patterns in human promoters. We identify clear groups of CAAT-box and E-box sequence motif containing promoters, as well as a group of promoters whose interaction with an enhancer appears to be mediated by CCCTC-binding factor (CTCF) binding on the promoter. We also extend our analysis to inactive promoters, showing that only a surprisingly small number of inactive promoters is repressed by the polycomb complex. We also identify combinatorial patterns of transcription factor interactions indicated by the ChIP-seq signals. Conclusion Our analysis defines subgroups of promoters characterized by stereotypic patterns of transcription factor occupancy, and combinations of specific sequence patterns which are required for their binding. This grouping provides new hypotheses concerning the assembly and dynamics of transcription factor complexes at their respective promoter groups, as well as questions on the evolutionary origin of these groups. Electronic supplementary material The online version of this article (10.1186/s12915-018-0585-5) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Xinyi Yang
- Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany, Ihnestraße 63-73, Berlin, 14195, Germany
| | - Martin Vingron
- Max Planck Institute for Molecular Genetics, 14195 Berlin, Germany, Ihnestraße 63-73, Berlin, 14195, Germany.
| |
Collapse
|
7
|
Ng FSL, Ruau D, Wernisch L, Göttgens B. A graphical model approach visualizes regulatory relationships between genome-wide transcription factor binding profiles. Brief Bioinform 2018; 19:162-173. [PMID: 27780826 PMCID: PMC5496675 DOI: 10.1093/bib/bbw102] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/08/2015] [Indexed: 11/16/2022] Open
Abstract
Integrated analysis of multiple genome-wide transcription factor (TF)-binding profiles will be vital to advance our understanding of the global impact of TF binding. However, existing methods for measuring similarity in large numbers of chromatin immunoprecipitation assays with sequencing (ChIP-seq), such as correlation, mutual information or enrichment analysis, are limited in their ability to display functionally relevant TF relationships. In this study, we propose the use of graphical models to determine conditional independence between TFs and showed that network visualization provides a promising alternative to distinguish ‘direct’ versus ‘indirect’ TF interactions. We applied four algorithms to measure ‘direct’ dependence to a compendium of 367 mouse haematopoietic TF ChIP-seq samples and obtained a consensus network known as a ‘TF association network’ where edges in the network corresponded to likely causal pairwise relationships between TFs. The ‘TF association network’ illustrates the role of TFs in developmental pathways, is reminiscent of combinatorial TF regulation, corresponds to known protein–protein interactions and indicates substantial TF-binding reorganization in leukemic cell types. With the rapid increase in TF ChIP-Seq data sets, the approach presented here will be a powerful tool to study transcriptional programmes across a wide range of biological systems.
Collapse
Affiliation(s)
- Felicia S L Ng
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
| | - David Ruau
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
| | - Lorenz Wernisch
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
| | - Berthold Göttgens
- Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge, UK
- Corresponding author: Berthold Gottgens, Department of Haematology, Wellcome Trust and MRC Cambridge Stem Cell Institute & Cambridge Institute for Medical Research, Hills Road, Cambridge CB2 0XY, UK. Tel: 01223-336829; Fax: 01223-762670; E-mail:
| |
Collapse
|
8
|
Perna S, Pinoli P, Ceri S, Wong L. TICA: Transcriptional Interaction and Coregulation Analyzer. GENOMICS, PROTEOMICS & BIOINFORMATICS 2018; 16:342-353. [PMID: 30578913 PMCID: PMC6364043 DOI: 10.1016/j.gpb.2018.05.004] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 09/29/2017] [Revised: 05/11/2018] [Accepted: 05/18/2018] [Indexed: 11/16/2022]
Abstract
Transcriptional regulation is critical to cellular processes of all organisms. Regulatory mechanisms often involve more than one transcription factor (TF) from different families, binding together and attaching to the DNA as a single complex. However, only a fraction of the regulatory partners of each TF is currently known. In this paper, we present the Transcriptional Interaction and Coregulation Analyzer (TICA), a novel methodology for predicting heterotypic physical interaction of TFs. TICA employs a data-driven approach to infer interaction phenomena from chromatin immunoprecipitation and sequencing (ChIP-seq) data. Its prediction rules are based on the distribution of minimal distance couples of paired binding sites belonging to different TFs which are located closest to each other in promoter regions. Notably, TICA uses only binding site information from input ChIP-seq experiments, bypassing the need to do motif calling on sequencing data. We present our method and test it on ENCODE ChIP-seq datasets, using three cell lines as reference including HepG2, GM12878, and K562. TICA positive predictions on ENCODE ChIP-seq data are strongly enriched when compared to protein complex (CORUM) and functional interaction (BioGRID) databases. We also compare TICA against both motif/ChIP-seq based methods for physical TF-TF interaction prediction and published literature. Based on our results, TICA offers significant specificity (average 0.902) while maintaining a good recall (average 0.284) with respect to CORUM, providing a novel technique for fast analysis of regulatory effect in cell lines. Furthermore, predictions by TICA are complementary to other methods for TF-TF interaction prediction (in particular, TACO and CENTDIST). Thus, combined application of these prediction tools results in much improved sensitivity in detecting TF-TF interactions compared to TICA alone (sensitivity of 0.526 when combining TICA with TACO and 0.585 when combining with CENTDIST) with little compromise in specificity (specificity 0.760 when combining with TACO and 0.643 with CENTDIST). TICA is publicly available at http://geco.deib.polimi.it/tica/.
Collapse
Affiliation(s)
- Stefano Perna
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy.
| | - Pietro Pinoli
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy
| | - Stefano Ceri
- Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano, 20133 Milan, Italy
| | - Limsoon Wong
- School of Computing, National University of Singapore, Singapore 117417, Singapore
| |
Collapse
|
9
|
Li YE, Xiao M, Shi B, Yang YCT, Wang D, Wang F, Marcia M, Lu ZJ. Identification of high-confidence RNA regulatory elements by combinatorial classification of RNA-protein binding sites. Genome Biol 2017; 18:169. [PMID: 28886744 PMCID: PMC5591525 DOI: 10.1186/s13059-017-1298-8] [Citation(s) in RCA: 32] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/16/2017] [Accepted: 08/14/2017] [Indexed: 12/20/2022] Open
Abstract
Crosslinking immunoprecipitation sequencing (CLIP-seq) technologies have enabled researchers to characterize transcriptome-wide binding sites of RNA-binding protein (RBP) with high resolution. We apply a soft-clustering method, RBPgroup, to various CLIP-seq datasets to group together RBPs that specifically bind the same RNA sites. Such combinatorial clustering of RBPs helps interpret CLIP-seq data and suggests functional RNA regulatory elements. Furthermore, we validate two RBP–RBP interactions in cell lines. Our approach links proteins and RNA motifs known to possess similar biochemical and cellular properties and can, when used in conjunction with additional experimental data, identify high-confidence RBP groups and their associated RNA regulatory elements.
Collapse
Affiliation(s)
- Yang Eric Li
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
| | - Mu Xiao
- Life Sciences Institute, Innovation Center for Cell Signaling Network, Zhejiang University, Hangzhou, Zhejiang, 310058, China
| | - Binbin Shi
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
| | - Yu-Cheng T Yang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
| | - Dong Wang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China
| | - Fei Wang
- Life Sciences Institute, Innovation Center for Cell Signaling Network, Zhejiang University, Hangzhou, Zhejiang, 310058, China
| | - Marco Marcia
- European Molecular Biology Laboratory, Grenoble Outstation, 71 Avenue des Martyrs, Grenoble, 38042, France
| | - Zhi John Lu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing, 100084, China.
| |
Collapse
|
10
|
Lu R, Mucaki EJ, Rogan PK. Discovery and validation of information theory-based transcription factor and cofactor binding site motifs. Nucleic Acids Res 2017; 45:e27. [PMID: 27899659 PMCID: PMC5389469 DOI: 10.1093/nar/gkw1036] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/07/2016] [Accepted: 10/19/2016] [Indexed: 02/06/2023] Open
Abstract
Data from ChIP-seq experiments can derive the genome-wide binding specificities of transcription factors (TFs) and other regulatory proteins. We analyzed 765 ENCODE ChIP-seq peak datasets of 207 human TFs with a novel motif discovery pipeline based on recursive, thresholded entropy minimization. This approach, while obviating the need to compensate for skewed nucleotide composition, distinguishes true binding motifs from noise, quantifies the strengths of individual binding sites based on computed affinity and detects adjacent cofactor binding sites that coordinate with the targets of primary, immunoprecipitated TFs. We obtained contiguous and bipartite information theory-based position weight matrices (iPWMs) for 93 sequence-specific TFs, discovered 23 cofactor motifs for 127 TFs and revealed six high-confidence novel motifs. The reliability and accuracy of these iPWMs were determined via four independent validation methods, including the detection of experimentally proven binding sites, explanation of effects of characterized SNPs, comparison with previously published motifs and statistical analyses. We also predict previously unreported TF coregulatory interactions (e.g. TF complexes). These iPWMs constitute a powerful tool for predicting the effects of sequence variants in known binding sites, performing mutation analysis on regulatory SNPs and predicting previously unrecognized binding sites and target genes.
Collapse
Affiliation(s)
- Ruipeng Lu
- Department of Computer Science, Western University, London, Ontario, N6A 5B7, Canada
| | - Eliseos J Mucaki
- Department of Biochemistry, Western University, London, Ontario, N6A 5C1, Canada
| | - Peter K Rogan
- Department of Computer Science, Western University, London, Ontario, N6A 5B7, Canada.,Department of Biochemistry, Western University, London, Ontario, N6A 5C1, Canada.,Department of Oncology, Western University, London, Ontario, N6A 4L6, Canada.,Cytognomix Inc., London, Ontario, N5X 3X5, Canada
| |
Collapse
|
11
|
Suske G. NF-Y and SP transcription factors — New insights in a long-standing liaison. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2017; 1860:590-597. [DOI: 10.1016/j.bbagrm.2016.08.011] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Key Words] [Track Full Text] [Subscribe] [Scholar Register] [Received: 06/21/2016] [Revised: 08/18/2016] [Accepted: 08/24/2016] [Indexed: 12/31/2022]
|
12
|
Doane AS, Elemento O. Regulatory elements in molecular networks. WILEY INTERDISCIPLINARY REVIEWS-SYSTEMS BIOLOGY AND MEDICINE 2017; 9. [PMID: 28093886 DOI: 10.1002/wsbm.1374] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Received: 07/30/2016] [Revised: 11/06/2016] [Accepted: 11/17/2016] [Indexed: 12/20/2022]
Abstract
Regulatory elements determine the connectivity of molecular networks and mediate a variety of regulatory processes ranging from DNA looping to transcriptional, posttranscriptional, and posttranslational regulation. This review highlights our current understanding of the different types of regulatory elements found in molecular networks with a focus on DNA regulatory elements. We highlight technical advances and current challenges for the mapping of regulatory elements at the genome-wide scale, and describe new computational methods to uncover these elements via reconstructing regulatory networks from large genomic datasets. WIREs Syst Biol Med 2017, 9:e1374. doi: 10.1002/wsbm.1374 For further resources related to this article, please visit the WIREs website.
Collapse
Affiliation(s)
- Ashley S Doane
- HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
| | - Olivier Elemento
- HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Cornell Medicine, New York, NY, USA
| |
Collapse
|
13
|
Guo Y, Gifford DK. Modular combinatorial binding among human trans-acting factors reveals direct and indirect factor binding. BMC Genomics 2017; 18:45. [PMID: 28061806 PMCID: PMC5219757 DOI: 10.1186/s12864-016-3434-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/28/2016] [Accepted: 12/19/2016] [Indexed: 11/25/2022] Open
Abstract
Background The combinatorial binding of trans-acting factors (TFs) to the DNA is critical to the spatial and temporal specificity of gene regulation. For certain regulatory regions, more than one regulatory module (set of TFs that bind together) are combined to achieve context-specific gene regulation. However, previous approaches are limited to either pairwise TF co-association analysis or assuming that only one module is used in each regulatory region. Results We present a new computational approach that models the modular organization of TF combinatorial binding. Our method learns compact and coherent regulatory modules from in vivo binding data using a topic model. We found that the binding of 115 TFs in K562 cells can be organized into 49 interpretable modules. Furthermore, we found that tens of thousands of regulatory regions use multiple modules, a structure that cannot be observed with previous hard clustering based methods. The modules discovered recapitulate many published protein-protein physical interactions, have consistent functional annotations of chromatin states, and uncover context specific co-binding such as gene proximal binding of NFY + FOS + SP and distal binding of NFY + FOS + USF. For certain TFs, the co-binding partners of direct binding (motif present) differs from those of indirect binding (motif absent); the distinct set of co-binding partners can predict whether the TF binds directly or indirectly with up to 95% accuracy. Joint analysis across two cell types reveals both cell-type-specific and shared regulatory modules. Conclusions Our results provide comprehensive cell-type-specific combinatorial binding maps and suggest a modular organization of combinatorial binding. Electronic supplementary material The online version of this article (doi:10.1186/s12864-016-3434-3) contains supplementary material, which is available to authorized users.
Collapse
Affiliation(s)
- Yuchun Guo
- MIT, Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, 02139, USA
| | - David K Gifford
- MIT, Computer Science and Artificial Intelligence Laboratory, Cambridge, MA, 02139, USA.
| |
Collapse
|
14
|
Wong KC, Peng C, Yan S, Liang C. Probabilistic Inference on Multiple Normalized Genome-Wide Signal Profiles With Model Regularization. IEEE Trans Nanobioscience 2016; 16:43-50. [PMID: 27893398 DOI: 10.1109/tnb.2016.2631406] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/06/2022]
Abstract
Understanding genome-wide protein-DNA interaction signals forms the basis for further focused studies in gene regulation. In particular, the chromatin immunoprecipitation with massively parallel DNA sequencing technology (ChIP-Seq) can enable us to measure the in vivo genome-wide occupancy of the DNA-binding protein of interest in a single run. Multiple ChIP-Seq runs thus inherent the potential for us to decipher the combinatorial occupancies of multiple DNA-binding proteins. To handle the genome-wide signal profiles from those multiple runs, we propose to integrate regularized regression functions (i.e., LASSO, Elastic Net, and Ridge Regression) into the well-established SignalRanker and FullSignalRanker frameworks, resulting in six additional probabilistic models for inference on multiple normalized genome-wide signal profiles. The corresponding model training algorithms are devised with computational complexity analysis. Comprehensive benchmarking is conducted to demonstrate and compare the performance of nine related probabilistic models on the ENCODE ChIP-Seq datasets. The results indicate that the regularized SignalRanker models, in contrast to the original SignalRanker models, can demonstrate excellent inference performance comparable to the FullSignalRanker models with low model complexities and time complexities. Such a feature is especially valuable in the context of the rapidly growing genome-wide signal profile data in the recent years.
Collapse
|
15
|
Abstract
Chromatin immunoprecipitation followed by sequencing is an invaluable assay for identifying the genomic binding sites of transcription factors. However, transcription factors rarely bind chromatin alone but often bind together with other cofactors, forming protein complexes. Here, we describe a computational method that integrates multiple ChIP-seq and RNA-seq datasets to discover protein complexes and determine their role as activators or repressors. This chapter outlines a detailed computational pipeline for discovering and predicting binding partners from ChIP-seq data and inferring their role in regulating gene expression. This work aims at developing hypotheses about gene regulation via binding partners and deciphering the combinatorial nature of DNA-binding proteins.
Collapse
|
16
|
Hu L, Xu Z, Hu B, Lu ZJ. COME: a robust coding potential calculation tool for lncRNA identification and characterization based on multiple features. Nucleic Acids Res 2016; 45:e2. [PMID: 27608726 PMCID: PMC5224497 DOI: 10.1093/nar/gkw798] [Citation(s) in RCA: 82] [Impact Index Per Article: 10.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/22/2015] [Revised: 08/25/2016] [Accepted: 08/31/2016] [Indexed: 12/31/2022] Open
Abstract
Recent genomic studies suggest that novel long non-coding RNAs (lncRNAs) are specifically expressed and far outnumber annotated lncRNA sequences. To identify and characterize novel lncRNAs in RNA sequencing data from new samples, we have developed COME, a coding potential calculation tool based on multiple features. It integrates multiple sequence-derived and experiment-based features using a decompose-compose method, which makes it more accurate and robust than other well-known tools. We also showed that COME was able to substantially improve the consistency of predication results from other coding potential calculators. Moreover, COME annotates and characterizes each predicted lncRNA transcript with multiple lines of supporting evidence, which are not provided by other tools. Remarkably, we found that one subgroup of lncRNAs classified by such supporting features (i.e. conserved local RNA secondary structure) was highly enriched in a well-validated database (lncRNAdb). We further found that the conserved structural domains on lncRNAs had better chance than other RNA regions to interact with RNA binding proteins, based on the recent eCLIP-seq data in human, indicating their potential regulatory roles. Overall, we present COME as an accurate, robust and multiple-feature supported method for the identification and characterization of novel lncRNAs. The software implementation is available at https://github.com/lulab/COME.
Collapse
Affiliation(s)
- Long Hu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China.,PKU-Tsinghua-NIBS Graduate Program, School of Life Sciences, Peking University, Beijing 100871, China
| | - Zhiyu Xu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
| | - Boqin Hu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
| | - Zhi John Lu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, Tsinghua-Peking Joint Center for Life Sciences, School of Life Sciences, Tsinghua University, Beijing 100084, China
| |
Collapse
|
17
|
Chen X, Jung JG, Shajahan-Haq AN, Clarke R, Shih IM, Wang Y, Magnani L, Wang TL, Xuan J. ChIP-BIT: Bayesian inference of target genes using a novel joint probabilistic model of ChIP-seq profiles. Nucleic Acids Res 2016; 44:e65. [PMID: 26704972 PMCID: PMC4838354 DOI: 10.1093/nar/gkv1491] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2015] [Revised: 11/16/2015] [Accepted: 12/09/2015] [Indexed: 11/16/2022] Open
Abstract
Chromatin immunoprecipitation with massively parallel DNA sequencing (ChIP-seq) has greatly improved the reliability with which transcription factor binding sites (TFBSs) can be identified from genome-wide profiling studies. Many computational tools are developed to detect binding events or peaks, however the robust detection of weak binding events remains a challenge for current peak calling tools. We have developed a novel Bayesian approach (ChIP-BIT) to reliably detect TFBSs and their target genes by jointly modeling binding signal intensities and binding locations of TFBSs. Specifically, a Gaussian mixture model is used to capture both binding and background signals in sample data. As a unique feature of ChIP-BIT, background signals are modeled by a local Gaussian distribution that is accurately estimated from the input data. Extensive simulation studies showed a significantly improved performance of ChIP-BIT in target gene prediction, particularly for detecting weak binding signals at gene promoter regions. We applied ChIP-BIT to find target genes from NOTCH3 and PBX1 ChIP-seq data acquired from MCF-7 breast cancer cells. TF knockdown experiments have initially validated about 30% of co-regulated target genes identified by ChIP-BIT as being differentially expressed in MCF-7 cells. Functional analysis on these genes further revealed the existence of crosstalk between Notch and Wnt signaling pathways.
Collapse
Affiliation(s)
- Xi Chen
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 900 North Glebe Road, Arlington, VA 22203, USA
| | - Jin-Gyoung Jung
- Department of Pathology, Johns Hopkins Medical Institutions, 1550 Orleans Street, CRB-II, Baltimore, MD 21231, USA
| | - Ayesha N Shajahan-Haq
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, 3970 Reservoir Road NW, Washington, DC 20057, USA
| | - Robert Clarke
- Department of Oncology, Lombardi Comprehensive Cancer Center, Georgetown University Medical Center, 3970 Reservoir Road NW, Washington, DC 20057, USA
| | - Ie-Ming Shih
- Department of Pathology, Johns Hopkins Medical Institutions, 1550 Orleans Street, CRB-II, Baltimore, MD 21231, USA
| | - Yue Wang
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 900 North Glebe Road, Arlington, VA 22203, USA
| | - Luca Magnani
- Department of Surgery and Cancer, Imperial College London, ICTEM building, Hammersmith Hospital, DuCane Road, London W120NN, UK
| | - Tian-Li Wang
- Department of Pathology, Johns Hopkins Medical Institutions, 1550 Orleans Street, CRB-II, Baltimore, MD 21231, USA
| | - Jianhua Xuan
- Bradley Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, 900 North Glebe Road, Arlington, VA 22203, USA
| |
Collapse
|
18
|
Dolfini D, Zambelli F, Pedrazzoli M, Mantovani R, Pavesi G. A high definition look at the NF-Y regulome reveals genome-wide associations with selected transcription factors. Nucleic Acids Res 2016; 44:4684-702. [PMID: 26896797 PMCID: PMC4889920 DOI: 10.1093/nar/gkw096] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/28/2015] [Accepted: 02/09/2016] [Indexed: 12/11/2022] Open
Abstract
NF-Y is a trimeric transcription factor (TF), binding the CCAAT box element, for which several results suggest a pioneering role in activation of transcription. In this work, we integrated 380 ENCODE ChIP-Seq experiments for 154 TFs and cofactors with sequence analysis, protein–protein interactions and RNA profiling data, in order to identify genome-wide regulatory modules resulting from the co-association of NF-Y with other TFs. We identified three main degrees of co-association with NF-Y for sequence-specific TFs. In the most relevant one, we found TFs having a significant overlap with NF-Y in their DNA binding loci, some with a precise spacing of binding sites with respect to the CCAAT box, others (FOS, Sp1/2, RFX5, IRF3, PBX3) mostly lacking their canonical binding site and bound to arrays of well spaced CCAAT boxes. As expected, NF-Y binding also correlates with RNA Pol II General TFs and with subunits of complexes involved in the control of H3K4 methylations. Co-association patterns are confirmed by protein–protein interactions, and correspond to specific functional categorizations and expression level changes of target genes following NF-Y inactivation. These data define genome-wide rules for the organization of NF-Y-centered regulatory modules, supporting a model of distinct categorization and synergy with well defined sets of TFs.
Collapse
Affiliation(s)
- Diletta Dolfini
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano, Via Celoria 26, 20133, Italy
| | - Federico Zambelli
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano, Via Celoria 26, 20133, Italy Istituto di Biomembrane e Bioenergetica, Consiglio Nazionale delle Ricerche, Bari, Via Amendola 165/A, 70126, Italy
| | - Maurizio Pedrazzoli
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano, Via Celoria 26, 20133, Italy
| | - Roberto Mantovani
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano, Via Celoria 26, 20133, Italy
| | - Giulio Pavesi
- Dipartimento di Bioscienze, Università degli Studi di Milano, Milano, Via Celoria 26, 20133, Italy
| |
Collapse
|
19
|
Liu L, Zhao W, Zhou X. Modeling co-occupancy of transcription factors using chromatin features. Nucleic Acids Res 2015; 44:e49. [PMID: 26590261 PMCID: PMC4797273 DOI: 10.1093/nar/gkv1281] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2015] [Accepted: 11/04/2015] [Indexed: 12/11/2022] Open
Abstract
Regulation of gene expression requires both transcription factor (TFs) and epigenetic modifications, and interplays between the two types of factors have been discovered. However study of relationships between chromatin features and TF–TF co-occupancy remains limited. Here, we revealed the relationship by first illustrating distinct profile patterns of chromatin features related to different binding events, including single TF binding and TF–TF co-occupancy of 71 TFs from five human cell lines. We further implemented statistical analyses to demonstrate the relationship by accurately predicting co-occupancy genome-widely using chromatin features including DNase I hypersensitivity, 11 histone modifications (HMs) and GC content. Remarkably, our results showed that the combination of chromatin features enables accurate predictions across the five cells. For individual chromatin features, DNase I enables high and consistent predictions. H3K27ac, H3K4me 2, H3K4me3 and H3K9ac are more reliable predictors than other HMs. Although the combination of 11 HMs achieves accurate predictions, their predictive ability varies considerably when a model obtained from one cell is applied to others, indicating relationship between HMs and TF–TF co-occupancy is cell type dependent. GC content is not a reliable predictor, but the addition of GC content to any other features enhances their predictive ability. Together, our results elucidate a strong relationship between TF–TF co-occupancy and chromatin features.
Collapse
Affiliation(s)
- Liang Liu
- Center for Bioinformatics and Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
| | - Weiling Zhao
- Center for Bioinformatics and Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
| | - Xiaobo Zhou
- Center for Bioinformatics and Systems Biology, Department of Radiology, Wake Forest School of Medicine, Winston-Salem, NC 27157, USA
| |
Collapse
|
20
|
Wong KC, Peng C, Li Y. Probabilistic Inference on Multiple Normalized Signal Profiles from Next Generation Sequencing: Transcription Factor Binding Sites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015; 12:1416-1428. [PMID: 26671811 DOI: 10.1109/tcbb.2015.2424421] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]
Abstract
With the prevalence of chromatin immunoprecipitation (ChIP) with sequencing (ChIP-Seq) technology, massive ChIP-Seq data has been accumulated. The ChIP-Seq technology measures the genome-wide occupancy of DNA-binding proteins in vivo. It is well-known that different DNA-binding protein occupancies may result in a gene being regulated in different conditions (e.g. different cell types). To fully understand a gene's function, it is essential to develop probabilistic models on multiple ChIP-Seq profiles for deciphering the gene transcription causalities. In this work, we propose and describe two probabilistic models. Assuming the conditional independence of different DNA-binding proteins' occupancies, the first method (SignalRanker) is developed as an intuitive method for ChIP-Seq genome-wide signal profile inference. Unfortunately, such an assumption may not always hold in some gene regulation cases. Thus, we propose and describe another method (FullSignalRanker) which does not make the conditional independence assumption. The proposed methods are compared with other existing methods on ENCODE ChIP-Seq datasets, demonstrating its regression and classification ability. The results suggest that FullSignalRanker is the best-performing method for recovering the signal ranks on the promoter and enhancer regions. In addition, FullSignalRanker is also the best-performing method for peak sequence classification. We envision that SignalRanker and FullSignalRanker will become important in the era of next generation sequencing. FullSignalRanker program is available on the following website: http://www.cs.toronto.edu/~wkc/FullSignalRanker/.
Collapse
|
21
|
Get to Understand More from Single-Cells: Current Studies of Microfluidic-Based Techniques for Single-Cell Analysis. Int J Mol Sci 2015. [PMID: 26213918 PMCID: PMC4581168 DOI: 10.3390/ijms160816763] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/14/2023] Open
Abstract
This review describes the microfluidic techniques developed for the analysis of a single cell. The characteristics of microfluidic (e.g., little sample amount required, high-throughput performance) make this tool suitable to answer and to solve biological questions of interest about a single cell. This review aims to introduce microfluidic related techniques for the isolation, trapping and manipulation of a single cell. The major approaches for detection in single-cell analysis are introduced; the applications of single-cell analysis are then summarized. The review concludes with discussions of the future directions and opportunities of microfluidic systems applied in analysis of a single cell.
Collapse
|
22
|
Hu L, Di C, Kai M, Yang YCT, Li Y, Qiu Y, Hu X, Yip KY, Zhang MQ, Lu ZJ. A common set of distinct features that characterize noncoding RNAs across multiple species. Nucleic Acids Res 2014; 43:104-14. [PMID: 25505163 PMCID: PMC4288202 DOI: 10.1093/nar/gku1316] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/27/2022] Open
Abstract
To find signature features shared by various ncRNA sub-types and characterize novel ncRNAs, we have developed a method, RNAfeature, to investigate >600 sets of genomic and epigenomic data with various evolutionary and biophysical scores. RNAfeature utilizes a fine-tuned intra-species wrapper algorithm that is followed by a novel feature selection strategy across species. It considers long distance effect of certain features (e.g. histone modification at the promoter region). We finally narrow down on 10 informative features (including sequences, structures, expression profiles and epigenetic signals). These features are complementary to each other and as a whole can accurately distinguish canonical ncRNAs from CDSs and UTRs (accuracies: >92% in human, mouse, worm and fly). Moreover, the feature pattern is conserved across multiple species. For instance, the supervised 10-feature model derived from animal species can predict ncRNAs in Arabidopsis (accuracy: 82%). Subsequently, we integrate the 10 features to define a set of noncoding potential scores, which can identify, evaluate and characterize novel noncoding RNAs. The score covers all transcribed regions (including unconserved ncRNAs), without requiring assembly of the full-length transcripts. Importantly, the noncoding potential allows us to identify and characterize potential functional domains with feature patterns similar to canonical ncRNAs (e.g. tRNA, snRNA, miRNA, etc) on ∼70% of human long ncRNAs (lncRNAs).
Collapse
Affiliation(s)
- Long Hu
- PKU-Tsinghua-NIBS Graduate Program, School of Life Sciences, Peking University, Beijing 100871, China MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
| | - Chao Di
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
| | - Mingxuan Kai
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
| | - Yu-Cheng T Yang
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
| | - Yang Li
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
| | - Yunjiang Qiu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
| | - Xihao Hu
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
| | - Kevin Y Yip
- Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong
| | - Michael Q Zhang
- Department of Molecular and Cell Biology, Center for Systems Biology, The University of Texas, Dallas 800 West Campbell Road, RL11 Richardson, TX 75080-3021, USA MOE Key Laboratory of Bioinformatics and Bioinformatics Division, Center for Synthetic and Systems Biology, TNLIST and School of Medicine, Tsinghua University, Beijing 100084, China
| | - Zhi John Lu
- MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology and Center for Plant Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China
| |
Collapse
|
23
|
Hyun BR, McElwee JL, Soloway PD. Single molecule and single cell epigenomics. Methods 2014; 72:41-50. [PMID: 25204781 DOI: 10.1016/j.ymeth.2014.08.015] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/05/2014] [Revised: 08/19/2014] [Accepted: 08/27/2014] [Indexed: 01/24/2023] Open
Abstract
Dynamically regulated changes in chromatin states are vital for normal development and can produce disease when they go awry. Accordingly, much effort has been devoted to characterizing these states under normal and pathological conditions. Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is the most widely used method to characterize where in the genome transcription factors, modified histones, modified nucleotides and chromatin binding proteins are found; bisulfite sequencing (BS-seq) and its variants are commonly used to characterize the locations of DNA modifications. Though very powerful, these methods are not without limitations. Notably, they are best at characterizing one chromatin feature at a time, yet chromatin features arise and function in combination. Investigators commonly superimpose separate ChIP-seq or BS-seq datasets, and then infer where chromatin features are found together. While these inferences might be correct, they can be misleading when the chromatin source has distinct cell types, or when a given cell type exhibits any cell to cell variation in chromatin state. These ambiguities can be eliminated by robust methods that directly characterize the existence and genomic locations of combinations of chromatin features in very small inputs of cells or ideally, single cells. Here we review single molecule epigenomic methods under development to overcome these limitations, the technical challenges associated with single molecule methods and their potential application to single cells.
Collapse
Affiliation(s)
- Byung-Ryool Hyun
- Division of Nutritional Sciences, Cornell University, Ithaca, NY 14853, USA
| | - John L McElwee
- Division of Nutritional Sciences, Cornell University, Ithaca, NY 14853, USA
| | - Paul D Soloway
- Division of Nutritional Sciences, Cornell University, Ithaca, NY 14853, USA.
| |
Collapse
|
24
|
Wong KC, Li Y, Peng C, Zhang Z. SignalSpider: probabilistic pattern discovery on multiple normalized ChIP-Seq signal profiles. ACTA ACUST UNITED AC 2014; 31:17-24. [PMID: 25192742 DOI: 10.1093/bioinformatics/btu604] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]
Abstract
MOTIVATION Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-Seq) measures the genome-wide occupancy of transcription factors in vivo. Different combinations of DNA-binding protein occupancies may result in a gene being expressed in different tissues or at different developmental stages. To fully understand the functions of genes, it is essential to develop probabilistic models on multiple ChIP-Seq profiles to decipher the combinatorial regulatory mechanisms by multiple transcription factors. RESULTS In this work, we describe a probabilistic model (SignalSpider) to decipher the combinatorial binding events of multiple transcription factors. Comparing with similar existing methods, we found SignalSpider performs better in clustering promoter and enhancer regions. Notably, SignalSpider can learn higher-order combinatorial patterns from multiple ChIP-Seq profiles. We have applied SignalSpider on the normalized ChIP-Seq profiles from the ENCODE consortium and learned model instances. We observed different higher-order enrichment and depletion patterns across sets of proteins. Those clustering patterns are supported by Gene Ontology (GO) enrichment, evolutionary conservation and chromatin interaction enrichment, offering biological insights for further focused studies. We also proposed a specific enrichment map visualization method to reveal the genome-wide transcription factor combinatorial patterns from the models built, which extend our existing fine-scale knowledge on gene regulation to a genome-wide level. AVAILABILITY AND IMPLEMENTATION The matrix-algebra-optimized executables and source codes are available at the authors' websites: http://www.cs.toronto.edu/∼wkc/SignalSpider.
Collapse
Affiliation(s)
- Ka-Chun Wong
- Department of Computer Science and Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, K.S.A., Banting and Best Department of Medical Research and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada Department of Computer Science and Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, K.S.A., Banting and Best Department of Medical Research and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Yue Li
- Department of Computer Science and Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, K.S.A., Banting and Best Department of Medical Research and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada Department of Computer Science and Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, K.S.A., Banting and Best Department of Medical Research and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Chengbin Peng
- Department of Computer Science and Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, K.S.A., Banting and Best Department of Medical Research and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| | - Zhaolei Zhang
- Department of Computer Science and Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, K.S.A., Banting and Best Department of Medical Research and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada Department of Computer Science and Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, K.S.A., Banting and Best Department of Medical Research and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada Department of Computer Science and Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, K.S.A., Banting and Best Department of Medical Research and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada Department of Computer Science and Terrence Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, Ontario, Canada, CEMSE Division, King Abdullah University of Science and Technology, Thuwal, Jeddah, K.S.A., Banting and Best Department of Medical Research and Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
| |
Collapse
|
25
|
Jiang P, Singh M. CCAT: Combinatorial Code Analysis Tool for transcriptional regulation. Nucleic Acids Res 2013; 42:2833-47. [PMID: 24366875 PMCID: PMC3950699 DOI: 10.1093/nar/gkt1302] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023] Open
Abstract
Combinatorial interplay among transcription factors (TFs) is an important mechanism by which transcriptional regulatory specificity is achieved. However, despite the increasing number of TFs for which either binding specificities or genome-wide occupancy data are known, knowledge about cooperativity between TFs remains limited. To address this, we developed a computational framework for predicting genome-wide co-binding between TFs (CCAT, Combinatorial Code Analysis Tool), and applied it to Drosophila melanogaster to uncover cooperativity among TFs during embryo development. Using publicly available TF binding specificity data and DNaseI chromatin accessibility data, we first predicted genome-wide binding sites for 324 TFs across five stages of D. melanogaster embryo development. We then applied CCAT in each of these developmental stages, and identified from 19 to 58 pairs of TFs in each stage whose predicted binding sites are significantly co-localized. We found that nearby binding sites for pairs of TFs predicted to cooperate were enriched in regions bound in relevant ChIP experiments, and were more evolutionarily conserved than other pairs. Further, we found that TFs tend to be co-localized with other TFs in a dynamic manner across developmental stages. All generated data as well as source code for our front-to-end pipeline are available at http://cat.princeton.edu.
Collapse
Affiliation(s)
- Peng Jiang
- Department of Computer Science, Princeton University, Princeton, 08540 NJ, USA and Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, 08544 NJ, USA
| | | |
Collapse
|