1
Deyneko IV. BestCRM: An Exhaustive Search for Optimal Cis-Regulatory Modules in Promoters Accelerated by the Multidimensional Hash Function. Int J Mol Sci 2024; 25:1903. [PMID: 38339181 PMCID: PMC10856692 DOI: 10.3390/ijms25031903] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/01/2023] [Revised: 01/24/2024] [Accepted: 01/26/2024] [Indexed: 02/12/2024] Open access.
Abstract
The concept of cis-regulatory modules located in gene promoters represents the current view of how transcriptional regulation of genes is organized. Such modules are a combination of two or more single, short DNA motifs. The bioinformatic identification of such modules belongs to the class of NP-hard problems with extreme computational complexity, and therefore, simplifications, assumptions, and heuristics are usually deployed to tackle the problem. In practice, this requires, first, many parameters to be set before the search, and second, it leads to the identification of locally optimal results. Here, a novel method is presented, aimed at identifying the cis-regulatory elements in gene promoters based on an exhaustive search of all the feasible modules' configurations. All required parameters are automatically estimated using positive and negative datasets. To be computationally efficient, the search is accelerated using a multidimensional hash function, allowing the search to complete in a few hours on a regular laptop (for example, a CPU Intel i7, 3.2 GHz, 32 GB RAM). Tests on an established benchmark and real data show better performance of BestCRM compared to the available methods according to several metrics like specificity, sensitivity, AUC, etc. A great practical advantage of the method is its minimal number of input parameters: apart from positive and negative promoters, only a desired level of module presence in promoters is required.
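To make the idea of a hash-accelerated exhaustive search concrete, here is a toy Python sketch. It is not the BestCRM algorithm: it assumes a hypothetical input format (promoter ID mapped to a list of motif hits given as `(motif_id, position, strand)`), restricts modules to motif pairs, and uses an ordinary Python dictionary keyed on several module dimensions (motif pair, distance bin, relative orientation) as a stand-in for the paper's multidimensional hash function. The `min_presence` parameter plays the role of the "desired level of module presence in promoters" mentioned in the abstract; `max_dist` and `bin_size` are illustrative defaults, not values estimated from data as BestCRM does. All function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical input: promoter ID -> list of motif hits (motif_id, position, strand).


def module_key(hit_a, hit_b, bin_size):
    """Hash key over several module dimensions (an illustrative choice):
    upstream motif, downstream motif, distance bin, relative orientation."""
    (m1, p1, s1), (m2, p2, s2) = sorted((hit_a, hit_b), key=lambda h: h[1])
    distance_bin = (p2 - p1) // bin_size
    orientation = "same" if s1 == s2 else "opposite"
    return (m1, m2, distance_bin, orientation)


def index_modules(promoters, max_dist=100, bin_size=10):
    """One pass over all hit pairs: module key -> set of promoter IDs containing it."""
    table = defaultdict(set)
    for pid, hits in promoters.items():
        for a, b in combinations(hits, 2):
            if abs(a[1] - b[1]) <= max_dist:
                table[module_key(a, b, bin_size)].add(pid)
    return table


def rank_modules(positives, negatives, min_presence=0.5, max_dist=100, bin_size=10):
    """Keep modules present in at least `min_presence` of positive promoters,
    ranked by (fraction in positives - fraction in negatives)."""
    pos_idx = index_modules(positives, max_dist, bin_size)
    neg_idx = index_modules(negatives, max_dist, bin_size)
    results = []
    for key, pids in pos_idx.items():
        pos_frac = len(pids) / len(positives)
        if pos_frac < min_presence:
            continue
        # .get() on a defaultdict does not insert a new key; missing -> 0 promoters.
        neg_frac = len(neg_idx.get(key, ())) / len(negatives)
        results.append((key, pos_frac, neg_frac))
    results.sort(key=lambda r: r[1] - r[2], reverse=True)
    return results


# Toy usage with made-up hits.
pos = {"p1": [("A", 10, "+"), ("B", 35, "+")],
       "p2": [("A", 100, "-"), ("B", 128, "-")],
       "p3": [("C", 5, "+")]}
neg = {"n1": [("A", 10, "+"), ("B", 90, "-")],
       "n2": [("C", 1, "+"), ("A", 20, "+")]}
for key, pf, nf in rank_modules(pos, neg):
    print(key, round(pf, 2), nf)  # ('A', 'B', 2, 'same') 0.67 0.0
```

Because each hit pair is hashed straight to its configuration key, every pairwise configuration that actually occurs in the data is counted in a single pass over the promoters, rather than scanning all promoters once per candidate configuration.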
Affiliation(s)
- Igor V Deyneko
- K.A. Timiryazev Institute of Plant Physiology RAS, 35 Botanicheskaya Str., Moscow 127276, Russia
2
Bentsen M, Heger V, Schultheis H, Kuenne C, Looso M. TF-COMB - discovering grammar of transcription factor binding sites. Comput Struct Biotechnol J 2022; 20:4040-4051. [PMID: 35983231 PMCID: PMC9358416 DOI: 10.1016/j.csbj.2022.07.025] [Citation(s) in RCA: 9] [Impact Index Per Article: 4.5] [Received: 06/14/2022] [Accepted: 07/12/2022] [Indexed: 02/07/2023] Open access.
Abstract
Cooperativity between transcription factors is important to regulate target gene expression. In particular, the binding grammar of TFs in relation to each other, as well as in the context of other genomic elements, is crucial for TF functionality. However, tools to easily uncover co-occurrence between DNA-binding proteins, and investigate the regulatory modules of TFs, are limited. Here we present TF-COMB (Transcription Factor Co-Occurrence using Market Basket analysis) - a tool to investigate co-occurring TFs and binding grammar within regulatory regions. We found that TF-COMB can accurately identify known co-occurring TFs from ChIP-seq data, as well as uncover preferential localization to other genomic elements. With the use of ATAC-seq footprinting and TF motif locations, we found that TFs exhibit both preferred orientation and distance in relation to each other, and that these are biologically significant. Finally, we extended the analysis to not only investigate individual TF pairs, but also TF pairs in the context of networks, which enabled the investigation of TF complexes and TF hubs. In conclusion, TF-COMB is a flexible tool to investigate various aspects of TF binding grammar.
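The market basket framing behind TF-COMB can be illustrated in a few lines of Python: treat each regulatory region as a "basket" and the TFs bound in it as "items", then score each TF pair with standard association measures (support, lift, cosine). This is a generic sketch of market basket analysis, not TF-COMB's implementation or API; the function name `pair_cooccurrence`, the input format, and the example TF sets are hypothetical, and it ignores the distance and orientation analyses described in the abstract.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pair_cooccurrence(regions):
    """regions: list of sets of TF names bound in each region (the 'baskets').
    Returns (tf_a, tf_b) -> count, support, lift and cosine of co-occurrence."""
    n = len(regions)
    single = Counter()
    pair = Counter()
    for tfs in regions:
        single.update(tfs)
        # Sort so each unordered pair always gets the same key.
        pair.update(combinations(sorted(tfs), 2))
    stats = {}
    for (a, b), n_ab in pair.items():
        support = n_ab / n
        # Lift > 1: the pair co-occurs more often than expected if independent.
        lift = support / ((single[a] / n) * (single[b] / n))
        # Cosine: co-occurrence normalized by each TF's own frequency.
        cosine = n_ab / sqrt(single[a] * single[b])
        stats[(a, b)] = {"count": n_ab, "support": support,
                         "lift": lift, "cosine": cosine}
    return stats


# Toy usage with made-up regions.
regions = [{"CTCF", "RAD21"}, {"CTCF", "RAD21", "YY1"}, {"YY1"}, {"CTCF", "RAD21"}]
ranked = sorted(pair_cooccurrence(regions).items(), key=lambda kv: -kv[1]["cosine"])
for pair_name, s in ranked:
    print(pair_name, s["count"], round(s["lift"], 2), round(s["cosine"], 2))
```

Cosine is used for ranking here because, unlike raw counts, it does not simply favor the most frequently bound TFs; which measure is most appropriate for real binding data is a separate question the sketch does not settle.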
Affiliation(s)
- Mette Bentsen
- Bioinformatics Core Unit (BCU), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Vanessa Heger
- Bioinformatics Core Unit (BCU), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Hendrik Schultheis
- Bioinformatics Core Unit (BCU), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Carsten Kuenne
- Bioinformatics Core Unit (BCU), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Mario Looso
- Bioinformatics Core Unit (BCU), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany
- Cardio-Pulmonary Institute (CPI), Bad Nauheim, Germany
- Corresponding author at: Bioinformatics Core Unit (BCU), Max Planck Institute for Heart and Lung Research, Bad Nauheim, Germany.
3
Zhou M, Li H, Wang X, Guan Y. Evidence of widespread, independent sequence signature for transcription factor cobinding. Genome Res 2021; 31:265-278. [PMID: 33303494 PMCID: PMC7849410 DOI: 10.1101/gr.267310.120] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/12/2020] [Accepted: 12/03/2020] [Indexed: 01/03/2023]
Abstract
Transcription factors (TFs) are the vocabulary that genomes use to regulate gene expression and phenotypes. The interactions among TFs enrich this vocabulary and orchestrate diverse biological processes. Although simple models identify open chromatin and the presence of TF motifs as the two major contributors to TF binding patterns, it remains elusive what contributes to the in vivo TF cobinding landscape. In this study, we developed a machine learning algorithm to explore the contributors of the cobinding patterns. The algorithm substantially outperforms the state-of-the-field models for TF cobinding prediction. Game theory-based feature importance analysis reveals that, for most of the TF pairs we studied, independent motif sequences contribute to the cobinding patterns of one or both of the TFs under investigation. Such independent motif sequences include, but are not limited to, motifs of transcription initiation-related proteins and known TF complexes. We found that the motif sequence signatures and the TFs are rarely mutual, corroborating a hierarchical and directional organization of the regulatory network and refuting the possibility of artifacts caused by shared sequence similarity with the TFs under investigation. We modeled such regulatory language with directed graphs, which reveal shared, global factors that are related to many binding and cobinding patterns.
Affiliation(s)
- Manqi Zhou
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Hongyang Li
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Xueqing Wang
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA
4
Auer JMT, Stoddart JJ, Christodoulou I, Lima A, Skouloudaki K, Hall HN, Vukojević V, Papadopoulos DK. Of numbers and movement - understanding transcription factor pathogenesis by advanced microscopy. Dis Model Mech 2020; 13:dmm046516. [PMID: 33433399 PMCID: PMC7790199 DOI: 10.1242/dmm.046516] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 01/09/2023] Open access.
Abstract
Transcription factors (TFs) are life-sustaining and, therefore, the subject of intensive research. By regulating gene expression, TFs control a plethora of developmental and physiological processes, and their abnormal function commonly leads to various developmental defects and diseases in humans. Normal TF function often depends on gene dosage, which can be altered by copy-number variation or loss-of-function mutations. This explains why TF haploinsufficiency (HI) can lead to disease. Since aberrant TF numbers frequently result in pathogenic abnormalities of gene expression, quantitative analyses of TFs are a priority in the field. In vitro single-molecule methodologies have significantly aided the identification of links between TF gene dosage and transcriptional outcomes. Additionally, advances in quantitative microscopy have contributed mechanistic insights into normal and aberrant TF function. However, to understand TF biology, TF-chromatin interactions must be characterised in vivo, in a tissue-specific manner and in the context of both normal and altered TF numbers. Here, we summarise the advanced microscopy methodologies most frequently used to link TF abundance to function and dissect the molecular mechanisms underlying TF HIs. Increased application of advanced single-molecule and super-resolution microscopy modalities will improve our understanding of how TF HIs drive disease.
Affiliation(s)
- Julia M T Auer
- MRC Human Genetics Unit, University of Edinburgh, Edinburgh EH4 1XU, UK
- Jack J Stoddart
- MRC Human Genetics Unit, University of Edinburgh, Edinburgh EH4 1XU, UK
- Ana Lima
- MRC Human Genetics Unit, University of Edinburgh, Edinburgh EH4 1XU, UK
- Hildegard N Hall
- MRC Human Genetics Unit, University of Edinburgh, Edinburgh EH4 1XU, UK
- Vladana Vukojević
- Center for Molecular Medicine (CMM), Department of Clinical Neuroscience, Karolinska Institutet, 17176 Stockholm, Sweden
5
Ahmed M, Min DS, Kim DR. Integrating binding and expression data to predict transcription factors combined function. BMC Genomics 2020; 21:610. [PMID: 32894066 PMCID: PMC7487729 DOI: 10.1186/s12864-020-06977-1] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 01/14/2020] [Accepted: 08/11/2020] [Indexed: 12/29/2022] Open access.
Abstract
BACKGROUND: Transcription factor binding to the regulatory region of a gene induces or represses its expression. Transcription factors share their binding sites with other factors, co-factors and/or DNA-binding proteins. These proteins form complexes that bind to the DNA as single units. The binding of two factors to a shared site does not always lead to a functional interaction.

RESULTS: We propose a method to predict the combined functions of two factors using comparable binding and expression data (target). We based this method on binding and expression target analysis (BETA), which we re-implemented in R and extended for this purpose. The target method ranks the factor's targets by importance and predicts the dominant type of interaction between two transcription factors. We applied the method to simulated and real datasets of transcription factor-binding sites and gene expression under perturbation of factors. We found that Yin Yang 1 transcription factor (YY1) and YY2 have antagonistic and independent regulatory targets in HeLa cells, but they may cooperate on a few shared targets.

CONCLUSION: We developed an R package and a web application to integrate binding (ChIP-seq) and expression (microarrays or RNA-seq) data to determine the cooperative or competitive combined function of two transcription factors.
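A BETA-style ranking, as it is commonly described, combines a peak-based "regulatory potential" score per gene with the gene's differential expression through a rank product. The Python sketch below illustrates that idea only. The decay formula (each peak within a 100 kb window contributes exp(-(0.5 + 4Δ)), with Δ the peak-to-TSS distance as a fraction of the window) follows the form usually cited for BETA but should be treated as an assumption; the function names and inputs are hypothetical and are not the API of the `target` R package. The sketch ranks one factor's targets; the two-factor interaction prediction described in the abstract is not shown.

```python
import math


def regulatory_potential(peak_distances, window=100_000):
    """Assumed BETA-like score: sum of exp(-(0.5 + 4 * |d| / window))
    over peaks within `window` bp of the gene's TSS."""
    return sum(math.exp(-(0.5 + 4 * abs(d) / window))
               for d in peak_distances if abs(d) <= window)


def ranks(values, descending=True):
    """1-based ranks; ties keep input order (a simplification)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=descending)
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def rank_product(genes, peaks, de_stat):
    """genes: list of gene IDs; peaks: gene -> peak-to-TSS distances (bp);
    de_stat: gene -> differential-expression statistic (larger = stronger).
    Returns (gene, rank product) pairs, most likely targets first."""
    rp = [regulatory_potential(peaks.get(g, [])) for g in genes]
    de = [de_stat[g] for g in genes]
    r_rp, r_de = ranks(rp), ranks(de)
    n = len(genes)
    scores = {g: (a * b) / n ** 2 for g, a, b in zip(genes, r_rp, r_de)}
    return sorted(scores.items(), key=lambda kv: kv[1])


# Toy usage with made-up numbers.
genes = ["g1", "g2", "g3"]
peaks = {"g1": [500, -2000], "g2": [90_000], "g3": []}
de_stat = {"g1": 3.0, "g2": 1.0, "g3": 2.0}
for gene, score in rank_product(genes, peaks, de_stat):
    print(gene, round(score, 3))
```

A gene scores well only if it is near strong binding and also responds strongly in expression; a low rank product therefore favors genes that both measurements support.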
Affiliation(s)
- Mahmoud Ahmed
- Department of Biochemistry and Convergence Medical Sciences and Institute of Health Sciences, Gyeongsang National University School of Medicine, Jinju, 52727, Republic of Korea
- Do Sik Min
- College of Pharmacy, Yonsei University, Incheon, 21983, Republic of Korea
- Deok Ryong Kim
- Department of Biochemistry and Convergence Medical Sciences and Institute of Health Sciences, Gyeongsang National University School of Medicine, Jinju, 52727, Republic of Korea
6
Mariani L, Weinand K, Gisselbrecht SS, Bulyk ML. MEDEA: analysis of transcription factor binding motifs in accessible chromatin. Genome Res 2020; 30:736-748. [PMID: 32424069 PMCID: PMC7263192 DOI: 10.1101/gr.260877.120] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 01/13/2020] [Accepted: 04/10/2020] [Indexed: 12/15/2022]
Abstract
Deciphering the interplay between chromatin accessibility and transcription factor (TF) binding is fundamental to understanding transcriptional regulation, control of cellular states, and the establishment of new phenotypes. Recent genome-wide chromatin accessibility profiling studies have provided catalogs of putative open regions, where TFs can recognize their motifs and regulate gene expression programs. Here, we present motif enrichment in differential elements of accessibility (MEDEA), a computational tool that analyzes high-throughput chromatin accessibility genomic data to identify cell-type-specific accessible regions and lineage-specific motifs associated with TF binding therein. To benchmark MEDEA, we used a panel of reference cell lines profiled by ENCODE and curated by the ENCODE Project Consortium for the ENCODE-DREAM Challenge. By comparing results with RNA-seq data, ChIP-seq peaks, and DNase-seq footprints, we show that MEDEA improves the detection of motifs associated with known lineage specifiers. We then applied MEDEA to 610 ENCODE DNase-seq data sets, where it revealed significant motifs even when absolute enrichment was low and where it identified novel regulators, such as NRF1 in kidney development. Finally, we show that MEDEA performs well on both bulk and single-cell ATAC-seq data. MEDEA is publicly available as part of our Glossary-GENRE suite for motif enrichment analysis.
Affiliation(s)
- Luca Mariani
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Kathryn Weinand
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Bioinformatics and Integrative Genomics PhD Program, Harvard University, Cambridge, Massachusetts 02138, USA
- Stephen S Gisselbrecht
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Martha L Bulyk
- Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA
- Bioinformatics and Integrative Genomics PhD Program, Harvard University, Cambridge, Massachusetts 02138, USA
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, Massachusetts 02115, USA