1
RBPSpot: Learning on appropriate contextual information for RBP binding sites discovery. iScience 2021; 24:103381. [PMID: 34841226 PMCID: PMC8605353 DOI: 10.1016/j.isci.2021.103381] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 06/07/2021] [Revised: 09/01/2021] [Accepted: 10/27/2021] [Indexed: 11/29/2022] Open
Abstract
Identifying the factors that determine RBP-RNA interactions remains a major challenge. It involves sparse binding motifs and a suitable sequence context for binding. The present work describes an approach to detect RBP binding sites in RNAs using an ultra-fast inexact k-mer search for statistically significant seeds. The seeds work as anchors to evaluate the context and binding potential using flanking-region information while leveraging a deep feed-forward neural network. The developed models were also supported by MD-simulation studies. The implemented software, RBPSpot, scored consistently high on all performance metrics, including an average accuracy of ∼90% across a large number of validated datasets. It outperformed the compared tools, including some with much more complex deep-learning models, during a comprehensive benchmarking process. RBPSpot can identify RBP binding sites in the human system and can also be used to develop new models, making it a valuable resource in the area of regulatory system studies.
- Efficient motif anchoring helps to get good quality contextual information on binding
- Realistic and high granularity datasets ensure better performance of the classifiers
- DNN models on the contextual features outperform more complex deep learning tools
- RBPSpot algorithm may be used to develop RBP binding models for other species also
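The seed-and-flank idea in this abstract can be illustrated with a minimal sketch. It assumes that "inexact" means a bounded number of mismatches (Hamming distance) and that context is a fixed-width window on each side of the seed; the function names, the mismatch threshold, and the flank width are hypothetical and are not taken from RBPSpot itself.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def inexact_kmer_hits(seq: str, kmer: str, max_mismatches: int = 1) -> list[int]:
    """Start positions where `kmer` matches `seq` with at most `max_mismatches`.

    Assumption: mismatches only, no insertions or deletions.
    """
    k = len(kmer)
    return [
        i
        for i in range(len(seq) - k + 1)
        if hamming(seq[i:i + k], kmer) <= max_mismatches
    ]


def flanks(seq: str, start: int, k: int, width: int) -> tuple[str, str]:
    """Left and right flanking windows around a seed hit (truncated at the ends)."""
    left = max(0, start - width)
    return seq[left:start], seq[start + k:start + k + width]


# Hypothetical RNA sequence and seed, for illustration only.
rna = "AUGGCUAGGCAUGCCA"
seed = "GGCA"
hits = inexact_kmer_hits(rna, seed, max_mismatches=1)
contexts = [flanks(rna, i, len(seed), width=3) for i in hits]
```

In a fuller pipeline the flanking windows would be turned into features for a classifier; that step is omitted here because the abstract does not specify the features.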
2
van Bömmel A, Love MI, Chung HR, Vingron M. coTRaCTE predicts co-occurring transcription factors within cell-type specific enhancers. PLoS Comput Biol 2018; 14:e1006372. [PMID: 30142147 PMCID: PMC6126874 DOI: 10.1371/journal.pcbi.1006372] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 01/12/2018] [Revised: 09/06/2018] [Accepted: 07/17/2018] [Indexed: 02/06/2023] Open
Abstract
Cell-type specific gene expression is regulated by the combinatorial action of transcription factors (TFs). In this study, we predict transcription factor (TF) combinations that cooperatively bind in a cell-type specific manner. We first divide DNase hypersensitive sites into cell-type specifically open vs. ubiquitously open sites in 64 cell types to describe possible cell-type specific enhancers. Based on the pattern contrast between these two groups of sequences we develop "co-occurring TF predictor on Cell-Type specific Enhancers" (coTRaCTE) - a novel statistical method to determine regulatory TF co-occurrences. Contrasting the co-binding of TF pairs between cell-type specific and ubiquitously open chromatin guarantees the high cell-type specificity of the predictions. coTRaCTE predicts more than 2000 co-occurring TF pairs in 64 cell types. The large majority (70%) of these TF pairs is highly cell-type specific and overlaps in TF pair co-occurrence are highly consistent among related cell types. Furthermore, independently validated co-occurring and directly interacting TFs are significantly enriched in our predictions. Focusing on the regulatory network derived from the predicted co-occurring TF pairs in embryonic stem cells (ESCs) we find that it consists of three subnetworks with distinct functions: maintenance of pluripotency governed by OCT4, SOX2 and NANOG, regulation of early development governed by KLF4, STAT3, ZIC3 and ZNF148 and general functions governed by MYC, TCF3 and YY1. In summary, coTRaCTE predicts highly cell-type specific co-occurring TFs which reveal new insights into transcriptional regulatory mechanisms.
Affiliation(s)
- Alena van Bömmel
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
- Michael I. Love
  - Department of Biostatistics, Department of Genetics, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, United States of America
- Ho-Ryun Chung
  - Otto Warburg Laboratory, Max Planck Institute for Molecular Genetics, Berlin, Germany
  - Philipps-Universität Marburg, Fachbereich Medizin, Institut für Medizinische Bioinformatik und Biostatistik, Marburg, Germany
- Martin Vingron
  - Department of Computational Molecular Biology, Max Planck Institute for Molecular Genetics, Berlin, Germany
3
Chong A, Teo JX, Ban KHK. Distinct epigenetic signatures elucidate enhancer-gene relationships that delineate CIMP and non-CIMP colorectal cancers. Oncotarget 2018; 7:28027-39. [PMID: 27049830 PMCID: PMC5053707 DOI: 10.18632/oncotarget.8473] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 12/10/2015] [Accepted: 03/14/2016] [Indexed: 12/22/2022] Open
Abstract
Epigenetic changes, such as DNA methylation, affect gene expression. In colorectal cancer (CRC), a distinct phenotype called the CpG island methylator phenotype (“CIMP”) has significantly higher levels of DNA methylation at so-called “Type C loci” within the genome. We postulate that enhancer-gene pairs are coordinately controlled through DNA methylation in order to regulate the expression of key genes/biomarkers for a particular phenotype. First, we found 24 experimentally validated enhancers (VISTA enhancer browser) that contained statistically significant (FDR-adjusted q-value of <0.01) differentially methylated regions (DMRs) (1000 bp) in a study of CIMP versus non-CIMP CRCs. Of these, the methylation of 2 enhancers, 1702 and 1944, was found to be very well correlated with the methylation of the genes Wnt3A and IGDCC3, respectively, in two separate and independent datasets. We show for the first time that there are indeed distinct and dynamic changes in the methylation pattern of specific enhancer-gene pairs in CRCs. Such a coordinated epigenetic event could be indicative of an interaction between (1) enhancer 1702 and Wnt3A and (2) enhancer 1944 and IGDCC3. Moreover, our study shows that the methylation patterns of these 2 enhancer-gene pairs can potentially be used as biomarkers to delineate CIMP from non-CIMP CRCs.
Affiliation(s)
- Allen Chong
  - Department of Pathology, National University of Singapore, 119074 Singapore. Present address: Shanxi Guoxin Caregeno Medical Laboratories, Taiyuan, Shanxi Province, 030006 China
- Jing Xian Teo
  - Cancer Science Institute, National University of Singapore, 117599 Singapore
- Kenneth H K Ban
  - Department of Biochemistry, National University of Singapore, 117596 Singapore
  - Institute of Molecular and Cell Biology, 138673 Singapore
4
Cofunctional Subpathways Were Regulated by Transcription Factor with Common Motif, Common Family, or Common Tissue. BIOMED RESEARCH INTERNATIONAL 2015; 2015:780357. [PMID: 26688819 PMCID: PMC4672121 DOI: 10.1155/2015/780357] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/26/2015] [Revised: 11/02/2015] [Accepted: 11/04/2015] [Indexed: 11/17/2022]
Abstract
Dissecting the characteristics of the transcription factor (TF) regulatory subpathway is helpful for understanding the underlying regulatory function of TFs in complex biological systems. To gain insight into the influence of TFs on their regulatory subpathways, we constructed a global TF-subpathways network (TSN) to analyze systematically the regulatory effect of common-motif, common-family, or common-tissue TFs on subpathways. We performed cluster analysis to show that the common-motif, common-family, or common-tissue TFs that regulated the same pathway classes tended to cluster together and contribute to the same biological function that led to disease initiation and progression. We analyzed the Jaccard coefficient to show that the functional consistency of subpathways regulated by TF pairs with common motif, common family, or common tissue was significantly greater than that of random TF pairs at the subpathway level, pathway level, and pathway class level. For example, HNF4A (hepatocyte nuclear factor 4, alpha) and NR1I3 (nuclear receptor subfamily 1, group I, member 3) were a pair of TFs with common motif, common family, and common tissue. They were involved in drug metabolism pathways and were liver-specific factors required for physiological transcription. In short, we inferred that the cofunctional subpathways were regulated by common-motif, common-family, or common-tissue TFs.
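The Jaccard coefficient mentioned in this abstract is the size of the intersection of two sets divided by the size of their union. A minimal sketch follows, assuming each TF is represented by the set of subpathways it regulates; the TF labels and subpathway identifiers below are placeholders, not data from the study.

```python
def jaccard(a, b) -> float:
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections.

    Returns 0.0 when both collections are empty (a convention chosen here).
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Placeholder subpathway sets for two hypothetical TFs.
tf_a_subpathways = {"path_1", "path_2", "path_3"}
tf_b_subpathways = {"path_2", "path_3", "path_4"}
similarity = jaccard(tf_a_subpathways, tf_b_subpathways)
```

Comparing such scores for TF pairs that share a motif, family, or tissue against scores for randomly drawn TF pairs is one way to read the "functional consistency" comparison the abstract describes; the statistical test used in the study is not stated here.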
5
Deb A, Kundu S. Deciphering Cis-Regulatory Element Mediated Combinatorial Regulation in Rice under Blast Infected Condition. PLoS One 2015; 10:e0137295. [PMID: 26327607 PMCID: PMC4556519 DOI: 10.1371/journal.pone.0137295] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.1] [Received: 05/02/2015] [Accepted: 08/14/2015] [Indexed: 01/15/2023] Open
Abstract
Combinations of cis-regulatory elements (CREs) present at the promoters facilitate the binding of several transcription factors (TFs), thereby altering the consequent gene expression. Due to the complexity of the regulatory mechanism, the combinatorics of CRE-mediated transcriptional regulation has been elusive. In this work, we have developed a new methodology that quantifies the co-occurrence tendencies of CREs present in a set of promoter sequences; these co-occurrence scores are filtered in three consecutive steps to test their statistical significance; and the significantly co-occurring CRE pairs are presented as networks. These networks of co-occurring CREs are further transformed to derive higher orders of regulatory combinatorics. We have further applied this methodology to the differentially up-regulated gene sets of rice tissues under fungal (Magnaporthe) infected conditions to demonstrate how it helps to understand the CRE-mediated combinatorial gene regulation. Our analysis includes a wide spectrum of biologically important results. The CRE pairs having a strong tendency to co-occur often exhibit very similar joint distribution patterns at the promoters of rice. We couple the network approach with experimental results of plant gene regulation and defense mechanisms and find evidence of auto and cross regulation among TF families, cross-talk among multiple hormone signaling pathways, similarities and dissimilarities in regulatory combinatorics between different tissues, etc. Our analyses have pointed to a highly distributed nature of the combinatorial gene regulation, facilitating an efficient alteration in response to fungal attack. Altogether, our proposed methodology could be an important approach in understanding the combinatorial gene regulation. It can be further applied to unravel the tissue and/or condition specific combinatorial gene regulation in other eukaryotic systems with the availability of annotated genomic sequences and suitable experimental data.
Affiliation(s)
- Arindam Deb
  - Department of Biophysics Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, West Bengal, India
- Sudip Kundu
  - Department of Biophysics Molecular Biology and Bioinformatics, University of Calcutta, Kolkata, West Bengal, India
  - Center of Excellence in Systems Biology and Biomedical Engineering (TEQIP Phase II), University of Calcutta, Kolkata, West Bengal, India
6
Jankowski A, Prabhakar S, Tiuryn J. TACO: a general-purpose tool for predicting cell-type-specific transcription factor dimers. BMC Genomics 2014; 15:208. [PMID: 24640962 PMCID: PMC4004051 DOI: 10.1186/1471-2164-15-208] [Citation(s) in RCA: 24] [Impact Index Per Article: 2.4] [Received: 12/10/2013] [Accepted: 03/07/2014] [Indexed: 12/22/2022] Open
Abstract
Background: Cooperative binding of transcription factor (TF) dimers to DNA is increasingly recognized as a major contributor to binding specificity. However, it is likely that the set of known TF dimers is highly incomplete, given that they were discovered using ad hoc approaches, or through computational analyses of limited datasets.
Results: Here, we present TACO (Transcription factor Association from Complex Overrepresentation), a general-purpose standalone software tool that takes as input any genome-wide set of regulatory elements and predicts cell-type–specific TF dimers based on enrichment of motif complexes. TACO is the first tool that can accommodate motif complexes composed of overlapping motifs, a characteristic feature of many known TF dimers. Our method comprehensively outperforms existing tools when benchmarked on a reference set of 29 known dimers. We demonstrate the utility and consistency of TACO by applying it to 152 DNase-seq datasets and 94 ChIP-seq datasets.
Conclusions: Based on these results, we uncover a general principle governing the structure of TF-TF-DNA ternary complexes, namely that the flexibility of the complex is correlated with, and most likely a consequence of, inter-motif spacing.
Affiliation(s)
- Shyam Prabhakar
  - Computational and Systems Biology, Genome Institute of Singapore, 60 Biopolis Street, Singapore 138672, Singapore.
7
Kakei Y, Ogo Y, Itai RN, Kobayashi T, Yamakawa T, Nakanishi H, Nishizawa NK. Development of a novel prediction method of cis-elements to hypothesize collaborative functions of cis-element pairs in iron-deficient rice. RICE (NEW YORK, N.Y.) 2013; 6:22. [PMID: 24279975 PMCID: PMC4883709 DOI: 10.1186/1939-8433-6-22] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.9] [Received: 05/09/2013] [Accepted: 09/13/2013] [Indexed: 05/20/2023]
Abstract
BACKGROUND: Cis-acting elements are essential genomic sequences that control gene expression. In higher eukaryotes, a series of cis-elements function cooperatively. However, further studies are required to examine the co-regulation of multiple cis-elements on a promoter. The aim of this study was to propose a model of cis-element networks that cooperatively regulate gene expression in rice under iron (Fe) deficiency.
RESULTS: We developed a novel clustering-free method, microarray-associated motif analyzer (MAMA), to predict novel cis-acting elements based on weighted sequence similarities and gene expression profiles in microarray analyses. Simulation of gene expression was performed using a support vector machine and based on the presence of predicted motifs and motif pairs. The accuracy of simulated gene expression was used to evaluate the quality of prediction and to optimize the parameters used in this method. Based on sequences of Oryza sativa genes upregulated by Fe deficiency, MAMA returned experimentally identified cis-elements responsible for Fe deficiency in O. sativa. When this method was applied to O. sativa subjected to zinc deficiency and Arabidopsis thaliana subjected to salt stress, several novel candidate cis-acting elements that overlap with known cis-acting elements, such as ZDRE, ABRE, and DRE, were identified. After optimization, MAMA accurately simulated more than 87% of gene expression. Predicted motifs strongly co-localized in the upstream regions of regulated genes and sequences around transcription start sites. Furthermore, in many cases, the separation (in bp) between co-localized motifs was conserved, suggesting that predicted motifs and the separation between them were important in the co-regulation of gene expression.
CONCLUSIONS: Our results are suggestive of a typical sequence model for Fe deficiency-responsive promoters and some strong candidate cis-elements that function cooperatively with known cis-elements.
Affiliation(s)
- Yusuke Kakei
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
  - Plant Biotechnology Division, Yokohama City University, Kihara Institute for Biological Research Maiokacho, 641-12, Totsuka, Yokohama, Kanagawa 244-0813 Japan
- Yuko Ogo
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
  - Functional Transgenic Crops Research Unit, Genetically Modified Organism Research Center National Institute of Agrobiological Sciences, Kannondai 2-1-2, 305-8602 Tsukuba, Ibaraki Japan
- Reiko N Itai
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
- Takanori Kobayashi
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
  - Research Institute for Bioresources and Biotechnology, Ishikawa Prefectural University, 1-308 Suematsu, 921-8836 Nonoichi-machi, Ishikawa Japan
- Takashi Yamakawa
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
- Hiromi Nakanishi
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
- Naoko K Nishizawa
  - Graduate School of Agricultural and Life Sciences, The University of Tokyo, 1-1-1 Yayoi, 113-8657 Bunkyo-ku Tokyo, Japan
  - Research Institute for Bioresources and Biotechnology, Ishikawa Prefectural University, 1-308 Suematsu, 921-8836 Nonoichi-machi, Ishikawa Japan
8
Ranganathan S, Tongsima S, Chan J, Tan TW, Schönbach C. Advances in translational bioinformatics and population genomics in the Asia-Pacific. BMC Genomics 2013; 13 Suppl 7:S1. [PMID: 23282089 PMCID: PMC3521394 DOI: 10.1186/1471-2164-13-s7-s1] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Indexed: 12/03/2022] Open
Abstract
The theme of the 2012 International Conference on Bioinformatics (InCoB) in Bangkok, Thailand was "From Biological Data to Knowledge to Technological Breakthroughs." Besides providing a forum for life scientists and bioinformatics researchers in the Asia-Pacific region to meet and interact, the conference also hosted thematic sessions on the Pan-Asian Pacific Genome Initiative and immunoinformatics. Over the seven years of conference papers published in BMC Bioinformatics and four years in BMC Genomics, we note that there is increasing interest in the applications of -omics technologies to the understanding of diseases, as a forerunner to personalized genomic medicine.
Affiliation(s)
- Shoba Ranganathan
  - Department of Chemistry and Biomolecular Sciences and ARC Centre of Excellence, Macquarie University, Sydney, NSW 2109, Australia