Reference Citation Analysis

For: Su J, Teichmann SA, Down TA. Assessing computational methods of cis-regulatory module prediction. PLoS Comput Biol 2010;6:e1001020. [PMID: 21152003 PMCID: PMC2996316 DOI: 10.1371/journal.pcbi.1001020] [Citation(s) in RCA: 63] [Impact Index Per Article: 4.5] [Received: 03/25/2010] [Accepted: 10/29/2010] [Indexed: 01/02/2023] Open

Cited by Other Article(s)

Kuffler L, Skelly DA, Czechanski A, Fortin HJ, Munger SC, Baker CL, Reinholdt LG, Carter GW. Imputation of 3D genome structure by genetic-epigenetic interaction modeling in mice. eLife 2024;12:RP88222. [PMID: 38669177 PMCID: PMC11052574 DOI: 10.7554/elife.88222] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/28/2024] Open

Abstract

Gene expression is known to be affected by interactions between local genetic variation and DNA accessibility, with the latter organized into three-dimensional chromatin structures. Analyses of these interactions have previously been limited, obscuring their regulatory context, and the extent to which they occur throughout the genome. Here, we undertake a genome-scale analysis of these interactions in a genetically diverse population to systematically identify global genetic-epigenetic interaction, and reveal constraints imposed by chromatin structure. We establish the extent and structure of genotype-by-epigenotype interaction using embryonic stem cells derived from Diversity Outbred mice. This mouse population segregates millions of variants from eight inbred founders, enabling precision genetic mapping with extensive genotypic and phenotypic diversity. With 176 samples profiled for genotype, gene expression, and open chromatin, we used regression modeling to infer genetic-epigenetic interactions on a genome-wide scale. Our results demonstrate that statistical interactions between genetic variants and chromatin accessibility are common throughout the genome. We found that these interactions occur within the local area of the affected gene, and that this locality corresponds to topologically associated domains (TADs). The likelihood of interaction was most strongly defined by the three-dimensional (3D) domain structure rather than linear DNA sequence. We show that stable 3D genome structure is an effective tool to guide searches for regulatory elements and, conversely, that regulatory elements in genetically diverse populations provide a means to infer 3D genome structure. We confirmed this finding with CTCF ChIP-seq that revealed strain-specific binding in the inbred founder mice. 
In stem cells, open chromatin participating in the most significant regression models demonstrated an enrichment for developmental genes and the TAD-forming CTCF-binding complex, providing an opportunity for statistical inference of shifting TAD boundaries operating during early development. These findings provide evidence that genetic and epigenetic factors operate within the context of 3D chromatin structure.

Liu Z, Wong HM, Chen X, Lin J, Zhang S, Yan S, Wang F, Li X, Wong KC. MotifHub: Detection of trans-acting DNA motif group with probabilistic modeling algorithm. Comput Biol Med 2024;168:107753. [PMID: 38039889 DOI: 10.1016/j.compbiomed.2023.107753] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/06/2023] [Revised: 10/30/2023] [Accepted: 11/20/2023] [Indexed: 12/03/2023]

Wu X, Liu S, Liang G. Detecting clusters of transcription factors based on a nonhomogeneous Poisson process model. BMC Bioinformatics 2022;23:535. [PMID: 36494794 PMCID: PMC9738027 DOI: 10.1186/s12859-022-05090-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/01/2022] [Accepted: 11/30/2022] [Indexed: 12/13/2022] Open

REDfly: An Integrated Knowledgebase for Insect Regulatory Genomics. INSECTS 2022;13:insects13070618. [PMID: 35886794 PMCID: PMC9323752 DOI: 10.3390/insects13070618] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 06/16/2022] [Revised: 07/01/2022] [Accepted: 07/06/2022] [Indexed: 11/29/2022]

Abstract

Simple Summary

Understanding how genes are regulated is a vital area of current biological research and a crucial adjunct to ongoing efforts to sequence entire genomes. Knowing the DNA sequences responsible for gene regulation—transcriptional cis-regulatory modules (CRMs, e.g., “enhancers”) and transcription factor binding sites (TFBSs)—is important for many areas of research including interpretation and validation of data developed by large-scale genomics projects, providing training data for machine-learning CRM-discovery methods, genome annotation, modeling gene-regulatory networks, studying the evolution of gene regulation, and numerous aspects of the basic biology of transcriptional regulation. Knowledge of insect CRMs is also an important step in developing biotechnology methods for control of insect disease vectors and for eliminating pathogen transmission. The REDfly (Regulatory Element Database for Fly) database integrates all of the available insect cis-regulatory information from multiple sources to provide a comprehensive collection of known regulatory elements. In this paper, we describe REDfly’s basic contents and data model, emphasizing recently added features, and provide illustrated walk-throughs of some common search scenarios.

Abstract

We provide here an updated description of the REDfly (Regulatory Element Database for Fly) database of transcriptional regulatory elements, a unique resource that provides regulatory annotation for the genome of Drosophila and other insects. The genomic sequences regulating insect gene expression—transcriptional cis-regulatory modules (CRMs, e.g., “enhancers”) and transcription factor binding sites (TFBSs)—are not currently curated by any other major database resources. However, knowledge of such sequences is important, as CRMs play critical roles with respect to disease as well as normal development, phenotypic variation, and evolution. Characterized CRMs also provide useful tools for both basic and applied research, including developing methods for insect control. REDfly, which is the most detailed existing platform for metazoan regulatory-element annotation, includes over 40,000 experimentally verified CRMs and TFBSs along with their DNA sequences, their associated genes, and the expression patterns they direct. Here, we briefly describe REDfly’s contents and data model, with an emphasis on the new features implemented since 2020. We then provide an illustrated walk-through of several common REDfly search use cases.

Warren TL, Lambert JT, Nord AS. AAV Deployment of Enhancer-Based Expression Constructs In Vivo in Mouse Brain. J Vis Exp 2022:10.3791/62650. [PMID: 35435902 PMCID: PMC10010840 DOI: 10.3791/62650] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/31/2022] Open

Yang TH, Yang YC, Tu KC. regCNN: identifying Drosophila genome-wide cis-regulatory modules via integrating the local patterns in epigenetic marks and transcription factor binding motifs. Comput Struct Biotechnol J 2022;20:296-308. [PMID: 35035784 PMCID: PMC8724954 DOI: 10.1016/j.csbj.2021.12.015] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 09/15/2021] [Revised: 12/10/2021] [Accepted: 12/10/2021] [Indexed: 11/20/2022] Open

Abstract

Transcription regulation in metazoa is controlled by the binding events of transcription factors (TFs) or regulatory proteins on specific modular DNA regulatory sequences called cis-regulatory modules (CRMs). Understanding the distributions of CRMs on a genomic scale is essential for constructing the metazoan transcriptional regulatory networks that help diagnose genetic disorders. While traditional reporter-assay CRM identification approaches can provide an in-depth understanding of functions of some CRM, these methods are usually cost-inefficient and low-throughput. It is generally believed that by integrating diverse genomic data, reliable CRM predictions can be made. Hence, researchers often first resort to computational algorithms for genome-wide CRM screening before specific experiments. However, current existing in silico methods for searching potential CRMs were restricted by low sensitivity, poor prediction accuracy, or high computation time from TFBS composition combinatorial complexity. To overcome these obstacles, we designed a novel CRM identification pipeline called regCNN by considering the base-by-base local patterns in TF binding motifs and epigenetic profiles. On the test set, regCNN shows an accuracy/auROC of 84.5%/92.5% in CRM identification. And by further considering local patterns in epigenetic profiles and TF binding motifs, it can accomplish 4.7% (92.5%–87.8%) improvement in the auROC value over the average value-based pure multi-layer perceptron model. We also demonstrated that regCNN outperforms all currently available tools by at least 11.3% in auROC values. Finally, regCNN is verified to be robust against its resizing window hyperparameter in dealing with the variable lengths of CRMs. The model of regCNN can be downloaded at http://cobisHSS0.im.nuk.edu.tw/regCNN/.
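
The abstract above reports performance as accuracy and auROC. As a reminder of what those two figures measure, the following is a minimal sketch, not code from regCNN. The labels and scores are made-up illustration data, and the function names `auroc` and `accuracy` are hypothetical.

```python
def auroc(labels, scores):
    """Area under the ROC curve, computed as the fraction of
    (positive, negative) pairs in which the positive example
    receives the higher score; ties count as half a win."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples classified correctly at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)


# Made-up data: 1 = CRM, 0 = non-CRM (illustration only).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]

auc = auroc(labels, scores)      # 8 of the 9 positive/negative pairs are ordered correctly
acc = accuracy(labels, scores)   # 4 of the 6 examples are correct at threshold 0.5
print(f"auROC = {auc:.3f}, accuracy = {acc:.3f}")
```

auROC does not depend on a classification threshold, whereas accuracy does, which is why the two numbers reported for the same model can differ.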

Asma H, Halfon MS. Annotating the Insect Regulatory Genome. INSECTS 2021;12:591. [PMID: 34209769 PMCID: PMC8305585 DOI: 10.3390/insects12070591] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 05/28/2021] [Revised: 06/23/2021] [Accepted: 06/25/2021] [Indexed: 11/17/2022]

Talukder A, Barham C, Li X, Hu H. Interpretation of deep learning in genomics and epigenomics. Brief Bioinform 2021;22:bbaa177. [PMID: 34020542 PMCID: PMC8138893 DOI: 10.1093/bib/bbaa177] [Citation(s) in RCA: 42] [Impact Index Per Article: 14.0] [Received: 05/25/2020] [Revised: 06/26/2020] [Accepted: 07/10/2020] [Indexed: 12/17/2022] Open

Tobias IC, Abatti LE, Moorthy SD, Mullany S, Taylor T, Khader N, Filice MA, Mitchell JA. Transcriptional enhancers: from prediction to functional assessment on a genome-wide scale. Genome 2020;64:426-448. [PMID: 32961076 DOI: 10.1139/gen-2020-0104] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Indexed: 12/20/2022]

Osmala M, Lähdesmäki H. Enhancer prediction in the human genome by probabilistic modelling of the chromatin feature patterns. BMC Bioinformatics 2020;21:317. [PMID: 32689977 PMCID: PMC7370432 DOI: 10.1186/s12859-020-03621-3] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/10/2019] [Accepted: 06/19/2020] [Indexed: 12/11/2022] Open

Abstract

Background

The binding sites of transcription factors (TFs) and the localisation of histone modifications in the human genome can be quantified by the chromatin immunoprecipitation assay coupled with next-generation sequencing (ChIP-seq). The resulting chromatin feature data has been successfully adopted for genome-wide enhancer identification by several unsupervised and supervised machine learning methods. However, the current methods predict different numbers and different sets of enhancers for the same cell type and do not utilise the pattern of the ChIP-seq coverage profiles efficiently.

Results

In this work, we propose a PRobabilistic Enhancer PRedictIoN Tool (PREPRINT) that assumes characteristic coverage patterns of chromatin features at enhancers and employs a statistical model to account for their variability. PREPRINT defines probabilistic distance measures to quantify the similarity of the genomic query regions and the characteristic coverage patterns. The probabilistic scores of the enhancer and non-enhancer samples are utilised to train a kernel-based classifier. The performance of the method is demonstrated on ENCODE data for two cell lines. The predicted enhancers are computationally validated based on the transcriptional regulatory protein binding sites and compared to the predictions obtained by state-of-the-art methods.

Conclusion

PREPRINT performs favorably to the state-of-the-art methods, especially when requiring the methods to predict a larger set of enhancers. PREPRINT generalises successfully to data from cell type not utilised for training, and often the PREPRINT performs better than the previous methods. The PREPRINT enhancers are less sensitive to the choice of prediction threshold. PREPRINT identifies biologically validated enhancers not predicted by the competing methods. The enhancers predicted by PREPRINT can aid the genome interpretation in functional genomics and clinical studies.

Chen X, Gu J, Neuwald AF, Hilakivi-Clarke L, Clarke R, Xuan J. BICORN: An R package for integrative inference of de novo cis-regulatory modules. Sci Rep 2020;10:7960. [PMID: 32409786 PMCID: PMC7224214 DOI: 10.1038/s41598-020-63043-2] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/26/2019] [Accepted: 01/15/2020] [Indexed: 12/18/2022] Open

Wang X, Zhou T, Wunderlich Z, Maurano MT, DePace AH, Nuzhdin SV, Rohs R. Analysis of Genetic Variation Indicates DNA Shape Involvement in Purifying Selection. Mol Biol Evol 2019;35:1958-1967. [PMID: 29850830 PMCID: PMC6063282 DOI: 10.1093/molbev/msy099] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 12/18/2022] Open

Asma H, Halfon MS. Computational enhancer prediction: evaluation and improvements. BMC Bioinformatics 2019;20:174. [PMID: 30953451 PMCID: PMC6451241 DOI: 10.1186/s12859-019-2781-x] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 02/15/2019] [Accepted: 03/27/2019] [Indexed: 12/21/2022] Open

Abstract

BACKGROUND

Identifying transcriptional enhancers and other cis-regulatory modules (CRMs) is an important goal of post-sequencing genome annotation. Computational approaches provide a useful complement to empirical methods for CRM discovery, but it is critical that we develop effective means to evaluate their performance in terms of estimating their sensitivity and specificity.

RESULTS

We introduce here pCRMeval, a pipeline for in silico evaluation of any enhancer prediction tools that are flexible enough to be applied to the Drosophila melanogaster genome. pCRMeval compares the result of predictions with the extensive existing knowledge of experimentally-validated Drosophila CRMs in order to estimate the precision and relative sensitivity of the prediction method. In the case of supervised prediction methods, where training data composed of validated CRMs are used, pCRMeval can also assess the sensitivity of specific training sets. We demonstrate the utility of pCRMeval through evaluation of our SCRMshaw CRM prediction method and training data. By measuring the impact of different parameters on SCRMshaw performance, as assessed by pCRMeval, we develop a more robust version of SCRMshaw, SCRMshaw_HD, that improves the number of predictions while maintaining sensitivity and specificity. Our analysis also demonstrates that SCRMshaw_HD, when applied to increasingly less well-assembled genomes, maintains its strong predictive power with only a minor drop-off in performance.

CONCLUSION

Our pCRMeval pipeline provides a general framework for evaluation that can be applied to any CRM prediction method, particularly a supervised method. While we make use of it here primarily to test and improve a particular method for CRM prediction, SCRMshaw, pCRMeval should provide a valuable platform to the research community not only for evaluating individual methods, but also for comparing between competing methods.
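
To make the idea of scoring predictions against validated CRMs concrete, here is a minimal sketch of overlap-based precision and sensitivity. It is not the pCRMeval implementation, whose exact definitions (including "relative sensitivity") may differ. The interval format, the "any overlap counts" rule, the coordinates, and the function names are all assumptions made for illustration.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def precision_and_sensitivity(predicted, validated):
    """Precision: fraction of predictions overlapping any validated CRM.
    Sensitivity: fraction of validated CRMs overlapped by any prediction.
    Assumption: a single overlapping base is enough to count as a hit."""
    hit_predictions = sum(any(overlaps(p, v) for v in validated) for p in predicted)
    recovered = sum(any(overlaps(v, p) for p in predicted) for v in validated)
    precision = hit_predictions / len(predicted) if predicted else 0.0
    sensitivity = recovered / len(validated) if validated else 0.0
    return precision, sensitivity


# Made-up coordinates (illustration only).
predicted = [("2L", 100, 600), ("2L", 5000, 5500), ("3R", 200, 700), ("X", 10, 400)]
validated = [("2L", 500, 900), ("3R", 650, 1200), ("3R", 3000, 3500)]

precision, sensitivity = precision_and_sensitivity(predicted, validated)
print(f"precision = {precision:.2f}, sensitivity = {sensitivity:.2f}")
```

Under this simple rule, a prediction that falls in an unannotated region counts against precision even if it is a real but not yet validated CRM, so the precision value is best read as a lower bound.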

Ho EYK, Cao Q, Gu M, Chan RWL, Wu Q, Gerstein M, Yip KY. Shaping the nebulous enhancer in the era of high-throughput assays and genome editing. Brief Bioinform 2019;21:836-850. [PMID: 30895290 DOI: 10.1093/bib/bbz030] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 11/12/2018] [Revised: 02/15/2019] [Accepted: 02/26/2019] [Indexed: 01/22/2023] Open

A New Algorithm for Identifying Cis-Regulatory Modules Based on Hidden Markov Model. BIOMED RESEARCH INTERNATIONAL 2018;2017:6274513. [PMID: 28497059 PMCID: PMC5405574 DOI: 10.1155/2017/6274513] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 10/23/2016] [Revised: 03/06/2017] [Accepted: 03/23/2017] [Indexed: 11/24/2022]

Banf M, Rhee SY. Computational inference of gene regulatory networks: Approaches, limitations and opportunities. BIOCHIMICA ET BIOPHYSICA ACTA-GENE REGULATORY MECHANISMS 2016;1860:41-52. [PMID: 27641093 DOI: 10.1016/j.bbagrm.2016.09.003] [Citation(s) in RCA: 60] [Impact Index Per Article: 7.5] [Received: 04/13/2016] [Revised: 09/08/2016] [Accepted: 09/08/2016] [Indexed: 10/21/2022]

Guo H, Huo H, Yu Q. SMCis: An Effective Algorithm for Discovery of Cis-Regulatory Modules. PLoS One 2016;11:e0162968. [PMID: 27637070 PMCID: PMC5026350 DOI: 10.1371/journal.pone.0162968] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 03/17/2016] [Accepted: 08/31/2016] [Indexed: 12/02/2022] Open

Lewis J, van der Burg K, Mazo-Vargas A, Reed R. ChIP-Seq-Annotated Heliconius erato Genome Highlights Patterns of cis-Regulatory Evolution in Lepidoptera. Cell Rep 2016;16:2855-2863. [DOI: 10.1016/j.celrep.2016.08.042] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Received: 05/19/2016] [Revised: 07/14/2016] [Accepted: 08/12/2016] [Indexed: 12/11/2022] Open

Westermark PO. Linking Core Promoter Classes to Circadian Transcription. PLoS Genet 2016;12:e1006231. [PMID: 27504829 PMCID: PMC4978467 DOI: 10.1371/journal.pgen.1006231] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 01/27/2016] [Accepted: 07/08/2016] [Indexed: 01/09/2023] Open

Chatzou M, Magis C, Chang JM, Kemena C, Bussotti G, Erb I, Notredame C. Multiple sequence alignment modeling: methods and applications. Brief Bioinform 2015;17:1009-1023. [PMID: 26615024 DOI: 10.1093/bib/bbv099] [Citation(s) in RCA: 84] [Impact Index Per Article: 9.3] [Received: 09/10/2015] [Revised: 10/16/2015] [Indexed: 12/20/2022] Open

Leoncini M, Montangero M, Pellegrini M, Tillan KP. CMStalker: A Combinatorial Tool for Composite Motif Discovery. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2015;12:1123-1136. [PMID: 26451824 DOI: 10.1109/tcbb.2014.2359444] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Indexed: 06/05/2023]

Phylogenomic identification of regulatory sequences in bacteria: an analysis of statistical power and an application to Borrelia burgdorferi sensu lato. mBio 2015;6:mBio.00011-15. [PMID: 25873371 PMCID: PMC4453575 DOI: 10.1128/mbio.00011-15] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Indexed: 01/03/2023] Open

Abstract

Phylogenomic footprinting is an approach for ab initio identification of genome-wide regulatory elements in bacterial species based on sequence conservation. The statistical power of the phylogenomic approach depends on the degree of sequence conservation, the length of regulatory elements, and the level of phylogenetic divergence among genomes. Building on an earlier model, we propose a binomial model that uses synonymous tree lengths as neutral expectations for determining the statistical significance of conserved intergenic spacer (IGS) sequences. Simulations show that the binomial model is robust to variations in the value of evolutionary parameters, including base frequencies and the transition-to-transversion ratio. We used the model to search for regulatory sequences in the Lyme disease species group (Borrelia burgdorferi sensu lato) using 23 genomes. The model indicates that the currently available set of Borrelia genomes would not yield regulatory sequences shorter than five bases, suggesting that genome sequences of additional B. burgdorferi sensu lato species are needed. Nevertheless, we show that previously known regulatory elements are indeed strongly conserved in sequence or structure across these Borrelia species. Further, we predict with sufficient confidence two new RpoS binding sites, 39 promoters, 19 transcription terminators, 28 noncoding RNAs, and four sets of coregulated genes. These putative cis- and trans-regulatory elements suggest novel, Borrelia-specific mechanisms regulating the transition between the tick and host environments, a key adaptation and virulence mechanism of B. burgdorferi. Alignments of IGS sequences are available on BorreliaBase.org, an online database of orthologous open reading frame (ORF) and IGS sequences in Borrelia.

IMPORTANCE

While bacterial genomes contain mostly protein-coding genes, they also house DNA sequences regulating the expression of these genes. Gene regulatory sequences tend to be conserved during evolution. By sequencing and comparing related genomes, one can therefore identify regulatory sequences in bacteria based on sequence conservation. Here, we describe a statistical framework by which one may determine how many genomes need to be sequenced and at what level of evolutionary relatedness in order to achieve a high level of statistical significance. We applied the framework to Borrelia burgdorferi, the Lyme disease agent, and identified a large number of candidate regulatory sequences, many of which are known to be involved in regulating the phase transition between the tick vector and mammalian hosts.
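
The abstract above tests conserved intergenic sequences against a neutral expectation using a binomial model. The sketch below illustrates only the general shape of such a test and is not the authors' model. It assumes, for illustration, that substitutions follow a Poisson process, so that a neutral site stays fully conserved across a tree of total length T (in expected substitutions per site) with probability exp(-T). The tree length, significance level, and function names are hypothetical.

```python
import math


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing at least k
    conserved sites out of n if every site were evolving neutrally."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def min_significant_length(p_conserved, alpha):
    """Smallest run length L such that a fully conserved run of L neutral
    sites has probability p_conserved**L below alpha."""
    if not 0 < p_conserved < 1 or not 0 < alpha < 1:
        raise ValueError("p_conserved and alpha must lie strictly between 0 and 1")
    length = 1
    while p_conserved**length >= alpha:
        length += 1
    return length


# Hypothetical neutral tree length (assumption, not a value from the paper).
tree_length = 1.0
p_neutral = math.exp(-tree_length)   # per-site probability of no substitution

shortest = min_significant_length(p_neutral, alpha=0.05)
p_value = binomial_tail(n=12, k=10, p=p_neutral)
print(f"shortest detectable run: {shortest} sites; P(>=10 of 12 conserved) = {p_value:.2e}")
```

A longer tree lowers the neutral conservation probability, so shorter conserved runs become statistically detectable. This mirrors the abstract's point that additional, more divergent genomes are needed to detect short elements.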

Blatti C, Kazemian M, Wolfe S, Brodsky M, Sinha S. Integrating motif, DNA accessibility and gene expression data to build regulatory maps in an organism. Nucleic Acids Res 2015;43:3998-4012. [PMID: 25791631 PMCID: PMC4417154 DOI: 10.1093/nar/gkv195] [Citation(s) in RCA: 33] [Impact Index Per Article: 3.7] [Received: 02/20/2015] [Accepted: 02/24/2015] [Indexed: 11/17/2022] Open

Suryamohan K, Halfon MS. Identifying transcriptional cis-regulatory modules in animal genomes. WILEY INTERDISCIPLINARY REVIEWS. DEVELOPMENTAL BIOLOGY 2015;4:59-84. [PMID: 25704908 PMCID: PMC4339228 DOI: 10.1002/wdev.168] [Citation(s) in RCA: 47] [Impact Index Per Article: 5.2] [Received: 09/24/2014] [Revised: 11/04/2014] [Accepted: 11/16/2014] [Indexed: 11/08/2022]

Abstract

Gene expression is regulated through the activity of transcription factors (TFs) and chromatin-modifying proteins acting on specific DNA sequences, referred to as cis-regulatory elements. These include promoters, located at the transcription initiation sites of genes, and a variety of distal cis-regulatory modules (CRMs), the most common of which are transcriptional enhancers. Because regulated gene expression is fundamental to cell differentiation and acquisition of new cell fates, identifying, characterizing, and understanding the mechanisms of action of CRMs is critical for understanding development. CRM discovery has historically been challenging, as CRMs can be located far from the genes they regulate, have few readily identifiable sequence characteristics, and for many years were not amenable to high-throughput discovery methods. However, the recent availability of complete genome sequences and the development of next-generation sequencing methods have led to an explosion of both computational and empirical methods for CRM discovery in model and nonmodel organisms alike. Experimentally, CRMs can be identified through chromatin immunoprecipitation directed against TFs or histone post-translational modifications, identification of nucleosome-depleted 'open' chromatin regions, or sequencing-based high-throughput functional screening. Computational methods include comparative genomics, clustering of known or predicted TF-binding sites, and supervised machine-learning approaches trained on known CRMs. All of these methods have proven effective for CRM discovery, but each has its own considerations and limitations, and each is subject to a greater or lesser number of false-positive identifications. Experimental confirmation of predictions is essential, although shortcomings in current methods suggest that additional means of validation need to be developed.

CONFLICT OF INTEREST

The authors have declared no conflicts of interest for this article.

Zheng Y, Li X, Hu H. Comprehensive discovery of DNA motifs in 349 human cells and tissues reveals new features of motifs. Nucleic Acids Res 2015;43:74-83. [PMID: 25505144 PMCID: PMC4288161 DOI: 10.1093/nar/gku1261] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Received: 07/17/2014] [Revised: 11/13/2014] [Accepted: 11/17/2014] [Indexed: 01/15/2023] Open

Yang TH, Wang CC, Hung PC, Wu WS. cisMEP: an integrated repository of genomic epigenetic profiles and cis-regulatory modules in Drosophila. BMC SYSTEMS BIOLOGY 2014;8 Suppl 4:S8. [PMID: 25521507 PMCID: PMC4290730 DOI: 10.1186/1752-0509-8-s4-s8] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.7] [Indexed: 12/16/2022]

Abstract

BACKGROUND

Cis-regulatory modules (CRMs), or the DNA sequences required for regulating gene expression, play a central role in biological research on transcriptional regulation in metazoan species. The systematic understanding of CRMs still relies mainly on computational methods because experimental methods are time-consuming and small in scale. However, the accuracy and reliability of different CRM prediction tools are still unclear. Without comparative cross-analysis of the results and combinatorial consideration with extra experimental information, there is no easy way to assess the confidence of the predicted CRMs. This limits the genome-wide understanding of CRMs.

DESCRIPTION

It is known that transcription factor binding and epigenetic profiles tend to determine functions of CRMs in gene transcriptional regulation. Thus integration of the genome-wide epigenetic profiles with systematically predicted CRMs can greatly help researchers evaluate and decipher the prediction confidence and possible transcriptional regulatory functions of these potential CRMs. However, these data are still fragmented across the literature. Here we performed computational genome-wide screening for potential CRMs using different prediction tools and constructed a pioneering database, cisMEP (cis-regulatory module epigenetic profile database), to integrate these computationally identified CRMs with genomic epigenetic profile data. cisMEP collects the literature-curated TFBS location data and nine types of epigenetic data for assessing the confidence of these potential CRMs and deciphering the possible CRM functionality.

CONCLUSIONS

cisMEP aims to provide a user-friendly interface for researchers to assess the confidence of different potential CRMs and to understand the functions of CRMs through experimentally-identified epigenetic profiles. The deposited potential CRMs and experimental epigenetic profiles for confidence assessment provide experimentally testable hypotheses for the molecular mechanisms of metazoan gene regulation. We believe that the information deposited in cisMEP will greatly facilitate the comparative usage of different CRM prediction tools and will help biologists to study the modular regulatory mechanisms between different TFs and their target genes.

iRegulon: from a gene list to a gene regulatory network using large motif and track collections. PLoS Comput Biol 2014;10:e1003731. [PMID: 25058159 PMCID: PMC4109854 DOI: 10.1371/journal.pcbi.1003731] [Citation(s) in RCA: 608] [Impact Index Per Article: 60.8] [Received: 02/13/2014] [Accepted: 05/27/2014] [Indexed: 01/17/2023] Open

Abstract

Identifying master regulators of biological processes and mapping their downstream gene networks are key challenges in systems biology. We developed a computational method, called iRegulon, to reverse-engineer the transcriptional regulatory network underlying a co-expressed gene set using cis-regulatory sequence analysis. iRegulon implements a genome-wide ranking-and-recovery approach to detect enriched transcription factor motifs and their optimal sets of direct targets. We increase the accuracy of network inference by using very large motif collections of up to ten thousand position weight matrices collected from various species, and linking these to candidate human TFs via a motif2TF procedure. We validate iRegulon on gene sets derived from ENCODE ChIP-seq data with increasing levels of noise, and we compare iRegulon with existing motif discovery methods. Next, we use iRegulon on more challenging types of gene lists, including microRNA target sets, protein-protein interaction networks, and genetic perturbation data. In particular, we over-activate p53 in breast cancer cells, followed by RNA-seq and ChIP-seq, and could identify an extensive up-regulated network controlled directly by p53. Similarly we map a repressive network with no indication of direct p53 regulation but rather an indirect effect via E2F and NFY. Finally, we generalize our computational framework to include regulatory tracks such as ChIP-seq data and show how motif and track discovery can be combined to map functional regulatory interactions among co-expressed genes. iRegulon is available as a Cytoscape plugin from http://iregulon.aertslab.org.

Gene regulatory networks control developmental, homeostatic, and disease processes by governing precise levels and spatio-temporal patterns of gene expression. Determining their topology can provide mechanistic insight into these processes. Gene regulatory networks consist of interactions between transcription factors and their direct target genes. Each regulatory interaction represents the binding of the transcription factor to a specific DNA binding site near its target gene. Here we present a computational method, called iRegulon, to identify master regulators and direct target genes in a human gene signature, i.e. a set of co-expressed genes. iRegulon relies on the analysis of the regulatory sequences around each gene in the gene set to detect enriched TF motifs or ChIP-seq peaks, using databases of nearly 10,000 TF motifs and 1000 ChIP-seq data sets or “tracks”. Next, it associates enriched motifs and tracks with candidate transcription factors and determines the optimal subset of direct target genes. We validate iRegulon on ENCODE data, and use it in combination with RNA-seq and ChIP-seq data to map a p53 downstream network with new predicted co-factors and targets. iRegulon is available as a Cytoscape plugin, supporting human, mouse, and Drosophila genes, and provides access to hundreds of cancer-related TF-target subnetworks or “regulons”.

Di L, Pagan PE, Packer D, Martin CL, Akther S, Ramrattan G, Mongodin EF, Fraser CM, Schutzer SE, Luft BJ, Casjens SR, Qiu WG. BorreliaBase: a phylogeny-centered browser of Borrelia genomes. BMC Bioinformatics 2014;15:233. [PMID: 24994456 PMCID: PMC4094996 DOI: 10.1186/1471-2105-15-233] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/13/2014] [Accepted: 06/26/2014] [Indexed: 11/29/2022] Open

Rouault H, Santolini M, Schweisguth F, Hakim V. Imogene: identification of motifs and cis-regulatory modules underlying gene co-regulation. Nucleic Acids Res 2014;42:6128-45. [PMID: 24682824 PMCID: PMC4041412 DOI: 10.1093/nar/gku209] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open

Gan Y, Guan J, Zhou S, Zhang W. Identifying Cis-Regulatory Elements and Modules Using Conditional Random Fields. IEEE/ACM Trans Comput Biol Bioinform 2014;11:73-82. [PMID: 26355509 DOI: 10.1109/tcbb.2013.131] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/05/2023]

Diez D, Hutchins AP, Miranda-Saavedra D. Systematic identification of transcriptional regulatory modules from protein-protein interaction networks. Nucleic Acids Res 2013;42:e6. [PMID: 24137002 PMCID: PMC3874207 DOI: 10.1093/nar/gkt913] [Citation(s) in RCA: 26] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open

Yip KY, Cheng C, Gerstein M. Machine learning and genome annotation: a match meant to be? Genome Biol 2013;14:205. [PMID: 23731483 PMCID: PMC4053789 DOI: 10.1186/gb-2013-14-5-205] [Citation(s) in RCA: 49] [Impact Index Per Article: 4.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/25/2022] Open

Dickel DE, Visel A, Pennacchio LA. Functional anatomy of distant-acting mammalian enhancers. Philos Trans R Soc Lond B Biol Sci 2013;368:20120359. [PMID: 23650633 DOI: 10.1098/rstb.2012.0359] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023] Open

Kheradpour P, Ernst J, Melnikov A, Rogov P, Wang L, Zhang X, Alston J, Mikkelsen TS, Kellis M. Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res 2013;23:800-11. [PMID: 23512712 PMCID: PMC3638136 DOI: 10.1101/gr.144899.112] [Citation(s) in RCA: 234] [Impact Index Per Article: 21.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/26/2012] [Accepted: 03/14/2013] [Indexed: 01/06/2023]

Abstract

Genome-wide chromatin annotations have permitted the mapping of putative regulatory elements across multiple human cell types. However, their experimental dissection by directed regulatory motif disruption has remained unfeasible at the genome scale. Here, we use a massively parallel reporter assay (MPRA) to measure the transcriptional levels induced by 145-bp DNA segments centered on evolutionarily conserved regulatory motif instances within enhancer chromatin states. We select five predicted activators (HNF1, HNF4, FOXA, GATA, NFE2L2) and two predicted repressors (GFI1, ZFP161) and measure reporter expression in erythroleukemia (K562) and liver carcinoma (HepG2) cell lines. We test 2104 wild-type sequences and 3314 engineered enhancer variants containing targeted motif disruptions, each using 10 barcode tags and two replicates. The resulting data strongly confirm the enhancer activity and cell-type specificity of enhancer chromatin states, the ability of 145-bp segments to recapitulate both, the necessary role of regulatory motifs in enhancer function, and the complementary roles of activator and repressor motifs. We find statistically robust evidence that (1) disrupting the predicted activator motifs abolishes enhancer function, while silent or motif-improving changes maintain enhancer activity; (2) evolutionary conservation, nucleosome exclusion, binding of other factors, and strength of the motif match are predictive of enhancer activity; (3) scrambling repressor motifs leads to aberrant reporter expression in cell lines where the enhancers are usually inactive. Our results suggest a general strategy for deciphering cis-regulatory elements by systematic large-scale manipulation and provide quantitative enhancer activity measurements across thousands of constructs that can be mined to develop predictive models of gene expression.

Simonatto M, Barozzi I, Natoli G. Non-coding transcription at cis-regulatory elements: computational and experimental approaches. Methods 2013;63:66-75. [PMID: 23542771 DOI: 10.1016/j.ymeth.2013.03.021] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/04/2013] [Revised: 03/18/2013] [Accepted: 03/20/2013] [Indexed: 12/17/2022] Open

Shu JJ, Li Y. A statistical thin-tail test of predicting regulatory regions in the Drosophila genome. Theor Biol Med Model 2013;10:11. [PMID: 23409927 PMCID: PMC3598831 DOI: 10.1186/1742-4682-10-11] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/11/2012] [Accepted: 02/07/2013] [Indexed: 11/10/2022] Open

Classification of human genomic regions based on experimentally determined binding sites of more than 100 transcription-related factors. Genome Biol 2012;13:R48. [PMID: 22950945 PMCID: PMC3491392 DOI: 10.1186/gb-2012-13-9-r48] [Citation(s) in RCA: 187] [Impact Index Per Article: 15.6] [Reference Citation Analysis] [Abstract] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/21/2011] [Revised: 05/06/2012] [Accepted: 06/08/2012] [Indexed: 01/22/2023] Open

Shu JJ, Li Y. A statistical fat-tail test of predicting regulatory regions in the Drosophila genome. Comput Biol Med 2012;42:935-41. [PMID: 22884312 DOI: 10.1016/j.compbiomed.2012.07.007] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/02/2010] [Revised: 05/29/2012] [Accepted: 07/18/2012] [Indexed: 11/19/2022]

Deyneko IV, Weiss S, Leschner S. An integrative computational approach to effectively guide experimental identification of regulatory elements in promoters. BMC Bioinformatics 2012;13:202. [PMID: 22897887 PMCID: PMC3465240 DOI: 10.1186/1471-2105-13-202] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/23/2012] [Accepted: 08/01/2012] [Indexed: 01/22/2023] Open

Abstract

Background

Transcriptional activity of genes depends on many factors like DNA motifs, conformational characteristics of DNA, melting etc. and there are computational approaches for their identification. However, in real applications, the number of predicted, for example, DNA motifs may be considerably large. In cases when various computational programs are applied, systematic experimental knock out of each of the potential elements obviously becomes nonproductive. Hence, one needs an approach that is able to integrate many heterogeneous computational methods and upon that suggest selected regulatory elements for experimental verification.

Results

Here, we present an integrative bioinformatic approach aimed at the discovery of regulatory modules that can be effectively verified experimentally. It is based on combinatorial analysis of known and novel binding motifs, as well as of any other known features of promoters. The goal of this method is the identification of a collection of modules that are specific for an established dataset and at the same time are optimal for experimental verification. The method is particularly effective on small datasets, where most statistical approaches fail. We apply it to promoters that drive tumor-specific gene expression in tumor-colonizing Gram-negative bacteria. The method successfully identified a number of potential modules, which required only a few experiments to be verified. The resulting minimal functional bacterial promoter exhibited high specificity of expression in cancerous tissue.

Conclusions

Experimental analysis of promoter structures guided by bioinformatics has proved to be efficient. The developed computational method is able to include heterogeneous features of promoters and suggest combinatorial modules for experimental testing. Expansibility and robustness of the methodology implemented in the approach ensures good results for a wide range of problems.

Genomic approaches towards finding cis-regulatory modules in animals. Nat Rev Genet 2012;13:469-83. [PMID: 22705667 DOI: 10.1038/nrg3242] [Citation(s) in RCA: 156] [Impact Index Per Article: 13.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Maston GA, Landt SG, Snyder M, Green MR. Characterization of enhancer function from genome-wide analyses. Annu Rev Genomics Hum Genet 2012;13:29-57. [PMID: 22703170 DOI: 10.1146/annurev-genom-090711-163723] [Citation(s) in RCA: 80] [Impact Index Per Article: 6.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/09/2022]

Aboukhalil A, Bulyk ML. LOESS correction for length variation in gene set-based genomic sequence analysis. Bioinformatics 2012;28:1446-54. [PMID: 22492312 DOI: 10.1093/bioinformatics/bts155] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/20/2023]

Sun H, Guns T, Fierro AC, Thorrez L, Nijssen S, Marchal K. Unveiling combinatorial regulation through the combination of ChIP information and in silico cis-regulatory module detection. Nucleic Acids Res 2012;40:e90. [PMID: 22422841 PMCID: PMC3384348 DOI: 10.1093/nar/gks237] [Citation(s) in RCA: 22] [Impact Index Per Article: 1.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/16/2023] Open

Beaster-Jones L. Cis-regulation and conserved non-coding elements in amphioxus. Brief Funct Genomics 2012;11:118-30. [DOI: 10.1093/bfgp/els006] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/13/2022] Open

Nikulova AA, Polishchuk MS, Tumanian VG, Makeev VY, Mironov AA, Favorov AV. Correlations between clusters of protein-DNA binding sites and the binding experimental data allow predicting a structure of regulatory modules. Biophysics (Nagoya-shi) 2012. [DOI: 10.1134/s0006350912020157] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/23/2022] Open

Erb I, González-Vallinas JR, Bussotti G, Blanco E, Eyras E, Notredame C. Use of ChIP-Seq data for the design of a multiple promoter-alignment method. Nucleic Acids Res 2012;40:e52. [PMID: 22230796 PMCID: PMC3326335 DOI: 10.1093/nar/gkr1292] [Citation(s) in RCA: 19] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/22/2023] Open

Aerts S. Computational strategies for the genome-wide identification of cis-regulatory elements and transcriptional targets. Curr Top Dev Biol 2012;98:121-45. [PMID: 22305161 DOI: 10.1016/b978-0-12-386499-4.00005-7] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/13/2022]

Taher L, Narlikar L, Ovcharenko I. CLARE: Cracking the LAnguage of Regulatory Elements. Bioinformatics 2011;28:581-3. [PMID: 22199387 DOI: 10.1093/bioinformatics/btr704] [Citation(s) in RCA: 12] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022]

Halfon MS, Zhu Q, Brennan ER, Zhou Y. Erroneous attribution of relevant transcription factor binding sites despite successful prediction of cis-regulatory modules. BMC Genomics 2011;12:578. [PMID: 22115527 PMCID: PMC3235160 DOI: 10.1186/1471-2164-12-578] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/01/2011] [Accepted: 11/25/2011] [Indexed: 12/22/2022] Open

Abstract

Background

Cis-regulatory modules are bound by transcription factors to regulate gene expression. Characterizing these DNA sequences is central to understanding gene regulatory networks and gaining insight into mechanisms of transcriptional regulation, but genome-scale regulatory module discovery remains a challenge. One popular approach is to scan the genome for clusters of transcription factor binding sites, especially those conserved in related species. When such approaches are successful, it is typically assumed that the activity of the modules is mediated by the identified binding sites and their cognate transcription factors. However, the validity of this assumption is often not assessed.

Results

We successfully predicted five new cis-regulatory modules by combining binding site identification with sequence conservation and compared these to unsuccessful predictions from a related approach not utilizing sequence conservation. Despite greatly improved predictive success, the positive set had similar degrees of sequence and binding site conservation as the negative set. We explored the reasons for this by mutagenizing putative binding sites in three cis-regulatory modules. A large proportion of the tested sites had little or no demonstrable role in mediating regulatory element activity. Examination of loss-of-function mutants also showed that some transcription factors supposedly binding to the modules are not required for their function.

Conclusions

Our results raise important questions about interpreting regulatory module predictions obtained by finding clusters of conserved binding sites. Attribution of function to these sites and their cognate transcription factors may be incorrect even when modules are successfully identified. Our study underscores the importance of empirical validation of computational results even when these results are in line with expectation.

Ludwig MZ, Manu, Kittler R, White KP, Kreitman M. Consequences of eukaryotic enhancer architecture for gene expression dynamics, development, and fitness. PLoS Genet 2011;7:e1002364. [PMID: 22102826 PMCID: PMC3213169 DOI: 10.1371/journal.pgen.1002364] [Citation(s) in RCA: 53] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/11/2011] [Accepted: 09/14/2011] [Indexed: 12/13/2022] Open

Abstract

The regulatory logic of time- and tissue-specific gene expression has mostly been dissected in the context of the smallest DNA fragments that, when isolated, recapitulate native expression in reporter assays. It is not known if the genomic sequences surrounding such fragments, often evolutionarily conserved, have any biological function or not. Using an enhancer of the even-skipped gene of Drosophila as a model, we investigate the functional significance of the genomic sequences surrounding empirically identified enhancers. A 480 bp long "minimal stripe element" is able to drive even-skipped expression in the second of seven stripes but is embedded in a larger region of 800 bp containing evolutionarily conserved binding sites for required transcription factors. To assess the overall fitness contribution made by these binding sites in the native genomic context, we employed a gene-replacement strategy in which whole-locus transgenes, capable of rescuing even-skipped(-) lethality to adulthood, were substituted for the native gene. The molecular phenotypes were characterized by tagging Even-skipped with a fluorescent protein and monitoring gene expression dynamics in living embryos. We used recombineering to excise the sequences surrounding the minimal enhancer and site-specific transgenesis to create co-isogenic strains differing only in their stripe 2 sequences. Remarkably, the flanking sequences were dispensable for viability, proving the sufficiency of the minimal element for biological function under normal conditions. These sequences are required for robustness to genetic and environmental perturbation instead. The mutant enhancers had measurable sex- and dose-dependent effects on viability. At the molecular level, the mutants showed a destabilization of stripe placement and improper activation of downstream genes. Finally, we demonstrate through live measurements that the peripheral sequences are required for temperature compensation. These results imply that seemingly redundant regulatory sequences beyond the minimal enhancer are necessary for robust gene expression and that "robustness" itself must be an evolved characteristic of the wild-type enhancer.
